Proceedings of the Fifth Conference on Machine Translation

Loïc Barrault, Ondřej Bojar, Fethi Bougares, Rajen Chatterjee, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Alexander Fraser, Yvette Graham, Paco Guzman, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, André Martins, Makoto Morishita, Christof Monz, Masaaki Nagata, Toshiaki Nakazawa, Matteo Negri (Editors)

Anthology ID:
Association for Computational Linguistics
Bib Export formats:

pdf bib
Proceedings of the Fifth Conference on Machine Translation
Loïc Barrault | Ondřej Bojar | Fethi Bougares | Rajen Chatterjee | Marta R. Costa-jussà | Christian Federmann | Mark Fishel | Alexander Fraser | Yvette Graham | Paco Guzman | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Philipp Koehn | André Martins | Makoto Morishita | Christof Monz | Masaaki Nagata | Toshiaki Nakazawa | Matteo Negri

pdf bib
Tohoku-AIP-NTT at WMT 2020 News Translation TaskAIP-NTT at WMT 2020 News Translation Task
Shun Kiyono | Takumi Ito | Ryuto Konno | Makoto Morishita | Jun Suzuki

In this paper, we describe the submission of Tohoku-AIP-NTT to the WMT’20 news translation task. We participated in this task in two language pairs and four language directions : English German and English Japanese. Our system consists of techniques such as back-translation and fine-tuning, which are already widely adopted in translation tasks. We attempted to develop new methods for both synthetic data filtering and reranking. However, the methods turned out to be ineffective, and they provided us with no significant improvement over the baseline. We analyze these negative results to provide insights for future studies.

pdf bib
NRC Systems for the 2020 Inuktitut-English News Translation TaskNRC Systems for the 2020 Inuktitut-English News Translation Task
Rebecca Knowles | Darlene Stewart | Samuel Larkin | Patrick Littell

We describe the National Research Council of Canada (NRC) submissions for the 2020 Inuktitut-English shared task on news translation at the Fifth Conference on Machine Translation (WMT20). Our submissions consist of ensembled domain-specific finetuned transformer models, trained using the Nunavut Hansard and news data and, in the case of Inuktitut-English, backtranslated news and parliamentary data. In this work we explore challenges related to the relatively small amount of parallel data, morphological complexity, and domain shifts.

pdf bib
CUNI Submission for the Inuktitut Language in WMT News 2020CUNI Submission for the Inuktitut Language in WMT News 2020
Tom Kocmi

This paper describes CUNI submission to the WMT 2020 News Translation Shared Task for the low-resource scenario InuktitutEnglish in both translation directions. Our system combines transfer learning from a CzechEnglish high-resource language pair and backtranslation. We notice surprising behaviour when using synthetic data, which can be possibly attributed to a narrow domain of training and test data. We are using the Transformer model in a constrained submission.

pdf bib
Speed-optimized, Compact Student Models that Distill Knowledge from a Larger Teacher Model : the UEDIN-CUNI Submission to the WMT 2020 News Translation TaskUEDIN-CUNI Submission to the WMT 2020 News Translation Task
Ulrich Germann | Roman Grundkiewicz | Martin Popel | Radina Dobreva | Nikolay Bogoychev | Kenneth Heafield

We describe the joint submission of the University of Edinburgh and Charles University, Prague, to the Czech / English track in the WMT 2020 Shared Task on News Translation. Our fast and compact student models distill knowledge from a larger, slower teacher. They are designed to offer a good trade-off between translation quality and inference efficiency. On the WMT 2020 Czech English test sets, they achieve translation speeds of over 700 whitespace-delimited source words per second on a single CPU thread, thus making neural translation feasible on consumer hardware without a GPU.

pdf bib
The University of Edinburgh’s submission to the German-to-English and English-to-German Tracks in the WMT 2020 News Translation and Zero-shot Translation Robustness TasksUniversity of Edinburgh’s submission to the German-to-English and English-to-German Tracks in the WMT 2020 News Translation and Zero-shot Translation Robustness Tasks
Ulrich Germann

This paper describes the University of Edinburgh’s submission of German-English systems to the WMT2020 Shared Tasks on News Translation and Zero-shot Robustness.

pdf bib
SJTU-NICT’s Supervised and Unsupervised Neural Machine Translation Systems for the WMT20 News Translation TaskSJTU-NICT’s Supervised and Unsupervised Neural Machine Translation Systems for the WMT20 News Translation Task
Zuchao Li | Hai Zhao | Rui Wang | Kehai Chen | Masao Utiyama | Eiichiro Sumita

In this paper, we introduced our joint team SJTU-NICT ‘s participation in the WMT 2020 machine translation shared task. In this shared task, we participated in four translation directions of three language pairs : English-Chinese, English-Polish on supervised machine translation track, German-Upper Sorbian on low-resource and unsupervised machine translation tracks. Based on different conditions of language pairs, we have experimented with diverse neural machine translation (NMT) techniques : document-enhanced NMT, XLM pre-trained language model enhanced NMT, bidirectional translation as a pre-training, reference language based UNMT, data-dependent gaussian prior objective, and BT-BLEU collaborative filtering self-training. We also used the TF-IDF algorithm to filter the training set to obtain a domain more similar set with the test set for finetuning. In our submissions, the primary systems won the first place on English to Chinese, Polish to English, and German to Upper Sorbian translation directions.

pdf bib
CUNI English-Czech and English-Polish Systems in WMT20 : Robust Document-Level TrainingCUNI English-Czech and English-Polish Systems in WMT20: Robust Document-Level Training
Martin Popel

We describe our two NMT systems submitted to the WMT 2020 shared task in English-Czech and English-Polish news translation. One system is sentence level, translating each sentence independently. The second system is document level, translating multiple sentences, trained on multi-sentence sequences up to 3000 characters long.

pdf bib
OPPO’s Machine Translation Systems for WMT20OPPO’s Machine Translation Systems for WMT20
Tingxun Shi | Shiyu Zhao | Xiaopu Li | Xiaoxue Wang | Qian Zhang | Di Ai | Dawei Dang | Xue Zhengshan | Jie Hao

In this paper we demonstrate our (OPPO’s) machine translation systems for the WMT20 Shared Task on News Translation for all the 22 language pairs. We will give an overview of the common aspects across all the systems firstly, including two parts : the data preprocessing part will show how the data are preprocessed and filtered, and the system part will show our models architecture and the techniques we followed. Detailed information, such as training hyperparameters and the results generated by each technique will be depicted in the corresponding subsections. Our final submissions ranked top in 6 directions (English Czech, English Russian, French German and Tamil English), third in 2 directions (English German, English Japanese), and fourth in 2 directions (English Pashto and and English Tamil).\\leftrightarrow Czech, English \\leftrightarrow Russian, French \\rightarrow German and Tamil \\rightarrow English), third in 2 directions (English \\rightarrow German, English \\rightarrow Japanese), and fourth in 2 directions (English \\rightarrow Pashto and and English \\rightarrow Tamil).

pdf bib
HW-TSC’s Participation in the WMT 2020 News Translation Shared TaskHW-TSC’s Participation in the WMT 2020 News Translation Shared Task
Daimeng Wei | Hengchao Shang | Zhanglin Wu | Zhengzhe Yu | Liangyou Li | Jiaxin Guo | Minghan Wang | Hao Yang | Lizhi Lei | Ying Qin | Shiliang Sun

This paper presents our work in the WMT 2020 News Translation Shared Task. We participate in 3 language pairs including Zh / En, Km / En, and Ps / En and in both directions under the constrained condition. We use the standard Transformer-Big model as the baseline and obtain the best performance via two variants with larger parameter sizes. We perform detailed pre-processing and filtering on the provided large-scale bilingual and monolingual dataset. Several commonly used strategies are used to train our models such as Back Translation, Ensemble Knowledge Distillation, etc. We also conduct experiment with similar language augmentation, which lead to positive results, although not used in our submission. Our submission obtains remarkable results in the final evaluation.

pdf bib
The Volctrans Machine Translation System for WMT20WMT20
Liwei Wu | Xiao Pan | Zehui Lin | Yaoming Zhu | Mingxuan Wang | Lei Li

This paper describes our submission systems for VolcTrans for WMT20 shared news translation task. We participated in 8 translation directions. Our basic systems are based on Transformer (CITATION), into which we also employed new architectures (bigger or deeper Transformers, dynamic convolution). The final systems include text pre-process, subword(a.k.a. BPE(CITATION)), baseline model training, iterative back-translation, model ensemble, knowledge distillation and multilingual pre-training.

pdf bib
The NiuTrans Machine Translation Systems for WMT20NiuTrans Machine Translation Systems for WMT20
Yuhao Zhang | Ziyang Wang | Runzhe Cao | Binghao Wei | Weiqiao Shan | Shuhan Zhou | Abudurexiti Reheman | Tao Zhou | Xin Zeng | Laohu Wang | Yongyu Mu | Jingnan Zhang | Xiaoqian Liu | Xuanjun Zhou | Yinqiao Li | Bei Li | Tong Xiao | Jingbo Zhu

This paper describes NiuTrans neural machine translation systems of the WMT20 news translation tasks. We participated in Japanese-English, English-Chinese, Inuktitut-English and Tamil-English total five tasks and rank first in Japanese-English both sides. We mainly utilized iterative back-translation, different depth and widen model architectures, iterative knowledge distillation and iterative fine-tuning. And we find that adequately widened and deepened the model simultaneously, the performance will significantly improve. Also, iterative fine-tuning strategy we implemented is effective during adapting domain. For Inuktitut-English and Tamil-English tasks, we built multilingual models separately and employed pretraining word embedding to obtain better performance.

pdf bib
Gender Coreference and Bias Evaluation at WMT 2020WMT 2020
Tom Kocmi | Tomasz Limisiewicz | Gabriel Stanovsky

Gender bias in machine translation can manifest when choosing gender inflections based on spurious gender correlations. For example, always translating doctors as men and nurses as women. This can be particularly harmful as models become more popular and deployed within commercial systems. Our work presents the largest evidence for the phenomenon in more than 19 systems submitted to the WMT over four diverse target languages : Czech, German, Polish, and Russian. To achieve this, we use WinoMT, a recent automatic test suite which examines gender coreference and bias when translating from English to languages with grammatical gender. We extend WinoMT to handle two new languages tested in WMT : Polish and Czech. We find that all systems consistently use spurious correlations in the data rather than meaningful contextual information.

pdf bib
Translating Similar Languages : Role of Mutual Intelligibility in Multilingual Transformers
Ife Adebara | El Moatez Billah Nagoudi | Muhammad Abdul Mageed

In this work we investigate different approaches to translate between similar languages despite low resource limitations. This work is done as the participation of the UBC NLP research group in the WMT 2019 Similar Languages Translation Shared Task. We participated in all language pairs and performed various experiments. We used a transformer architecture for all the models and used back-translation for one of the language pairs. We explore both bilingual and multi-lingual approaches. We describe the pre-processing, training, translation and results for each model. We also investigate the role of mutual intelligibility in model performance.

pdf bib
The IPN-CIC team system submission for the WMT 2020 similar language taskIPN-CIC team system submission for the WMT 2020 similar language task
Luis A. Menéndez-Salazar | Grigori Sidorov | Marta R. Costa-Jussà

This paper describes the participation of the NLP research team of the IPN Computer Research center in the WMT 2020 Similar Language Translation Task. We have submitted systems for the Spanish-Portuguese language pair (in both directions). The three submitted systems are based on the Transformer architecture and used fine tuning for domain Adaptation.

pdf bib
NUIG-Panlingua-KMI Hindi-Marathi MT Systems for Similar Language Translation Task @ WMT 2020NUIG-Panlingua-KMI Hindi-Marathi MT Systems for Similar Language Translation Task @ WMT 2020
Atul Kr. Ojha | Priya Rani | Akanksha Bansal | Bharathi Raja Chakravarthi | Ritesh Kumar | John P. McCrae

NUIG-Panlingua-KMI submission to WMT 2020 seeks to push the state-of-the-art in Similar Language Translation Task for HindiMarathi language pair. As part of these efforts, we conducteda series of experiments to address the challenges for translation between similar languages. Among the 4 MT systems prepared under this task, 1 PBSMT systems were prepared for HindiMarathi each and 1 NMT systems were developed for HindiMarathi using Byte PairEn-coding (BPE) into subwords. The results show that different architectures NMT could be an effective method for developing MT systems for closely related languages. Our Hindi-Marathi NMT system was ranked 8th among the 14 teams that participated and our Marathi-Hindi NMT system was ranked 8th among the 11 teams participated for the task.

pdf bib
Document Level NMT of Low-Resource Languages with BacktranslationNMT of Low-Resource Languages with Backtranslation
Sami Ul Haq | Sadaf Abdul Rauf | Arsalan Shaukat | Abdullah Saeed

This paper describes our system submission to WMT20 shared task on similar language translation. We examined the use of documentlevel neural machine translation (NMT) systems for low-resource, similar language pair MarathiHindi. Our system is an extension of state-of-the-art Transformer architecture with hierarchical attention networks to incorporate contextual information. Since, NMT requires large amount of parallel data which is not available for this task, our approach is focused on utilizing monolingual data with back translation to train our models. Our experiments reveal that document-level NMT can be a reasonable alternative to sentence-level NMT for improving translation quality of low resourced languages even when used with synthetic data.

pdf bib
The University of Maryland’s Submissions to the WMT20 Chat Translation Task : Searching for More Data to Adapt Discourse-Aware Neural Machine TranslationUniversity of Maryland’s Submissions to the WMT20 Chat Translation Task: Searching for More Data to Adapt Discourse-Aware Neural Machine Translation
Calvin Bao | Yow-Ting Shiue | Chujun Song | Jie Li | Marine Carpuat

This paper describes the University of Maryland’s submissions to the WMT20 Shared Task on Chat Translation. We focus on translating agent-side utterances from English to German. We started from an off-the-shelf BPE-based standard transformer model trained with WMT17 news and fine-tuned it with the provided in-domain training data. In addition, we augment the training set with its best matches in the WMT19 news dataset. Our primary submission uses a standard Transformer, while our contrastive submissions use multi-encoder Transformers to attend to previous utterances. Our primary submission achieves 56.7 BLEU on the agent side (ende), outperforming a baseline system provided by the task organizers by more than 13 BLEU points. Moreover, according to an evaluation on a set of carefully-designed examples, the multi-encoder architecture is able to generate more coherent translations.

pdf bib
Fast Interleaved Bidirectional Sequence Generation
Biao Zhang | Ivan Titov | Rico Sennrich

Independence assumptions during sequence generation can speed up inference, but parallel generation of highly inter-dependent tokens comes at a cost in quality. Instead of assuming independence between neighbouring tokens (semi-autoregressive decoding, SA), we take inspiration from bidirectional sequence generation and introduce a decoder that generates target words from the left-to-right and right-to-left directions simultaneously. We show that we can easily convert a standard architecture for unidirectional decoding into a bidirectional decoder by simply interleaving the two directions and adapting the word positions and selfattention masks. Our interleaved bidirectional decoder (IBDecoder) retains the model simplicity and training efficiency of the standard Transformer, and on five machine translation tasks and two document summarization tasks, achieves a decoding speedup of ~2x compared to autoregressive decoding with comparable quality. Notably, it outperforms left-to-right SA because the independence assumptions in IBDecoder are more felicitous. To achieve even higher speedups, we explore hybrid models where we either simultaneously predict multiple neighbouring tokens per direction, or perform multi-directional decoding by partitioning the target sequence. These methods achieve speedups to 4x11x across different tasks at the cost of 1 BLEU or 0.5 ROUGE (on average)

pdf bib
Towards Multimodal Simultaneous Neural Machine Translation
Aizhan Imankulova | Masahiro Kaneko | Tosho Hirasawa | Mamoru Komachi

Simultaneous translation involves translating a sentence before the speaker’s utterance is completed in order to realize real-time understanding in multiple languages. This task is significantly more challenging than the general full sentence translation because of the shortage of input information during decoding. To alleviate this shortage, we propose multimodal simultaneous neural machine translation (MSNMT), which leverages visual information as an additional modality. Our experiments with the Multi30k dataset showed that MSNMT significantly outperforms its text-only counterpart in more timely translation situations with low latency. Furthermore, we verified the importance of visual information during decoding by performing an adversarial evaluation of MSNMT, where we studied how models behaved with incongruent input modality and analyzed the effect of different word order between source and target languages.

pdf bib
Document-aligned Japanese-English Conversation Parallel CorpusJapanese-English Conversation Parallel Corpus
Matīss Rikters | Ryokan Ri | Tong Li | Toshiaki Nakazawa

Sentence-level (SL) machine translation (MT) has reached acceptable quality for many high-resourced languages, but not document-level (DL) MT, which is difficult to 1) train with little amount of DL data ; and 2) evaluate, as the main methods and data sets focus on SL evaluation. To address the first issue, we present a document-aligned Japanese-English conversation corpus, including balanced, high-quality business conversation data for tuning and testing. As for the second issue, we manually identify the main areas where SL MT fails to produce adequate translations in lack of context. We then create an evaluation set where these phenomena are annotated to alleviate automatic evaluation of DL systems. We train MT models using our corpus to demonstrate how using context leads to improvements.

pdf bib
Findings of the WMT 2020 Shared Task on Automatic Post-EditingWMT 2020 Shared Task on Automatic Post-Editing
Rajen Chatterjee | Markus Freitag | Matteo Negri | Marco Turchi

We present the results of the 6th round of the WMT task on MT Automatic Post-Editing. The task consists in automatically correcting the output of a black-box machine translation system by learning from existing human corrections of different sentences. This year, the challenge consisted of fixing the errors present in English Wikipedia pages translated into German and Chinese by state-ofthe-art, not domain-adapted neural MT (NMT) systems unknown to participants. Six teams participated in the English-German task, submitting a total of 11 runs. Two teams participated in the English-Chinese task submitting 2 runs each. Due to i) the different source / domain of data compared to the past (Wikipedia vs Information Technology), ii) the different quality of the initial translations to be corrected and iii) the introduction of a new language pair (English-Chinese), this year’s results are not directly comparable with last year’s round. However, on both language directions, participants’ submissions show considerable improvements over the baseline results. On English-German, the top ranked system improves over the baseline by -11.35 TER and +16.68 BLEU points, while on EnglishChinese the improvements are respectively up to -12.13 TER and +14.57 BLEU points. Overall, coherent gains are also highlighted by the outcomes of human evaluation, which confirms the effectiveness of APE to improve MT quality, especially in the new generic domain selected for this year’s round.

pdf bib
Results of the WMT20 Metrics Shared TaskWMT20 Metrics Shared Task
Nitika Mathur | Johnny Wei | Markus Freitag | Qingsong Ma | Ondřej Bojar

This paper presents the results of the WMT20 Metrics Shared Task. Participants were asked to score the outputs of the translation systems competing in the WMT20 News Translation Task with automatic metrics. Ten research groups submitted 27 metrics, four of which are reference-less metrics. In addition, we computed five baseline metrics, including sentBLEU, BLEU, TER and using the SacreBLEU scorer. All metrics were evaluated on how well they correlate at the system-, document- and segment-level with the WMT20 official human scores. We present an extensive analysis on influence of different reference translations on metric reliability, how well automatic metrics score human translations, and we also flag major discrepancies between metric and human scores when evaluating MT systems. Finally, we investigate whether we can use automatic metrics to flag incorrect human ratings.

pdf bib
Cross-Lingual Transformers for Neural Automatic Post-Editing
Dongjun Lee

In this paper, we describe the Bering Lab’s submission to the WMT 2020 Shared Task on Automatic Post-Editing (APE). First, we propose a cross-lingual Transformer architecture that takes a concatenation of a source sentence and a machine-translated (MT) sentence as an input to generate the post-edited (PE) output. For further improvement, we mask incorrect or missing words in the PE output based on word-level quality estimation and then predict the actual word for each mask based on the fine-tuned cross-lingual language model (XLM-RoBERTa). Finally, to address the over-correction problem, we select the final output among the PE outputs and the original MT sentence based on a sentence-level quality estimation. When evaluated on the WMT 2020 English-German APE test dataset, our system improves the NMT output by -3.95 and +4.50 in terms of TER and BLEU, respectively.

pdf bib
POSTECH-ETRI’s Submission to the WMT2020 APE Shared Task : Automatic Post-Editing with Cross-lingual Language ModelPOSTECH-ETRI’s Submission to the WMT2020 APE Shared Task: Automatic Post-Editing with Cross-lingual Language Model
Jihyung Lee | WonKee Lee | Jaehun Shin | Baikjin Jung | Young-Kil Kim | Jong-Hyeok Lee

This paper describes POSTECH-ETRI’s submission to WMT2020 for the shared task on automatic post-editing (APE) for 2 language pairs : English-German (En-De) and English-Chinese (En-Zh). We propose APE systems based on a cross-lingual language model, which jointly adopts translation language modeling (TLM) and masked language modeling (MLM) training objectives in the pre-training stage ; the APE models then utilize jointly learned language representations between the source language and the target language. In addition, we created 19 million new sythetic triplets as additional training data for our final ensemble model. According to experimental results on the WMT2020 APE development data set, our models showed an improvement over the baseline by TER of -3.58 and a BLEU score of +5.3 for the En-De subtask ; and TER of -5.29 and a BLEU score of +7.32 for the En-Zh subtask.

pdf bib
Alibaba’s Submission for the WMT 2020 APE Shared Task : Improving Automatic Post-Editing with Pre-trained Conditional Cross-Lingual BERTAlibaba’s Submission for the WMT 2020 APE Shared Task: Improving Automatic Post-Editing with Pre-trained Conditional Cross-Lingual BERT
Jiayi Wang | Ke Wang | Kai Fan | Yuqi Zhang | Jun Lu | Xin Ge | Yangbin Shi | Yu Zhao

The goal of Automatic Post-Editing (APE) is basically to examine the automatic methods for correcting translation errors generated by an unknown machine translation (MT) system. This paper describes Alibaba’s submissions to the WMT 2020 APE Shared Task for the English-German language pair. We design a two-stage training pipeline. First, a BERT-like cross-lingual language model is pre-trained by randomly masking target sentences alone. Then, an additional neural decoder on the top of the pre-trained model is jointly fine-tuned for the APE task. We also apply an imitation learning strategy to augment a reasonable amount of pseudo APE training data, potentially preventing the model to overfit on the limited real training data and boosting the performance on held-out data. To verify our proposed model and data augmentation, we examine our approach with the well-known benchmarking English-German dataset from the WMT 2017 APE task. The experiment results demonstrate that our system significantly outperforms all other baselines and achieves the state-of-the-art performance. The final results on the WMT 2020 test dataset show that our submission can achieve +5.56 BLEU and -4.57 TER with respect to the official MT baseline.

pdf bib
LIMSI @ WMT 2020LIMSI @ WMT 2020
Sadaf Abdul Rauf | José Carlos Rosales Núñez | Minh Quang Pham | François Yvon

This paper describes LIMSI’s submissions to the translation shared tasks at WMT’20. This year we have focused our efforts on the biomedical translation task, developing a resource-heavy system for the translation of medical abstracts from English into French, using back-translated texts, terminological resources as well as multiple pre-processing pipelines, including pre-trained representations. Systems were also prepared for the robustness task for translating from English into German ; for this large-scale task we developed multi-domain, noise-robust, translation systems aim to handle the two test conditions : zero-shot and few-shot domain adaptation.

pdf bib
Elhuyar submission to the Biomedical Translation Task 2020 on terminology and abstracts translation
Ander Corral | Xabier Saralegi

This article describes the systems submitted by Elhuyar to the 2020 Biomedical Translation Shared Task, specifically the systems presented in the subtasks of terminology translation for English-Basque and abstract translation for English-Basque and English-Spanish. In all cases a Transformer architecture was chosen and we studied different strategies to combine open domain data with biomedical domain data for building the training corpora. For the English-Basque pair, given the scarcity of parallel corpora in the biomedical domain, we set out to create domain training data in a synthetic way. The systems presented in the terminology and abstract translation subtasks for the English-Basque language pair ranked first in their respective tasks among four participants, achieving 0.78 accuracy for terminology translation and a BLEU of 0.1279 for the translation of abstracts. In the abstract translation task for the English-Spanish pair our team ranked second (BLEU=0.4498) in the case of OK sentences.

pdf bib
YerevaNN’s Systems for WMT20 Biomedical Translation Task : The Effect of Fixing Misaligned Sentence PairsYerevaNN’s Systems for WMT20 Biomedical Translation Task: The Effect of Fixing Misaligned Sentence Pairs
Karen Hambardzumyan | Hovhannes Tamoyan | Hrant Khachatrian

This report describes YerevaNN’s neural machine translation systems and data processing pipelines developed for WMT20 biomedical translation task. We provide systems for English-Russian and English-German language pairs. For the English-Russian pair, our submissions achieve the best BLEU scores, with enru direction outperforming the other systems by a significant margin. We explain most of the improvements by our heavy data preprocessing pipeline which attempts to fix poorly aligned sentences in the parallel data.\\rightarrowru direction outperforming the other systems by a significant margin. We explain most of the improvements by our heavy data preprocessing pipeline which attempts to fix poorly aligned sentences in the parallel data.

pdf bib
Pretrained Language Models and Backtranslation for English-Basque Biomedical Neural Machine TranslationEnglish-Basque Biomedical Neural Machine Translation
Inigo Jauregi Unanue | Massimo Piccardi

This paper describes the machine translation systems proposed by the University of Technology Sydney Natural Language Processing (UTS_NLP) team for the WMT20 English-Basque biomedical translation tasks. Due to the limited parallel corpora available, we have proposed to train a BERT-fused NMT model that leverages the use of pretrained language models. Furthermore, we have augmented the training corpus by backtranslating monolingual data. Our experiments show that NMT models in low-resource scenarios can benefit from combining these two training techniques, with improvements of up to 6.16 BLEU percentual points in the case of biomedical abstract translations.

pdf bib
Lite Training Strategies for Portuguese-English and English-Portuguese TranslationPortuguese-English and English-Portuguese Translation
Alexandre Lopes | Rodrigo Nogueira | Roberto Lotufo | Helio Pedrini

Despite the widespread adoption of deep learning for machine translation, it is still expensive to develop high-quality translation models. In this work, we investigate the use of pre-trained models, such as T5 for Portuguese-English and English-Portuguese translation tasks using low-cost hardware. We explore the use of Portuguese and English pre-trained language models and propose an adaptation of the English tokenizer to represent Portuguese characters, such as diaeresis, acute and grave accents. We compare our models to the Google Translate API and MarianMT on a subset of the ParaCrawl dataset, as well as to the winning submission to the WMT19 Biomedical Translation Shared Task. We also describe our submission to the WMT20 Biomedical Translation Shared Task. Our results show that our models have a competitive performance to state-of-the-art models while being trained on modest hardware (a single 8 GB gaming GPU for nine days). Our data, models and code are available in our GitHub repository.

pdf bib
Addressing Exposure Bias With Document Minimum Risk Training : Cambridge at the WMT20 Biomedical Translation TaskCambridge at the WMT20 Biomedical Translation Task
Danielle Saunders | Bill Byrne

The 2020 WMT Biomedical translation task evaluated Medline abstract translations. This is a small-domain translation task, meaning limited relevant training data with very distinct style and vocabulary. Models trained on such data are susceptible to exposure bias effects, particularly when training sentence pairs are imperfect translations of each other. This can result in poor behaviour during inference if the model learns to neglect the source sentence. The UNICAM entry addresses this problem during fine-tuning using a robust variant on Minimum Risk Training. We contrast this approach with data-filtering to remove ‘problem’ training examples. Under MRT fine-tuning we obtain good results for both directions of English-German and English-Spanish biomedical translation. In particular we achieve the best English-to-Spanish translation result and second-best Spanish-to-English result, despite using only single models with no ensembling.

pdf bib
Unbabel’s Participation in the WMT20 Metrics Shared TaskWMT20 Metrics Shared Task
Ricardo Rei | Craig Stewart | Ana C Farinha | Alon Lavie

We present the contribution of the Unbabel team to the WMT 2020 Shared Task on Metrics. We intend to participate on the segmentlevel, document-level and system-level tracks on all language pairs, as well as the QE as a Metric track. Accordingly, we illustrate results of our models in these tracks with reference to test sets from the previous year. Our submissions build upon the recently proposed COMET framework : we train several estimator models to regress on different humangenerated quality scores and a novel ranking model trained on relative ranks obtained from Direct Assessments. We also propose a simple technique for converting segment-level predictions into a document-level score. Overall, our systems achieve strong results for all language pairs on previous test sets and in many cases set a new state-of-the-art.

pdf bib
Incorporate Semantic Structures into Machine Translation Evaluation via UCCAUCCA
Jin Xu | Yinuo Guo | Junfeng Hu

Copying mechanism has been commonly used in neural paraphrasing networks and other text generation tasks, in which some important words in the input sequence are preserved in the output sequence. Similarly, in machine translation, we notice that there are certain words or phrases appearing in all good translations of one source text, and these words tend to convey important semantic information. Therefore, in this work, we define words carrying important semantic meanings in sentences as semantic core words. Moreover, we propose an MT evaluation approach named Semantically Weighted Sentence Similarity (SWSS). It leverages the power of UCCA to identify semantic core words, and then calculates sentence similarity scores on the overlap of semantic core words. Experimental results show that SWSS can consistently improve the performance of popular MT evaluation metrics which are based on lexical similarity.

pdf bib
Filtering Noisy Parallel Corpus using Transformers with Proxy Task Learning
Haluk Açarçiçek | Talha Çolakoğlu | Pınar Ece Aktan Hatipoğlu | Chong Hsuan Huang | Wei Peng

This paper illustrates Huawei’s submission to the WMT20 low-resource parallel corpus filtering shared task. Our approach focuses on developing a proxy task learner on top of a transformer-based multilingual pre-trained language model to boost the filtering capability for noisy parallel corpora. Such a supervised task also helps us to iterate much more quickly than using an existing neural machine translation system to perform the same task. After performing empirical analyses of the finetuning task, we benchmark our approach by comparing the results with past years’ state-of-theart records. This paper wraps up with a discussion of limitations and future work. The scripts for this study will be made publicly available.

pdf bib
Score Combination for Improved Parallel Corpus Filtering for Low Resource Conditions
Muhammad ElNokrashy | Amr Hendy | Mohamed Abdelghaffar | Mohamed Afify | Ahmed Tawfik | Hany Hassan Awadalla

This paper presents the description of our submission to WMT20 sentence filtering task. We combine scores from custom LASER built for each source language, a classifier built to distinguish positive and negative pairs and the original scores provided with the task. For the mBART setup, provided by the organizers, our method shows 7 % and 5 % relative improvement, over the baseline, in sacreBLEU score on the test set for Pashto and Khmer respectively.

pdf bib
An exploratory approach to the Parallel Corpus Filtering shared task WMT20WMT20
Ankur Kejriwal | Philipp Koehn

In this document we describe our submission to the parallel corpus filtering task using multilingual word embedding, language models and an ensemble of pre and post filtering rules. We use the norms of embedding and the perplexities of language models along with pre / post filtering rules to complement the LASER baseline scores and in the end get an improvement on the dev set in both language pairs.

pdf bib
PATQUEST : Papago Translation Quality EstimationPATQUEST: Papago Translation Quality Estimation
Yujin Baek | Zae Myung Kim | Jihyung Moon | Hyunjoong Kim | Eunjeong Park

This paper describes the system submitted by Papago team for the quality estimation task at WMT 2020. It proposes two key strategies for quality estimation : (1) task-specific pretraining scheme, and (2) task-specific data augmentation. The former focuses on devising learning signals for pretraining that are closely related to the downstream task. We also present data augmentation techniques that simulate the varying levels of errors that the downstream dataset may contain. Thus, our PATQUEST models are exposed to erroneous translations in both stages of task-specific pretraining and finetuning, effectively enhancing their generalization capability. Our submitted models achieve significant improvement over the baselines for Task 1 (Sentence-Level Direct Assessment ; EN-DE only), and Task 3 (Document-Level Score).

pdf bib
Two-Phase Cross-Lingual Language Model Fine-Tuning for Machine Translation Quality Estimation
Dongjun Lee

In this paper, we describe the Bering Lab’s submission to the WMT 2020 Shared Task on Quality Estimation (QE). For word-level and sentence-level translation quality estimation, we fine-tune XLM-RoBERTa, the state-of-the-art cross-lingual language model, with a few additional parameters. Model training consists of two phases. We first pre-train our model on a huge artificially generated QE dataset, and then we fine-tune the model with a human-labeled dataset. When evaluated on the WMT 2020 English-German QE test set, our systems achieve the best result on the target-side of word-level QE and the second best results on the source-side of word-level QE and sentence-level QE among all submissions.

pdf bib
IST-Unbabel Participation in the WMT20 Quality Estimation Shared TaskIST-Unbabel Participation in the WMT20 Quality Estimation Shared Task
João Moura | Miguel Vera | Daan van Stigt | Fabio Kepler | André F. T. Martins

We present the joint contribution of IST and Unbabel to the WMT 2020 Shared Task on Quality Estimation. Our team participated on all tracks (Direct Assessment, Post-Editing Effort, Document-Level), encompassing a total of 14 submissions. Our submitted systems were developed by extending the OpenKiwi framework to a transformer-based predictor-estimator architecture, and to cope with glass-box, uncertainty-based features coming from neural machine translation systems.

pdf bib
TransQuest at WMT2020 : Sentence-Level Direct AssessmentTransQuest at WMT2020: Sentence-Level Direct Assessment
Tharindu Ranasinghe | Constantin Orasan | Ruslan Mitkov

This paper presents the team TransQuest’s participation in Sentence-Level Direct Assessment shared task in WMT 2020. We introduce a simple QE framework based on cross-lingual transformers, and we use it to implement and evaluate two different neural architectures. The proposed methods achieve state-of-the-art results surpassing the results obtained by OpenKiwi, the baseline used in the shared task. We further fine tune the QE framework by performing ensemble and data augmentation. Our approach is the winning solution in all of the language pairs according to the WMT 2020 official results.

pdf bib
Tencent submission for WMT20 Quality Estimation Shared TaskWMT20 Quality Estimation Shared Task
Haijiang Wu | Zixuan Wang | Qingsong Ma | Xinjie Wen | Ruichen Wang | Xiaoli Wang | Yulin Zhang | Zhipeng Yao | Siyao Peng

This paper presents Tencent’s submission to the WMT20 Quality Estimation (QE) Shared Task : Sentence-Level Post-editing Effort for English-Chinese in Task 2. Our system ensembles two architectures, XLM-based and Transformer-based Predictor-Estimator models. For the XLM-based Predictor-Estimator architecture, the predictor produces two types of contextualized token representations, i.e., masked XLM and non-masked XLM ; the LSTM-estimator and Transformer-estimator employ two effective strategies, top-K and multi-head attention, to enhance the sentence feature representation. For Transformer-based Predictor-Estimator architecture, we improve a top-performing model by conducting three modifications : using multi-decoding in machine translation module, creating a new model by replacing the transformer-based predictor with XLM-based predictor, and finally integrating two models by a weighted average. Our submission achieves a Pearson correlation of 0.664, ranking first (tied) on English-Chinese.

pdf bib
NLPRL System for Very Low Resource Supervised Machine TranslationNLPRL System for Very Low Resource Supervised Machine Translation
Rupjyoti Baruah | Rajesh Kumar Mundotiya | Amit Kumar | Anil kumar Singh

This paper describes the results of the system that we used for the WMT20 very low resource (VLR) supervised MT shared task. For our experiments, we use a byte-level version of BPE, which requires a base vocabulary of size 256 only. BPE based models are a kind of sub-word models. Such models try to address the Out of Vocabulary (OOV) word problem by performing word segmentation so that segments correspond to morphological units. They are also reported to work across different languages, especially similar languages due to their sub-word nature. Based on BLEU cased score, our NLPRL systems ranked ninth for HSB to GER and tenth in GER to HSB translation scenario.

pdf bib
UdS-DFKI@WMT20 : Unsupervised MT and Very Low Resource Supervised MT for German-Upper SorbianUdS-DFKI@WMT20: Unsupervised MT and Very Low Resource Supervised MT for German-Upper Sorbian
Sourav Dutta | Jesujoba Alabi | Saptarashmi Bandyopadhyay | Dana Ruiter | Josef van Genabith

This paper describes the UdS-DFKI submission to the shared task for unsupervised machine translation (MT) and very low-resource supervised MT between German (de) and Upper Sorbian (hsb) at the Fifth Conference of Machine Translation (WMT20). We submit systems for both the supervised and unsupervised tracks. Apart from various experimental approaches like bitext mining, model pre-training, and iterative back-translation, we employ a factored machine translation approach on a small BPE vocabulary.

pdf bib
CUNI Systems for the Unsupervised and Very Low Resource Translation Task in WMT20CUNI Systems for the Unsupervised and Very Low Resource Translation Task in WMT20
Ivana Kvapilíková | Tom Kocmi | Ondřej Bojar

This paper presents a description of CUNI systems submitted to the WMT20 task on unsupervised and very low-resource supervised machine translation between German and Upper Sorbian. We experimented with training on synthetic data and pre-training on a related language pair. In the fully unsupervised scenario, we achieved 25.5 and 23.7 BLEU translating from and into Upper Sorbian, respectively. Our low-resource systems relied on transfer learning from German-Czech parallel data and achieved 57.4 BLEU and 56.1 BLEU, which is an improvement of 10 BLEU points over the baseline trained only on the available small German-Upper Sorbian parallel corpus.

pdf bib
The University of Helsinki and Aalto University submissions to the WMT 2020 news and low-resource translation tasksUniversity of Helsinki and Aalto University submissions to the WMT 2020 news and low-resource translation tasks
Yves Scherrer | Stig-Arne Grönroos | Sami Virpioja

This paper describes the joint participation of University of Helsinki and Aalto University to two shared tasks of WMT 2020 : the news translation between Inuktitut and English and the low-resource translation between German and Upper Sorbian. For both tasks, our efforts concentrate on efficient use of monolingual and related bilingual corpora with scheduled multi-task learning as well as an optimized subword segmentation with sampling. Our submission obtained the highest score for Upper Sorbian-German and was ranked second for German-Upper Sorbian according to BLEU scores. For EnglishInuktitut, we reached ranks 8 and 10 out of 11 according to BLEU scores.

pdf bib
The NITS-CNLP System for the Unsupervised MT Task at WMT 2020NITS-CNLP System for the Unsupervised MT Task at WMT 2020
Salam Michael Singh | Thoudam Doren Singh | Sivaji Bandyopadhyay

We describe NITS-CNLP’s submission to WMT 2020 unsupervised machine translation shared task for German language (de) to Upper Sorbian (hsb) in a constrained setting i.e, using only the data provided by the organizers. We train our unsupervised model using monolingual data from both the languages by jointly pre-training the encoder and decoder and fine-tune using backtranslation loss. The final model uses the source side (de) monolingual data and the target side (hsb) synthetic data as a pseudo-parallel data to train a pseudo-supervised system which is tuned using the provided development set(dev set).

pdf bib
Adobe AMPS’s Submission for Very Low Resource Supervised Translation Task at WMT20AMPS’s Submission for Very Low Resource Supervised Translation Task at WMT20
Keshaw Singh

In this paper, we describe our systems submitted to the very low resource supervised translation task at WMT20. We participate in both translation directions for Upper Sorbian-German language pair. Our primary submission is a subword-level Transformer-based neural machine translation model trained on original training bitext. We also conduct several experiments with backtranslation using limited monolingual data in our post-submission work and include our results for the same. In one such experiment, we observe jumps of up to 2.6 BLEU points over the primary system by pretraining on a synthetic, backtranslated corpus followed by fine-tuning on the original parallel training data.

pdf bib
Human-Paraphrased References Improve Neural Machine Translation
Markus Freitag | George Foster | David Grangier | Colin Cherry

Automatic evaluation comparing candidate translations to human-generated paraphrases of reference translations has recently been proposed by freitag2020bleu. When used in place of original references, the paraphrased versions produce metric scores that correlate better with human judgment. This effect holds for a variety of different automatic metrics, and tends to favor natural formulations over more literal (translationese) ones. In this paper we compare the results of performing end-to-end system development using standard and paraphrased references. With state-of-the-art English-German NMT components, we show that tuning to paraphrased references produces a system that is ignificantly better according to human judgment, but 5 BLEU points worse when tested on standard references. Our work confirms the finding that paraphrased references yield metric scores that correlate better with human judgment, and demonstrates for the first time that using these scores for system development can lead to significant improvements.

pdf bib
Incorporating Terminology Constraints in Automatic Post-Editing
David Wan | Chris Kedzie | Faisal Ladhak | Marine Carpuat | Kathleen McKeown

Users of machine translation (MT) may want to ensure the use of specific lexical terminologies. While there exist techniques for incorporating terminology constraints during inference for MT, current APE approaches can not ensure that they will appear in the final translation. In this paper, we present both autoregressive and non-autoregressive models for lexically constrained APE, demonstrating that our approach enables preservation of 95 % of the terminologies and also improves translation quality on English-German benchmarks. Even when applied to lexically constrained MT output, our approach is able to improve preservation of the terminologies. However, we show that our models do not learn to copy constraints systematically and suggest a simple data augmentation technique that leads to improved performance and robustness.