Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, André Martins, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Matt Post, Marco Turchi, Karin Verspoor (Editors)

Florence, Italy
Association for Computational Linguistics

Results of the WMT19 Metrics Shared Task: Segment-Level and Strong MT Systems Pose Big Challenges
Qingsong Ma | Johnny Wei | Ondřej Bojar | Yvette Graham

This paper presents the results of the WMT19 Metrics Shared Task. Participants were asked to score the outputs of the translation systems competing in the WMT19 News Translation Task with automatic metrics. 13 research groups submitted 24 metrics, 10 of which are reference-less metrics and constitute submissions to the joint task with the WMT19 Quality Estimation Task, QE as a Metric. In addition, we computed 11 baseline metrics, with 8 commonly applied baselines (BLEU, SentBLEU, NIST, WER, PER, TER, CDER, and chrF) and 3 reimplementations (chrF+, sacreBLEU-BLEU, and sacreBLEU-chrF). Metrics were evaluated on the system level, how well a given metric correlates with the WMT19 official manual ranking, and segment level, how well the metric correlates with human judgements of segment quality. This year, we use direct assessment (DA) as our only form of manual evaluation.
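
As a rough illustration of system-level metric evaluation, the sketch below computes how strongly a metric's per-system scores correlate with human scores. It assumes Pearson's r as the correlation statistic, which this abstract does not specify, and the function name `pearson_r` and all scores are hypothetical.

```python
import math


def pearson_r(metric_scores, human_scores):
    """Pearson correlation between two equal-length lists of per-system scores."""
    if len(metric_scores) != len(human_scores) or len(metric_scores) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(metric_scores)
    mean_m = sum(metric_scores) / n
    mean_h = sum(human_scores) / n
    # Covariance and variances (unnormalised; the 1/n factors cancel out).
    cov = sum((m - mean_m) * (h - mean_h)
              for m, h in zip(metric_scores, human_scores))
    var_m = sum((m - mean_m) ** 2 for m in metric_scores)
    var_h = sum((h - mean_h) ** 2 for h in human_scores)
    if var_m == 0 or var_h == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_m * var_h)


# Hypothetical scores for four MT systems (illustrative values only).
metric = [27.1, 30.4, 25.8, 33.0]
human = [0.12, 0.25, 0.05, 0.31]
r = pearson_r(metric, human)
```

A value of `r` close to 1 would suggest that the metric ranks systems much as the human evaluation does.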

Findings of the First Shared Task on Machine Translation Robustness
Xian Li | Paul Michel | Antonios Anastasopoulos | Yonatan Belinkov | Nadir Durrani | Orhan Firat | Philipp Koehn | Graham Neubig | Juan Pino | Hassan Sajjad

We share the findings of the first shared task on improving robustness of Machine Translation (MT). The task provides a testbed representing challenges facing MT models deployed in the real world, and facilitates new approaches to improve models’ robustness to noisy input and domain mismatch. We focus on two language pairs (English-French and English-Japanese), and the submitted systems are evaluated on a blind test set consisting of noisy comments on Reddit and professionally sourced translations. As a new task, we received 23 submissions by 11 participating teams from universities, companies, national labs, etc. All submitted systems achieved large improvements over baselines, with the best improvement being +22.33 BLEU. We evaluated submissions by both human judgment and automatic evaluation (BLEU), which show high correlations (Pearson’s r = 0.94 and 0.95). Furthermore, we conducted a qualitative analysis of the submitted systems using compare-mt, which revealed their salient differences in handling challenges in this task. Such analysis provides additional insights when there is occasional disagreement between human judgment and BLEU, e.g. systems better at producing colloquial expressions received higher scores from human judgment.

The University of Edinburgh’s Submissions to the WMT19 News Translation Task
Rachel Bawden | Nikolay Bogoychev | Ulrich Germann | Roman Grundkiewicz | Faheem Kirefu | Antonio Valerio Miceli Barone | Alexandra Birch

The University of Edinburgh participated in the WMT19 Shared Task on News Translation in six language directions: English↔Gujarati, English↔Chinese, German→English, and English→Czech. For all translation directions, we created or used back-translations of monolingual data in the target language as additional synthetic training data. For English↔Gujarati, we also explored semi-supervised MT with cross-lingual language model pre-training, and translation pivoting through Hindi. For translation to and from Chinese, we investigated character-based tokenisation vs. sub-word segmentation of Chinese text. For German→English, we studied the impact of vast amounts of back-translated training data on translation quality, gaining a few additional insights over Edunov et al. For English→Czech, we compared different preprocessing and tokenisation regimes.

GTCOM Neural Machine Translation Systems for WMT19
Chao Bei | Hao Zong | Conghu Yuan | Qingming Liu | Baoyong Fan

This paper describes the Global Tone Communication Co., Ltd.’s submission to the WMT19 shared news translation task. We participate in six directions: English to (Gujarati, Lithuanian and Finnish) and (Gujarati, Lithuanian and Finnish) to English. Further, we get the best BLEU scores in the directions of English to Gujarati and Lithuanian to English (28.2 and 36.3 respectively) among all the participants. The submitted systems mainly focus on back-translation, knowledge distillation and reranking to build a competitive model for this task. Also, we apply a language model to filter monolingual data, back-translated data and parallel data. The techniques we apply for data filtering include filtering by rules and by language models. Besides, we conduct several experiments to validate different knowledge distillation techniques and right-to-left (R2L) reranking.

DBMS-KU Interpolation for WMT19 News Translation Task
Sari Dewi Budiwati | Al Hafiz Akbar Maulana Siagian | Tirana Noor Fatyanosa | Masayoshi Aritsugi

This paper presents the participation of the DBMS-KU Interpolation system in the WMT19 shared task, namely, for the Kazakh-English language pair. We examine the use of an interpolation method with different language model orders. Our Interpolation system combines direct translation with pivot translation using Russian as the pivot language. We use 3-gram and 5-gram language model orders to perform the translation in this work. To reduce noise in the pivot translation process, we prune the source-pivot and pivot-target phrase tables. Our experimental results show that our Interpolation system outperforms the Baseline in terms of BLEU-cased score by +0.5 and +0.1 points in Kazakh-English and English-Kazakh, respectively. In particular, using the 5-gram language model order in our system obtains a better BLEU-cased score than using the 3-gram one. Interestingly, we found that employing the Interpolation system could reduce the perplexity score of English-Kazakh when using the 3-gram language model order.

Lingua Custodia at WMT’19: Attempts to Control Terminology
Franck Burlot

This paper describes Lingua Custodia’s submission to the WMT’19 news shared task for German-to-French on the topic of the EU elections. We report experiments on the adaptation of the terminology of a machine translation system to a specific topic, aimed at providing more accurate translations of specific entities like political parties and person names, given that the shared task provided no in-domain training parallel data dealing with the restricted topic. Our primary submission to the shared task uses backtranslation generated with a type of decoding allowing the insertion of constraints in the output in order to guarantee the correct translation of specific terms that are not necessarily observed in the data.

The TALP-UPC Machine Translation Systems for WMT19 News Translation Task: Pivoting Techniques for Low Resource MT
Noe Casas | José A. R. Fonollosa | Carlos Escolano | Christine Basta | Marta R. Costa-jussà

In this article, we describe the TALP-UPC research group participation in the WMT19 news translation shared task for Kazakh-English. Given the low amount of parallel training data, we resort to using Russian as pivot language, training subword-based statistical translation systems for Russian-Kazakh and Russian-English that were then used to create two synthetic pseudo-parallel corpora for Kazakh-English and English-Kazakh respectively. Finally, a self-attention model based on the decoder part of the Transformer architecture was trained on the two pseudo-parallel corpora.

UdS-DFKI Participation at WMT 2019: Low-Resource (en-gu) and Coreference-Aware (en-de) Systems
Cristina España-Bonet | Dana Ruiter

This paper describes the UdS-DFKI submission to the WMT2019 news translation task for Gujarati-English (low-resourced pair) and German-English (document-level evaluation). Our systems rely on the on-line extraction of parallel sentences from comparable corpora for the first scenario and on the inclusion of coreference-related information in the training data in the second one.

The IIIT-H Gujarati-English Machine Translation System for WMT19
Vikrant Goyal | Dipti Misra Sharma

This paper describes the Neural Machine Translation system of IIIT-Hyderabad for the Gujarati-English news translation shared task of WMT19. Our system is based on an encoder-decoder framework with an attention mechanism. We experimented with Multilingual Neural MT models. Our experiments show that Multilingual Neural Machine Translation leveraging parallel data from related language pairs yields significant BLEU improvements of up to 11.5 for low-resource language pairs like Gujarati-English.

Kingsoft’s Neural Machine Translation System for WMT19
Xinze Guo | Chang Liu | Xiaolong Li | Yiran Wang | Guoliang Li | Feng Wang | Zhitao Xu | Liuyi Yang | Li Ma | Changliang Li

This paper describes the Kingsoft AI Lab’s submission to the WMT2019 news translation shared task. We participated in two language directions: English-Chinese and Chinese-English. For both language directions, we trained several variants of Transformer models using the provided parallel data enlarged with a large quantity of back-translated monolingual data. The best translation result was obtained with ensemble and reranking techniques. According to automatic metrics (BLEU) our Chinese-English system reached the second highest score, and our English-Chinese system reached the second highest score for this subtask.

The MLLP-UPV Supervised Machine Translation Systems for WMT19 News Translation Task
Javier Iranzo-Sánchez | Gonçal Garcés Díaz-Munío | Jorge Civera | Alfons Juan

This paper describes the participation of the MLLP research group of the Universitat Politècnica de València in the WMT 2019 News Translation Shared Task. In this edition, we have submitted systems for the German-English and German-French language pairs, participating in both directions of each pair. Our submitted systems, based on the Transformer architecture, make ample use of data filtering, synthetic data and domain adaptation through fine-tuning.

Microsoft Translator at WMT 2019: Towards Large-Scale Document-Level Neural Machine Translation
Marcin Junczys-Dowmunt

This paper describes the Microsoft Translator submissions to the WMT19 news translation shared task for English-German. Our main focus is document-level neural machine translation with deep transformer models. We start with strong sentence-level baselines, trained on large-scale data created via data-filtering and noisy back-translation and find that back-translation seems to mainly help with translationese input. We explore fine-tuning techniques, deeper models and different ensembling strategies to counter these effects. Using document boundaries present in the authentic and synthetic parallel data, we create sequences of up to 1000 subword segments and train transformer translation models. We experiment with data augmentation techniques for the smaller authentic data with document-boundaries and for larger authentic data without boundaries. We further explore multi-task training for the incorporation of document-level source language monolingual data via the BERT-objective on the encoder and two-pass decoding for combinations of sentence-level and document-level systems. Based on preliminary human evaluation results, evaluators strongly prefer the document-level systems over our comparable sentence-level system. The document-level systems also seem to score higher than the human references in source-based direct assessment.

CUNI Submission for Low-Resource Languages in WMT News 2019
Tom Kocmi | Ondřej Bojar

This paper describes the CUNI submission to the WMT 2019 News Translation Shared Task for the low-resource languages: Gujarati-English and Kazakh-English. We participated in both language pairs in both translation directions. Our system combines transfer learning from a different high-resource language pair followed by training on backtranslated monolingual data. Thanks to the simultaneous training in both directions, we can iterate the backtranslation process. We are using the Transformer model in a constrained submission.

A Comparison on Fine-grained Pre-trained Embeddings for the WMT19 Chinese-English News Translation Task
Zhenhao Li | Lucia Specia

This paper describes our submission to the WMT 2019 Chinese-English (zh-en) news translation shared task. Our systems are based on RNN architectures with pre-trained embeddings which utilize character and sub-character information. We compare models with these different granularity levels using different evaluation metrics. We find that finer-granularity embeddings can help the model according to character-level evaluation and that the pre-trained embeddings can also be marginally beneficial for model performance when the training data is limited.

Incorporating Word and Subword Units in Unsupervised Machine Translation Using Language Model Rescoring
Zihan Liu | Yan Xu | Genta Indra Winata | Pascale Fung

This paper describes CAiRE’s submission to the unsupervised machine translation track of the WMT’19 news shared task from German to Czech. We leverage a phrase-based statistical machine translation (PBSMT) model and a pre-trained language model to combine word-level neural machine translation (NMT) and subword-level NMT models without using any parallel data. We propose to solve the morphological richness problem of languages by training byte-pair encoding (BPE) embeddings for German and Czech separately, and they are aligned using MUSE (Conneau et al., 2018). To ensure the fluency and consistency of translations, a rescoring mechanism is proposed that reuses the pre-trained language model to select the translation candidates generated through beam search. Moreover, a series of pre-processing and post-processing approaches are applied to improve the quality of final translations.

NICT’s Unsupervised Neural and Statistical Machine Translation Systems for the WMT19 News Translation Task
Benjamin Marie | Haipeng Sun | Rui Wang | Kehai Chen | Atsushi Fujita | Masao Utiyama | Eiichiro Sumita

This paper presents the NICT’s participation in the WMT19 unsupervised news translation task. We participated in the unsupervised translation direction: German-Czech. Our primary submission to the task is the result of a simple combination of our unsupervised neural and statistical machine translation systems. Our system is ranked first for the German-to-Czech translation task, using only the data provided by the organizers (‘constrained’), according to both BLEU-cased and human evaluation. We also performed contrastive experiments with other language pairs, namely, English-Gujarati and English-Kazakh, to better assess the effectiveness of unsupervised machine translation for distant language pairs and in truly low-resource conditions.

Facebook FAIR’s WMT19 News Translation Task Submission
Nathan Ng | Kyra Yee | Alexei Baevski | Myle Ott | Michael Auli | Sergey Edunov

This paper describes Facebook FAIR’s submission to the WMT19 shared news translation task. We participate in four language directions, English-German and English-Russian in both directions. Following our submission from last year, our baseline systems are large BPE-based transformer models trained with the FAIRSEQ sequence modeling toolkit. This year we experiment with different bitext data filtering schemes, as well as with adding filtered back-translated data. We also ensemble and fine-tune our models on domain-specific data, then decode using noisy channel model reranking. Our system improves on our previous system’s performance by 4.5 BLEU points and achieves the best case-sensitive BLEU score for the translation direction English-Russian.

Tilde’s Machine Translation Systems for WMT 2019
Marcis Pinnis | Rihards Krišlauks | Matīss Rikters

The paper describes the development process of Tilde’s NMT systems for the WMT 2019 shared task on news translation. We trained systems for the English-Lithuanian and Lithuanian-English translation directions in constrained and unconstrained tracks. We build upon the best methods of the previous year’s competition and combine them with recent advancements in the field. We also present a new method to ensure source domain adherence in back-translated data. Our systems achieved a shared first place in human evaluation.

Apertium-fin-eng - Rule-based Shallow Machine Translation for WMT 2019 Shared Task
Tommi Pirinen

In this paper we describe a rule-based, bi-directional machine translation system for the Finnish-English language pair. The baseline system was based on the existing data of FinnWordNet, omorfi and apertium-eng. We have built the disambiguation, lexical selection and translation rules by hand. The dictionaries and rules have been developed based on the shared task data. We describe in this article the use of the shared task data as a kind of test-driven development workflow in RBMT development and show that it is perfectly suited to a modern software engineering continuous integration workflow of RBMT and yields big increases in BLEU scores with minimal effort.

English-Czech Systems in WMT19: Document-Level Transformer
Martin Popel | Dominik Macháček | Michal Auersperger | Ondřej Bojar | Pavel Pecina

We describe our NMT systems submitted to the WMT19 shared task in English-Czech news translation. Our systems are based on the Transformer model implemented in either the Tensor2Tensor (T2T) or Marian framework. We aimed at improving the adequacy and coherence of translated documents by enlarging the context of the source and target. Instead of translating each sentence independently, we split the document into possibly overlapping multi-sentence segments. In case of the T2T implementation, this document-level-trained system achieves a +0.6 BLEU improvement (p < 0.05) relative to the same system applied on isolated sentences. To assess the potential effect document-level models might have on lexical coherence, we performed a semi-automatic analysis, which revealed only a few sentences improved in this aspect. Thus, we cannot draw any conclusions from this weak evidence.

Felix Stahlberg | Danielle Saunders | Adrià de Gispert | Bill Byrne

Two techniques provide the fabric of the Cambridge University Engineering Department’s (CUED) entry to the WMT19 evaluation campaign: elastic weight consolidation (EWC) and different forms of language modelling (LMs). We report substantial gains by fine-tuning very strong baselines on former WMT test sets using a combination of checkpoint averaging and EWC. A sentence-level Transformer LM and a document-level LM based on a modified Transformer architecture yield further gains. As in previous years, we also extract n-gram probabilities from SMT lattices which can be seen as a source-conditioned n-gram LM.

University of Tartu’s Multilingual Multi-domain WMT19 News Translation Shared Task Submission
Andre Tättar | Elizaveta Korotkova | Mark Fishel

This paper describes the University of Tartu’s submission to the news translation shared task of WMT19, where the core idea was to train a single multilingual system to cover several language pairs of the shared task and submit its results. We only used the constrained data from the shared task. We describe our approach and its results and discuss the technical issues we faced.

The LMU Munich Unsupervised Machine Translation System for WMT19
Dario Stojanovski | Viktor Hangya | Matthias Huck | Alexander Fraser

We describe LMU Munich’s machine translation system for German-Czech translation which was used to participate in the WMT19 shared task on unsupervised news translation. We train our model using monolingual data only from both languages. The final model is an unsupervised neural model using established techniques for unsupervised translation such as denoising autoencoding and online back-translation. We bootstrap the model with masked language model pretraining and enhance it with back-translations from an unsupervised phrase-based system which is itself bootstrapped using unsupervised bilingual word embeddings.

Combining Local and Document-Level Context: The LMU Munich Neural Machine Translation System at WMT19
Dario Stojanovski | Alexander Fraser

We describe LMU Munich’s machine translation system for English-German translation which was used to participate in the WMT19 shared task on supervised news translation. We specifically participated in the document-level MT track. The system used as a primary submission is a context-aware Transformer capable of both rich modeling of limited contextual information and integration of large-scale document-level context with a less rich representation. We train this model by fine-tuning a big Transformer baseline. Our experimental results show that document-level context provides for large improvements in translation quality, and adding a rich representation of the previous sentence provides a small additional gain.

The University of Helsinki Submissions to the WMT19 News Translation Task
Aarne Talman | Umut Sulubacak | Raúl Vázquez | Yves Scherrer | Sami Virpioja | Alessandro Raganato | Arvi Hurskainen | Jörg Tiedemann

In this paper we present the University of Helsinki submissions to the WMT 2019 shared news translation task in three language pairs: English-German, English-Finnish and Finnish-English. This year we focused first on cleaning and filtering the training data using multiple data-filtering approaches, resulting in much smaller and cleaner training sets. For English-German we trained sentence-level transformer models and compared different document-level translation approaches. For Finnish-English and English-Finnish we focused on different segmentation approaches and we also included a rule-based system for English-Finnish.

A Test Suite and Manual Evaluation of Document-Level NMT at WMT19
Kateřina Rysová | Magdaléna Rysová | Tomáš Musil | Lucie Poláková | Ondřej Bojar

As the quality of machine translation rises and neural machine translation (NMT) is moving from sentence to document level translations, it is becoming increasingly difficult to evaluate the output of translation systems. We provide a test suite for WMT19 aimed at assessing discourse phenomena of MT systems participating in the News Translation Task. We have manually checked the outputs and identified types of translation errors that are relevant to document-level translation.

SAO WMT19 Test Suite: Machine Translation of Audit Reports
Tereza Vojtěchová | Michal Novák | Miloš Klouček | Ondřej Bojar

This paper describes a machine translation test set of documents from the auditing domain and its use as one of the test suites in the WMT19 News Translation Task for translation directions involving Czech, English and German. Our evaluation suggests that current MT systems optimized for the general news domain can perform quite well even in the particular domain of audit reports. The detailed manual evaluation however indicates that deep factual knowledge of the domain is necessary. For the naked eye of a non-expert, translations by many systems seem almost perfect and automatic MT evaluation with one reference is practically useless for considering these details. Furthermore, we show on a sample document from the domain of agreements that even the best systems completely fail in preserving the semantics of the agreement, namely the identity of the parties.

WMDO: Fluency-based Word Mover’s Distance for Machine Translation Evaluation
Julian Chow | Lucia Specia | Pranava Madhyastha

We propose WMDO, a metric based on distance between distributions in the semantic vector space. Matching in the semantic space has been investigated for translation evaluation, but the constraints of a translation’s word order have not been fully explored. Building on the Word Mover’s Distance metric and various word embeddings, we introduce a fragmentation penalty to account for fluency of a translation. This word order extension is shown to perform better than standard WMD, with promising results against other types of metrics.

Meteor++ 2.0: Adopt Syntactic Level Paraphrase Knowledge into Machine Translation Evaluation
Yinuo Guo | Junfeng Hu

This paper describes Meteor++ 2.0, our submission to the WMT19 Metric Shared Task. The well known Meteor metric improves machine translation evaluation by introducing paraphrase knowledge. However, it only focuses on the lexical level and utilizes consecutive n-grams paraphrases. In this work, we take into consideration syntactic level paraphrase knowledge, which sometimes may be skip-grams. We describe how such knowledge can be extracted from Paraphrase Database (PPDB) and integrated into Meteor-based metrics. Experiments on WMT15 and WMT17 evaluation datasets show that the newly proposed metric outperforms all previous versions of Meteor.

YiSi - a Unified Semantic MT Quality Evaluation and Estimation Metric for Languages with Different Levels of Available Resources
Chi-kiu Lo

We present YiSi, a unified automatic semantic machine translation quality evaluation and estimation metric for languages with different levels of available resources. Underneath the interface with different language resources settings, YiSi uses the same representation for the two sentences in assessment. Besides, we show that a significant improvement in the correlation of YiSi-1’s scores with human judgment is made by using contextual embeddings in multilingual BERT (Bidirectional Encoder Representations from Transformers) to evaluate lexical semantic similarity. YiSi is open source and publicly available.

EED: Extended Edit Distance Measure for Machine Translation
Peter Stanchev | Weiyue Wang | Hermann Ney

Over the years a number of machine translation metrics have been developed in order to evaluate the accuracy and quality of machine-generated translations. Metrics such as BLEU and TER have been used for decades. However, with the rapid progress of machine translation systems, the need for better metrics is growing. This paper proposes an extension of the edit distance, which achieves better human correlation, whilst remaining fast, flexible and easy to understand.
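
For orientation, the sketch below shows the classic character-level Levenshtein edit distance, the kind of measure that such an extension starts from. It is not the EED metric itself, whose additional terms are not described in this abstract; the function name `edit_distance` is hypothetical.

```python
def edit_distance(hypothesis, reference):
    """Classic Levenshtein distance: minimum number of character
    insertions, deletions and substitutions turning hypothesis into reference."""
    # prev[j] holds the distance between the hypothesis prefix processed
    # so far and the first j characters of the reference.
    prev = list(range(len(reference) + 1))
    for i, h_char in enumerate(hypothesis, start=1):
        curr = [i]  # distance from i hypothesis characters to an empty reference
        for j, r_char in enumerate(reference, start=1):
            substitution_cost = 0 if h_char == r_char else 1
            curr.append(min(
                prev[j] + 1,                      # delete h_char
                curr[j - 1] + 1,                  # insert r_char
                prev[j - 1] + substitution_cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


distance = edit_distance("kitten", "sitting")
```

Keeping only two rows of the dynamic-programming table holds memory linear in the reference length, which is one reason edit-distance measures stay fast.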

Filtering Pseudo-References by Paraphrasing for Automatic Evaluation of Machine Translation
Ryoma Yoshimura | Hiroki Shimanaka | Yukio Matsumura | Hayahide Yamagishi | Mamoru Komachi

In this paper, we introduce our participation in the WMT 2019 Metric Shared Task. We propose an improved version of sentence BLEU using filtered pseudo-references. We propose a method to filter pseudo-references by paraphrasing for automatic evaluation of machine translation (MT). We use the outputs of off-the-shelf MT systems as pseudo-references filtered by paraphrasing in addition to a single human reference (gold reference). We use BERT fine-tuned with a paraphrase corpus to filter pseudo-references by checking the paraphrasability with the gold reference. Our experimental results on the WMT 2016 and 2017 datasets show that our method achieved higher correlation with human evaluation than the sentence BLEU (SentBLEU) baselines with a single reference and with unfiltered pseudo-references.

Naver Labs Europe’s Systems for the WMT19 Machine Translation Robustness Task
Alexandre Berard | Ioan Calapodescu | Claude Roux

This paper describes the systems that we submitted to the WMT19 Machine Translation robustness task. This task aims to improve MT’s robustness to noise found on social media, like informal language, spelling mistakes and other orthographic variations. The organizers provide parallel data extracted from a social media website in two language pairs: French-English and Japanese-English (one for each language direction). The goal is to obtain the best scores on unseen test sets from the same source, according to automatic metrics (BLEU) and human evaluation. We propose one single and one ensemble system for each translation direction. Our ensemble models ranked first in all language pairs, according to BLEU evaluation. We discuss the pre-processing choices that we made, and present our solutions for robustness to noise and domain adaptation.

System Description: The Submission of FOKUS to the WMT 19 Robustness Task
Cristian Grozea

This paper describes the systems of Fraunhofer FOKUS for the WMT 2019 machine translation robustness task. We have made submissions to the EN-FR, FR-EN, and JA-EN language pairs. The first two were made with a baseline translator, trained on clean data for the WMT 2019 biomedical translation task. These baselines improved over the baselines from the MTNT paper by 2 to 4 BLEU points, but were not trained on the same data. The last one used the same model class and training procedure, with induced typos in the training data to increase the model robustness.

CUNI System for the WMT19 Robustness Task
Jindřich Helcl | Jindřich Libovický | Martin Popel

We present our submission to the WMT19 Robustness Task. Our baseline system is the Charles University (CUNI) Transformer system trained for the WMT18 shared task on News Translation. Quantitative results show that the CUNI Transformer system is already far more robust to noisy input than the LSTM-based baseline provided by the task organizers. We further improved the performance of our model by fine-tuning on the in-domain noisy data without influencing the translation quality on the news domain.

NTT’s Machine Translation Systems for WMT19 Robustness Task
Soichiro Murakami | Makoto Morishita | Tsutomu Hirao | Masaaki Nagata

This paper describes NTT’s submission to the WMT19 robustness task. This task mainly focuses on translating noisy text (e.g., posts on Twitter), which presents different difficulties from typical translation tasks such as news. Our submission combined techniques including utilization of a synthetic corpus, domain adaptation, and a placeholder mechanism, which significantly improved over the previous baseline. Experimental results revealed the placeholder mechanism, which temporarily replaces the non-standard tokens including emojis and emoticons with special placeholder tokens during translation, improves translation accuracy even with noisy texts.
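
The placeholder idea described above can be sketched roughly as follows: non-standard tokens are swapped for placeholder tokens before translation and restored afterwards. This is a minimal illustration under stated assumptions, not the authors' implementation. The pattern covers only a handful of ASCII emoticons, and the `<PH0>`-style token format and the names `mask` and `unmask` are hypothetical.

```python
import re

# Assumption: a tiny illustrative pattern covering a few ASCII emoticons only.
EMOTICON = re.compile(r"(?::-\)|:\)|:\(|;\)|:D)")


def mask(text):
    """Replace each emoticon with a numbered placeholder; return text and originals."""
    saved = []

    def substitute(match):
        saved.append(match.group(0))
        return f"<PH{len(saved) - 1}>"

    return EMOTICON.sub(substitute, text), saved


def unmask(text, saved):
    """Put the original tokens back in place of their placeholders."""
    for index, token in enumerate(saved):
        text = text.replace(f"<PH{index}>", token)
    return text


source = "great job :) see you ;)"
masked, saved = mask(source)
# A real pipeline would translate `masked` here; this sketch skips that step.
restored = unmask(masked, saved)
```

In a real system the translation model would need to be trained to copy the placeholder tokens through unchanged, which this sketch does not cover.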