Proceedings of the Third Conference on Machine Translation: Shared Task Papers

Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Matt Post, Lucia Specia, Marco Turchi, Karin Verspoor (Editors)


Anthology ID:
W18-64
Month:
October
Year:
2018
Address:
Belgium, Brussels
Venues:
EMNLP | WMT | WS
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
URL:
https://aclanthology.org/W18-64
DOI:
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
https://aclanthology.org/W18-64.pdf

bib
Proceedings of the Third Conference on Machine Translation: Shared Task Papers
Ondřej Bojar | Rajen Chatterjee | Christian Federmann | Mark Fishel | Yvette Graham | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Philipp Koehn | Christof Monz | Matteo Negri | Aurélie Névéol | Mariana Neves | Matt Post | Lucia Specia | Marco Turchi | Karin Verspoor

pdf bib
Findings of the WMT 2018 Biomedical Translation Shared Task : Evaluation on Medline test setsWMT 2018 Biomedical Translation Shared Task: Evaluation on Medline test sets
Mariana Neves | Antonio Jimeno Yepes | Aurélie Névéol | Cristian Grozea | Amy Siu | Madeleine Kittner | Karin Verspoor

Machine translation enables the automatic translation of textual documents between languages and can facilitate access to information only available in a given language for non-speakers of this language, e.g. research results presented in scientific publications. In this paper, we provide an overview of the Biomedical Translation shared task in the Workshop on Machine Translation (WMT) 2018, which specifically examined the performance of machine translation systems for biomedical texts. This year, we provided test sets of scientific publications from two sources (EDP and Medline) and for six language pairs (English with each of Chinese, French, German, Portuguese, Romanian and Spanish). We describe the development of the various test sets, the submissions that we received and the evaluations that we carried out. We obtained a total of 39 runs from six teams and some of this year’s BLEU scores were somewhat higher that last year’s, especially for teams that made use of biomedical resources or state-of-the-art MT algorithms (e.g. Transformer). Finally, our manual evaluation scored automatic translations higher than the reference translations for German and Spanish.

pdf bib
An Empirical Study of Machine Translation for the Shared Task of WMT18WMT18
Chao Bei | Hao Zong | Yiming Wang | Baoyong Fan | Shiqi Li | Conghu Yuan

This paper describes the Global Tone Communication Co., Ltd.’s submission of the WMT18 shared news translation task. We participated in the English-to-Chinese direction and get the best BLEU (43.8) scores among all the participants. The submitted system focus on data clearing and techniques to build a competitive model for this task. Unlike other participants, the submitted system are mainly relied on the data filtering to obtain the best BLEU score. We do data filtering not only for provided sentences but also for the back translated sentences. The techniques we apply for data filtering include filtering by rules, language models and translation models. We also conduct several experiments to validate the effectiveness of training techniques. According to our experiments, the Annealing Adam optimizing function and ensemble decoding are the most effective techniques for the model training.

pdf bib
The TALP-UPC Machine Translation Systems for WMT18 News Shared Translation TaskTALP-UPC Machine Translation Systems for WMT18 News Shared Translation Task
Noe Casas | Carlos Escolano | Marta R. Costa-jussà | José A. R. Fonollosa

In this article we describe the TALP-UPC research group participation in the WMT18 news shared translation task for Finnish-English and Estonian-English within the multi-lingual subtrack. All of our primary submissions implement an attention-based Neural Machine Translation architecture. Given that Finnish and Estonian belong to the same language family and are similar, we use as training data the combination of the datasets of both language pairs to paliate the data scarceness of each individual pair. We also report the translation quality of systems trained on individual language pair data to serve as baseline and comparison reference.

pdf bib
The AFRL WMT18 Systems : Ensembling, Continuation and CombinationAFRL WMT18 Systems: Ensembling, Continuation and Combination
Jeremy Gwinnup | Tim Anderson | Grant Erdmann | Katherine Young

This paper describes the Air Force Research Laboratory (AFRL) machine translation systems and the improvements that were developed during the WMT18 evaluation campaign. This year, we examined the developments and additions to popular neural machine translation toolkits and measure improvements in performance on the RussianEnglish language pair.

pdf bib
The University of Edinburgh’s Submissions to the WMT18 News Translation TaskUniversity of Edinburgh’s Submissions to the WMT18 News Translation Task
Barry Haddow | Nikolay Bogoychev | Denis Emelin | Ulrich Germann | Roman Grundkiewicz | Kenneth Heafield | Antonio Valerio Miceli Barone | Rico Sennrich

The University of Edinburgh made submissions to all 14 language pairs in the news translation task, with strong performances in most pairs. We introduce new RNN-variant, mixed RNN / Transformer ensembles, data selection and weighting, and extensions to back-translation.

pdf bib
CUNI Submissions in WMT18CUNI Submissions in WMT18
Tom Kocmi | Roman Sudarikov | Ondřej Bojar

We participated in the WMT 2018 shared news translation task in three language pairs : English-Estonian, English-Finnish, and English-Czech. Our main focus was the low-resource language pair of Estonian and English for which we utilized Finnish parallel data in a simple method. We first train a parent model for the high-resource language pair followed by adaptation on the related low-resource language pair. This approach brings a substantial performance boost over the baseline system trained only on Estonian-English parallel data. Our systems are based on the Transformer architecture. For the English to Czech translation, we have evaluated our last year models of hybrid phrase-based approach and neural machine translation mainly for comparison purposes.

pdf bib
JUCBNMT at WMT2018 News Translation Task : Character Based Neural Machine Translation of Finnish to EnglishJUCBNMT at WMT2018 News Translation Task: Character Based Neural Machine Translation of Finnish to English
Sainik Kumar Mahata | Dipankar Das | Sivaji Bandyopadhyay

In the current work, we present a description of the system submitted to WMT 2018 News Translation Shared task. The system was created to translate news text from Finnish to English. The system used a Character Based Neural Machine Translation model to accomplish the given task. The current paper documents the preprocessing steps, the description of the submitted system and the results produced using the same. Our system garnered a BLEU score of 12.9.

pdf bib
PROMT Systems for WMT 2018 Shared Translation TaskPROMT Systems for WMT 2018 Shared Translation Task
Alexander Molchanov

This paper describes the PROMT submissions for the WMT 2018 Shared News Translation Task. This year we participated only in the English-Russian language pair. We built two primary neural networks-based systems : 1) a pure Marian-based neural system and 2) a hybrid system which incorporates OpenNMT-based neural post-editing component into our RBMT engine. We also submitted pure rule-based translation (RBMT) for contrast. We show competitive results with both primary submissions which significantly outperform the RBMT baseline.

pdf bib
The Karlsruhe Institute of Technology Systems for the News Translation Task in WMT 2018WMT 2018
Ngoc-Quan Pham | Jan Niehues | Alexander Waibel

We present our experiments in the scope of the news translation task in WMT 2018, in directions : EnglishGerman. The core of our systems is the encoder-decoder based neural machine translation models using the transformer architecture. We enhanced the model with a deeper architecture. By using techniques to limit the memory consumption, we were able to train models that are 4 times larger on one GPU and improve the performance by 1.2 BLEU points. Furthermore, we performed sentence selection for the newly available ParaCrawl corpus. Thereby, we could improve the effectiveness of the corpus by 0.5 BLEU points.

pdf bib
CUNI Transformer Neural MT System for WMT18CUNI Transformer Neural MT System for WMT18
Martin Popel

We describe our NMT system submitted to the WMT2018 shared task in news translation. Our system is based on the Transformer model (Vaswani et al., 2017). We use an improved technique of backtranslation, where we iterate the process of translating monolingual data in one direction and training an NMT model for the opposite direction using synthetic parallel data. We apply a simple but effective filtering of the synthetic data. We pre-process the input sentences using coreference resolution in order to disambiguate the gender of pro-dropped personal pronouns. Finally, we apply two simple post-processing substitutions on the translated output. Our system is significantly (p 0.05) better than all other English-Czech and Czech-English systems in WMT2018.

pdf bib
The RWTH Aachen University Supervised Machine Translation Systems for WMT 2018RWTH Aachen University Supervised Machine Translation Systems for WMT 2018
Julian Schamper | Jan Rosendahl | Parnia Bahar | Yunsu Kim | Arne Nix | Hermann Ney

This paper describes the statistical machine translation systems developed at RWTH Aachen University for the GermanEnglish, EnglishTurkish and ChineseEnglish translation tasks of the EMNLP 2018 Third Conference on Machine Translation (WMT 2018). We use ensembles of neural machine translation systems based on the Transformer architecture. Our main focus is on the GermanEnglish task where we to all automatic scored first with respect metrics provided by the organizers. We identify data selection, fine-tuning, batch size and model dimension as important hyperparameters. In total we improve by 6.8 % BLEU over our last year’s submission and by 4.8 % BLEU over the winning system of the 2017 GermanEnglish task. In EnglishTurkish task, we show 3.6 % BLEU improvement over the last year’s winning system. We further report results on the ChineseEnglish task where we improve 2.2 % BLEU on average over our baseline systems but stay behind the 2018 winning systems.

pdf bib
The University of Cambridge’s Machine Translation Systems for WMT18University of Cambridge’s Machine Translation Systems for WMT18
Felix Stahlberg | Adrià de Gispert | Bill Byrne

The University of Cambridge submission to the WMT18 news translation task focuses on the combination of diverse models of translation. We compare recurrent, convolutional, and self-attention-based neural models on German-English, English-German, and Chinese-English. Our final system combines all neural models together with a phrase-based SMT system in an MBR-based scheme. We report small but consistent gains on top of strong Transformer ensembles.

pdf bib
The LMU Munich Unsupervised Machine Translation SystemsLMU Munich Unsupervised Machine Translation Systems
Dario Stojanovski | Viktor Hangya | Matthias Huck | Alexander Fraser

We describe LMU Munich’s unsupervised machine translation systems for EnglishGerman translation. These systems were used to participate in the WMT18 news translation shared task and more specifically, for the unsupervised learning sub-track. The systems are trained on English and German monolingual data only and exploit and combine previously proposed techniques such as using word-by-word translated data based on bilingual word embeddings, denoising and on-the-fly backtranslation.

pdf bib
Tencent Neural Machine Translation Systems for WMT18WMT18
Mingxuan Wang | Li Gong | Wenhuan Zhu | Jun Xie | Chao Bian

We participated in the WMT 2018 shared news translation task on EnglishChinese language pair. Our systems are based on attentional sequence-to-sequence models with some form of recursion and self-attention. Some data augmentation methods are also introduced to improve the translation performance. The best translation result is obtained with ensemble and reranking techniques. Our ChineseEnglish system achieved the highest cased BLEU score among all 16 submitted systems, and our EnglishChinese system ranked the third out of 18 submitted systems.

pdf bib
The University of Maryland’s Chinese-English Neural Machine Translation Systems at WMT18University of Maryland’s Chinese-English Neural Machine Translation Systems at WMT18
Weijia Xu | Marine Carpuat

This paper describes the University of Maryland’s submission to the WMT 2018 ChineseEnglish news translation tasks. Our systems are BPE-based self-attentional Transformer networks with parallel and backtranslated monolingual training data. Using ensembling and reranking, we improve over the Transformer baseline by +1.4 BLEU for ChineseEnglish and +3.97 BLEU for EnglishChinese on newstest2017. Our best systems reach BLEU scores of 24.4 for ChineseEnglish and 39.0 for EnglishChinese on newstest2018.newstest2017. Our best systems reach BLEU scores of 24.4 for Chinese→English and 39.0 for English→Chinese on newstest2018.

pdf bib
EvalD Reference-Less Discourse Evaluation for WMT18EvalD Reference-Less Discourse Evaluation for WMT18
Ondřej Bojar | Jiří Mírovský | Kateřina Rysová | Magdaléna Rysová

We present the results of automatic evaluation of discourse in machine translation (MT) outputs using the EVALD tool. EVALD was originally designed and trained to assess the quality of human writing, for native speakers and foreign-language learners. MT has seen a tremendous leap in translation quality at the level of sentences and it is thus interesting to see if the human-level evaluation is becoming relevant.human writing, for native speakers and foreign-language learners. MT has seen a tremendous leap in translation quality at the level of sentences and it is thus interesting to see if the human-level evaluation is becoming relevant.

pdf bib
Fine-grained evaluation of German-English Machine Translation based on a Test SuiteGerman-English Machine Translation based on a Test Suite
Vivien Macketanz | Eleftherios Avramidis | Aljoscha Burchardt | Hans Uszkoreit

We present an analysis of 16 state-of-the-art MT systems on German-English based on a linguistically-motivated test suite. The test suite has been devised manually by a team of language professionals in order to cover a broad variety of linguistic phenomena that MT often fails to translate properly. It contains 5,000 test sentences covering 106 linguistic phenomena in 14 categories, with an increased focus on verb tenses, aspects and moods. The MT outputs are evaluated in a semi-automatic way through regular expressions that focus only on the part of the sentence that is relevant to each phenomenon. Through our analysis, we are able to compare systems based on their performance on these categories. Additionally, we reveal strengths and weaknesses of particular systems and we identify grammatical phenomena where the overall performance of MT is relatively low.

pdf bib
The Word Sense Disambiguation Test Suite at WMT18WMT18
Annette Rios | Mathias Müller | Rico Sennrich

We present a task to measure an MT system’s capability to translate ambiguous words with their correct sense according to the given context. The task is based on the GermanEnglish Word Sense Disambiguation (WSD) test set ContraWSD (Rios Gonzales et al., 2017), but it has been filtered to reduce noise, and the evaluation has been adapted to assess MT output directly rather than scoring existing translations. We evaluate all GermanEnglish submissions to the WMT’18 shared translation task, plus a number of submissions from previous years, and find that performance on the task has markedly improved compared to the 2016 WMT submissions (81%93 % accuracy on the WSD task). We also find that the unsupervised submissions to the task have a low WSD capability, and predominantly translate ambiguous source words with the same sense.

pdf bib
The AFRL-Ohio State WMT18 Multimodal System : Combining Visual with TraditionalAFRL-Ohio State WMT18 Multimodal System: Combining Visual with Traditional
Jeremy Gwinnup | Joshua Sandvick | Michael Hutt | Grant Erdmann | John Duselis | James Davis

AFRL-Ohio State extends its usage of visual domain-driven machine translation for use as a peer with traditional machine translation systems. As a peer, it is enveloped into a system combination of neural and statistical MT systems to present a composite translation.

pdf bib
CUNI System for the WMT18 Multimodal Translation TaskCUNI System for the WMT18 Multimodal Translation Task
Jindřich Helcl | Jindřich Libovický | Dušan Variš

We present our submission to the WMT18 Multimodal Translation Task. The main feature of our submission is applying a self-attentive network instead of a recurrent neural network. We evaluate two methods of incorporating the visual features in the model : first, we include the image representation as another input to the network ; second, we train the model to predict the visual features and use it as an auxiliary objective. For our submission, we acquired both textual and multimodal additional data. Both of the proposed methods yield significant improvements over recurrent networks and self-attentive textual baselines.

pdf bib
Sheffield Submissions for WMT18 Multimodal Translation Shared TaskSheffield Submissions for WMT18 Multimodal Translation Shared Task
Chiraag Lala | Pranava Swaroop Madhyastha | Carolina Scarton | Lucia Specia

This paper describes the University of Sheffield’s submissions to the WMT18 Multimodal Machine Translation shared task. We participated in both tasks 1 and 1b. For task 1, we build on a standard sequence to sequence attention-based neural machine translation system (NMT) and investigate the utility of multimodal re-ranking approaches. More specifically, n-best translation candidates from this system are re-ranked using novel multimodal cross-lingual word sense disambiguation models. For task 1b, we explore three approaches : (i) re-ranking based on cross-lingual word sense disambiguation (as for task 1), (ii) re-ranking based on consensus of NMT n-best lists from German-Czech, French-Czech and English-Czech systems, and (iii) data augmentation by generating English source data through machine translation from French to English and from German to English followed by hypothesis selection using a multimodal-reranker.

pdf bib
Translation of Biomedical Documents with Focus on Spanish-EnglishSpanish-English
Mirela-Stefania Duma | Wolfgang Menzel

For the WMT 2018 shared task of translating documents pertaining to the Biomedical domain, we developed a scoring formula that uses an unsophisticated and effective method of weighting term frequencies and was integrated in a data selection pipeline. The method was applied on five language pairs and it performed best on Portuguese-English, where a BLEU score of 41.84 placed it third out of seven runs submitted by three institutions. In this paper, we describe our method and results with a special focus on Spanish-English where we compare it against a state-of-the-art method. Our contribution to the task lies in introducing a fast, unsupervised method for selecting domain-specific data for training models which obtain good results using only 10 % of the general domain data.

pdf bib
Ensemble of Translators with Automatic Selection of the Best Translation the submission of FOKUS to the WMT 18 biomedical translation taskFOKUS to the WMT 18 biomedical translation task –
Cristian Grozea

This paper describes the system of Fraunhofer FOKUS for the WMT 2018 biomedical translation task. Our approach, described here, was to automatically select the most promising translation from a set of candidates produced with NMT (Transformer) models. We selected the highest fidelity translation of each sentence by using a dictionary, stemming and a set of heuristics. Our method is simple, can use any machine translators, and requires no further training in addition to that already employed to build the NMT models. The downside is that the score did not increase over the best in ensemble, but was quite close to it (difference about 0.5 BLEU).

pdf bib
LMU Munich’s Neural Machine Translation Systems at WMT 2018LMU Munich’s Neural Machine Translation Systems at WMT 2018
Matthias Huck | Dario Stojanovski | Viktor Hangya | Alexander Fraser

We present the LMU Munich machine translation systems for the EnglishGerman language pair. We have built neural machine translation systems for both translation directions (EnglishGerman and GermanEnglish) and for two different domains (the biomedical domain and the news domain). The systems were used for our participation in the WMT18 biomedical translation task and in the shared task on machine translation of news. The main focus of our recent system development efforts has been on achieving improvements in the biomedical domain over last year’s strong biomedical translation engine for EnglishGerman (Huck et al., 2017a). Considerable progress has been made in the latter task, which we report on in this paper.

pdf bib
Hunter NMT System for WMT18 Biomedical Translation Task : Transfer Learning in Neural Machine TranslationNMT System for WMT18 Biomedical Translation Task: Transfer Learning in Neural Machine Translation
Abdul Khan | Subhadarshi Panda | Jia Xu | Lampros Flokas

This paper describes the submission of Hunter Neural Machine Translation (NMT) to the WMT’18 Biomedical translation task from English to French. The discrepancy between training and test data distribution brings a challenge to translate text in new domains. Beyond the previous work of combining in-domain with out-of-domain models, we found accuracy and efficiency gain in combining different in-domain models. We conduct extensive experiments on NMT with transfer learning. We train on different in-domain Biomedical datasets one after another. That means parameters of the previous training serve as the initialization of the next one. Together with a pre-trained out-of-domain News model, we enhanced translation quality with 3.73 BLEU points over the baseline. Furthermore, we applied ensemble learning on training models of intermediate epochs and achieved an improvement of 4.02 BLEU points over the baseline. Overall, our system is 11.29 BLEU points above the best system of last year on the EDP 2017 test set.transfer learning. We train on different in-domain Biomedical datasets one after another. That means parameters of the previous training serve as the initialization of the next one. Together with a pre-trained out-of-domain News model, we enhanced translation quality with 3.73 BLEU points over the baseline. Furthermore, we applied ensemble learning on training models of intermediate epochs and achieved an improvement of 4.02 BLEU points over the baseline. Overall, our system is 11.29 BLEU points above the best system of last year on the EDP 2017 test set.

pdf bib
UFRGS Participation on the WMT Biomedical Translation Shared TaskUFRGS Participation on the WMT Biomedical Translation Shared Task
Felipe Soares | Karin Becker

This paper describes the machine translation systems developed by the Universidade Federal do Rio Grande do Sul (UFRGS) team for the biomedical translation shared task. Our systems are based on statistical machine translation and neural machine translation, using the Moses and OpenNMT toolkits, respectively. We participated in four translation directions for the English / Spanish and English / Portuguese language pairs. To create our training data, we concatenated several parallel corpora, both from in-domain and out-of-domain sources, as well as terminological resources from UMLS. Our systems achieved the best BLEU scores according to the official shared task evaluation.

pdf bib
Neural Machine Translation with the Transformer and Multi-Source Romance Languages for the Biomedical WMT 2018 taskRomance Languages for the Biomedical WMT 2018 task
Brian Tubay | Marta R. Costa-jussà

The Transformer architecture has become the state-of-the-art in Machine Translation. This model, which relies on attention-based mechanisms, has outperformed previous neural machine translation architectures in several tasks. In this system description paper, we report details of training neural machine translation with multi-source Romance languages with the Transformer model and in the evaluation frame of the biomedical WMT 2018 task. Using multi-source languages from the same family allows improvements of over 6 BLEU points.

pdf bib
ITER : Improving Translation Edit Rate through Optimizable Edit CostsITER: Improving Translation Edit Rate through Optimizable Edit Costs
Joybrata Panja | Sudip Kumar Naskar

The paper presents our participation in the WMT 2018 Metrics Shared Task. We propose an improved version of Translation Edit / Error Rate (TER). In addition to including the basic edit operations in TER, namely-insertion, deletion, substitution and shift, our metric also allows stem matching, optimizable edit costs and better normalization so as to correlate better with human judgement scores. The proposed metric shows much higher correlation with human judgments than TER.

pdf bib
RUSE : Regressor Using Sentence Embeddings for Automatic Machine Translation EvaluationRUSE: Regressor Using Sentence Embeddings for Automatic Machine Translation Evaluation
Hiroki Shimanaka | Tomoyuki Kajiwara | Mamoru Komachi

We introduce the RUSE metric for the WMT18 metrics shared task. Sentence embeddings can capture global information that can not be captured by local features based on character or word N-grams. Although training sentence embeddings using small-scale translation datasets with manual evaluation is difficult, sentence embeddings trained from large-scale data in other tasks can improve the automatic evaluation of machine translation. We use a multi-layer perceptron regressor based on three types of sentence embeddings. The experimental results of the WMT16 and WMT17 datasets show that the RUSE metric achieves a state-of-the-art performance in both segment- and system-level metrics tasks with embedding features only.

pdf bib
Neural Machine Translation for English-TamilEnglish-Tamil
Himanshu Choudhary | Aditya Kumar Pathak | Rajiv Ratan Saha | Ponnurangam Kumaraguru

A huge amount of valuable resources is available on the web in English, which are often translated into local languages to facilitate knowledge sharing among local people who are not much familiar with English. However, translating such content manually is very tedious, costly, and time-consuming process. To this end, machine translation is an efficient approach to translate text without any human involvement. Neural machine translation (NMT) is one of the most recent and effective translation technique amongst all existing machine translation systems. In this paper, we apply NMT for English-Tamil language pair. We propose a novel neural machine translation technique using word-embedding along with Byte-Pair-Encoding (BPE) to develop an efficient translation system that overcomes the OOV (Out Of Vocabulary) problem for languages which do not have much translations available online. We use the BLEU score for evaluating the system performance. Experimental results confirm that our proposed MIDAS translator (8.33 BLEU score) outperforms Google translator (3.75 BLEU score).

pdf bib
The Benefit of Pseudo-Reference Translations in Quality Estimation of MT OutputMT Output
Melania Duma | Wolfgang Menzel

In this paper, a novel approach to Quality Estimation is introduced, which extends the method in (Duma and Menzel, 2017) by also considering pseudo-reference translations as data sources to the tree and sequence kernels used before. Two variants of the system were submitted to the sentence level WMT18 Quality Estimation Task for the English-German language pair. They have been ranked 4th and 6th out of 13 systems in the SMT track, while in the NMT track ranks 4 and 5 out of 11 submissions have been reached.

pdf bib
Supervised and Unsupervised Minimalist Quality Estimators : Vicomtech’s Participation in the WMT 2018 Quality Estimation TaskWMT 2018 Quality Estimation Task
Thierry Etchegoyhen | Eva Martínez Garcia | Andoni Azpeitia

We describe Vicomtech’s participation in the WMT 2018 shared task on quality estimation, for which we submitted minimalist quality estimators. The core of our approach is based on two simple features : lexical translation overlaps and language model cross-entropy scores. These features are exploited in two system variants : uMQE is an unsupervised system, where the final quality score is obtained by averaging individual feature scores ; sMQE is a supervised variant, where the final score is estimated by a Support Vector Regressor trained on the available annotated datasets. The main goal of our minimalist approach to quality estimation is to provide reliable estimators that require minimal deployment effort, few resources, and, in the case of uMQE, do not depend on costly data annotation or post-editing. Our approach was applied to all language pairs in sentence quality estimation, obtaining competitive results across the board.

pdf bib
Sheffield Submissions for the WMT18 Quality Estimation Shared TaskSheffield Submissions for the WMT18 Quality Estimation Shared Task
Julia Ive | Carolina Scarton | Frédéric Blain | Lucia Specia

In this paper we present the University of Sheffield submissions for the WMT18 Quality Estimation shared task. We discuss our submissions to all four sub-tasks, where ours is the only team to participate in all language pairs and variations (37 combinations). Our systems show competitive results and outperform the baseline in nearly all cases.

pdf bib
UAlacant machine translation quality estimation at WMT 2018 : a simple approach using phrase tables and feed-forward neural networksUAlacant machine translation quality estimation at WMT 2018: a simple approach using phrase tables and feed-forward neural networks
Felipe Sánchez-Martínez | Miquel Esplà-Gomis | Mikel L. Forcada

We describe the Universitat d’Alacant submissions to the word- and sentence-level machine translation (MT) quality estimation (QE) shared task at WMT 2018. Our approach to word-level MT QE builds on previous work to mark the words in the machine-translated sentence as OK or BAD, and is extended to determine if a word or sequence of words need to be inserted in the gap after each word. Our sentence-level submission simply uses the edit operations predicted by the word-level approach to approximate TER. The method presented ranked first in the sub-task of identifying insertions in gaps for three out of the six datasets, and second in the rest of them.OK or BAD, and is extended to determine if a word or sequence of words need to be inserted in the gap after each word. Our sentence-level submission simply uses the edit operations predicted by the word-level approach to approximate TER. The method presented ranked first in the sub-task of identifying insertions in gaps for three out of the six datasets, and second in the rest of them.

pdf bib
Alibaba Submission for WMT18 Quality Estimation TaskAlibaba Submission for WMT18 Quality Estimation Task
Jiayi Wang | Kai Fan | Bo Li | Fengming Zhou | Boxing Chen | Yangbin Shi | Luo Si

The goal of WMT 2018 Shared Task on Translation Quality Estimation is to investigate automatic methods for estimating the quality of machine translation results without reference translations. This paper presents the QE Brain system, which proposes the neural Bilingual Expert model as a feature extractor based on conditional target language model with a bidirectional transformer and then processes the semantic representations of source and the translation output with a Bi-LSTM predictive model for automatic quality estimation. The system has been applied to the sentence-level scoring and ranking tasks as well as the word-level tasks for finding errors for each word in translations. An extensive set of experimental results have shown that our system outperformed the best results in WMT 2017 Quality Estimation tasks and obtained top results in WMT 2018.

pdf bib
Quality Estimation with Force-Decoded Attention and Cross-lingual Embeddings
Elizaveta Yankovskaya | Andre Tättar | Mark Fishel

This paper describes the submissions of the team from the University of Tartu for the sentence-level Quality Estimation shared task of WMT18. The proposed models use features based on attention weights of a neural machine translation system and cross-lingual phrase embeddings as input features of a regression model. Two of the proposed models require only a neural machine translation system with an attention mechanism with no additional resources. Results show that combining neural networks and baseline features leads to significant improvements over the baseline features alone.

pdf bib
MS-UEdin Submission to the WMT2018 APE Shared Task : Dual-Source Transformer for Automatic Post-EditingMS-UEdin Submission to the WMT2018 APE Shared Task: Dual-Source Transformer for Automatic Post-Editing
Marcin Junczys-Dowmunt | Roman Grundkiewicz

This paper describes the Microsoft and University of Edinburgh submission to the Automatic Post-editing shared task at WMT2018. Based on training data and systems from the WMT2017 shared task, we re-implement our own models from the last shared task and introduce improvements based on extensive parameter sharing. Next we experiment with our implementation of dual-source transformer models and data selection for the IT domain. Our submissions decisively wins the SMT post-editing sub-task establishing the new state-of-the-art and is a very close second (or equal, 16.46 vs 16.50 TER) in the NMT sub-task. Based on the rather weak results in the NMT sub-task, we hypothesize that neural-on-neural APE might not be actually useful.

pdf bib
A Transformer-Based Multi-Source Automatic Post-Editing System
Santanu Pal | Nico Herbig | Antonio Krüger | Josef van Genabith

This paper presents our EnglishGerman Automatic Post-Editing (APE) system submitted to the APE Task organized at WMT 2018 (Chatterjee et al., 2018). The proposed model is an extension of the transformer architecture : two separate self-attention-based encoders encode the machine translation output (mt) and the source (src), followed by a joint encoder that attends over a combination of these two encoded sequences (encsrc and encmt) for generating the post-edited sentence. We compare this multi-source architecture (i.e, src, mt pe) to a monolingual transformer (i.e., mt pe) model and an ensemble combining the multi-source src, mt pe and single-source mt pe models. For both the PBSMT and the NMT task, the ensemble yields the best results, followed by the multi-source model and last the single-source approach. Our best model, the ensemble, achieves a BLEU score of 66.16 and 74.22 for the PBSMT and NMT task, respectively.

pdf bib
DFKI-MLT System Description for the WMT18 Automatic Post-editing TaskDFKI-MLT System Description for the WMT18 Automatic Post-editing Task
Daria Pylypenko | Raphael Rubino

This paper presents the Automatic Post-editing (APE) systems submitted by the DFKI-MLT group to the WMT’18 APE shared task. Three monolingual neural sequence-to-sequence APE systems were trained using target-language data only : one using an attentional recurrent neural network architecture and two using the attention-only (transformer) architecture. The training data was composed of machine translated (MT) output used as source to the APE model aligned with their manually post-edited version or reference translation as target. We made use of the provided training sets only and trained APE models applicable to phrase-based and neural MT outputs. Results show better performances reached by the attention-only model over the recurrent one, significant improvement over the baseline when post-editing phrase-based MT output but degradation when applied to neural MT output.transformer) architecture. The training data was composed of machine translated (MT) output used as source to the APE model aligned with their manually post-edited version or reference translation as target. We made use of the provided training sets only and trained APE models applicable to phrase-based and neural MT outputs. Results show better performances reached by the attention-only model over the recurrent one, significant improvement over the baseline when post-editing phrase-based MT output but degradation when applied to neural MT output.

pdf bib
The Speechmatics Parallel Corpus Filtering System for WMT18WMT18
Tom Ash | Remi Francis | Will Williams

Our entry to the parallel corpus filtering task uses a two-step strategy. The first step uses a series of pragmatic hard ‘rules’ to remove the worst example sentences. This first step reduces the effective corpus size down from the initial 1 billion to 160 million tokens. The second step uses four different heuristics weighted to produce a score that is then used for further filtering down to 100 or 10 million tokens. Our final system produces competitive results without requiring excessive fine tuning to the exact task or language pair. The first step in isolation provides a very fast filter that gives most of the gains of the final system.

pdf bib
An Unsupervised System for Parallel Corpus Filtering
Viktor Hangya | Alexander Fraser

In this paper we describe LMU Munich’s submission for the WMT 2018 Parallel Corpus Filtering shared task which addresses the problem of cleaning noisy parallel corpora. The task of mining and cleaning parallel sentences is important for improving the quality of machine translation systems, especially for low-resource languages. We tackle this problem in a fully unsupervised fashion relying on bilingual word embeddings created without any bilingual signal. After pre-filtering noisy data we rank sentence pairs by calculating bilingual sentence-level similarities and then remove redundant data by employing monolingual similarity as well. Our unsupervised system achieved good performance during the official evaluation of the shared task, scoring only a few BLEU points behind the best systems, while not requiring any parallel training data.WMT 2018 Parallel Corpus Filtering shared task which addresses the problem of cleaning noisy parallel corpora. The task of mining and cleaning parallel sentences is important for improving the quality of machine translation systems, especially for low-resource languages. We tackle this problem in a fully unsupervised fashion relying on bilingual word embeddings created without any bilingual signal. After pre-filtering noisy data we rank sentence pairs by calculating bilingual sentence-level similarities and then remove redundant data by employing monolingual similarity as well. Our unsupervised system achieved good performance during the official evaluation of the shared task, scoring only a few BLEU points behind the best systems, while not requiring any parallel training data.

pdf bib
Dual Conditional Cross-Entropy Filtering of Noisy Parallel Corpora
Marcin Junczys-Dowmunt

In this work we introduce dual conditional cross-entropy filtering for noisy parallel data. For each sentence pair of the noisy parallel corpus we compute cross-entropy scores according to two inverse translation models trained on clean data. We penalize divergent cross-entropies and weigh the penalty by the cross-entropy average of both models. Sorting or thresholding according to these scores results in better subsets of parallel data. We achieve higher BLEU scores with models trained on parallel data filtered only from Paracrawl than with models trained on clean WMT data. We further evaluate our method in the context of the WMT2018 shared task on parallel corpus filtering and achieve the overall highest ranking scores of the shared task, scoring top in three out of four subtasks.

pdf bib
Measuring sentence parallelism using Mahalanobis distances : The NRC unsupervised submissions to the WMT18 Parallel Corpus Filtering shared taskNRC unsupervised submissions to the WMT18 Parallel Corpus Filtering shared task
Patrick Littell | Samuel Larkin | Darlene Stewart | Michel Simard | Cyril Goutte | Chi-kiu Lo

The WMT18 shared task on parallel corpus filtering (Koehn et al., 2018b) challenged teams to score sentence pairs from a large high-recall, low-precision web-scraped parallel corpus (Koehn et al., 2018a). Participants could use existing sample corpora (e.g. past WMT data) as a supervisory signal to learn what a clean corpus looks like. However, in lower-resource situations it often happens that the target corpus of the language is the only sample of parallel text in that language. We therefore made several unsupervised entries, setting ourselves an additional constraint that we not utilize the additional clean parallel corpora. One such entry fairly consistently scored in the top ten systems in the 100M-word conditions, and for one tasktranslating the European Medicines Agency corpus (Tiedemann, 2009)scored among the best systems even in the 10M-word conditions.only sample of parallel text in that language. We therefore made several unsupervised entries, setting ourselves an additional constraint that we not utilize the additional clean parallel corpora. One such entry fairly consistently scored in the top ten systems in the 100M-word conditions, and for one task—translating the European Medicines Agency corpus (Tiedemann, 2009)—scored among the best systems even in the 10M-word conditions.

pdf bib
Accurate semantic textual similarity for cleaning noisy parallel corpora using semantic machine translation evaluation metric : The NRC supervised submissions to the Parallel Corpus Filtering taskNRC supervised submissions to the Parallel Corpus Filtering task
Chi-kiu Lo | Michel Simard | Darlene Stewart | Samuel Larkin | Cyril Goutte | Patrick Littell

We present our semantic textual similarity approach in filtering a noisy web crawled parallel corpus using YiSia novel semantic machine translation evaluation metric. The systems mainly based on this supervised approach perform well in the WMT18 Parallel Corpus Filtering shared task (4th place in 100-million-word evaluation, 8th place in 10-million-word evaluation, and 6th place overall, out of 48 submissions). In fact, our best performing systemNRC-yisi-bicov is one of the only four submissions ranked top 10 in both evaluations. Our submitted systems also include some initial filtering steps for scaling down the size of the test corpus and a final redundancy removal step for better semantic and token coverage of the filtered corpus. In this paper, we also describe our unsuccessful attempt in automatically synthesizing a noisy parallel development corpus for tuning the weights to combine different parallelism and fluency features.

pdf bib
Alibaba Submission to the WMT18 Parallel Corpus Filtering TaskAlibaba Submission to the WMT18 Parallel Corpus Filtering Task
Jun Lu | Xiaoyu Lv | Yangbin Shi | Boxing Chen

This paper describes the Alibaba Machine Translation Group submissions to the WMT 2018 Shared Task on Parallel Corpus Filtering. While evaluating the quality of the parallel corpus, the three characteristics of the corpus are investigated, i.e. 1) the bilingual / translation quality, 2) the monolingual quality and 3) the corpus diversity. Both rule-based and model-based methods are adapted to score the parallel sentence pairs. The final parallel corpus filtering system is reliable, easy to build and adapt to other language pairs.

pdf bib
UTFPR at WMT 2018 : Minimalistic Supervised Corpora Filtering for Machine TranslationUTFPR at WMT 2018: Minimalistic Supervised Corpora Filtering for Machine Translation
Gustavo Paetzold

We present the UTFPR systems at the WMT 2018 parallel corpus filtering task. Our supervised approach discerns between good and bad translations by training classic binary classification models over an artificially produced binary classification dataset derived from a high-quality translation set, and a minimalistic set of 6 semantic distance features that rely only on easy-to-gather resources. We rank translations by their probability for the good label. Our results show that logistic regression pairs best with our approach, yielding more consistent results throughout the different settings evaluated.

pdf bib
The ILSP / ARC submission to the WMT 2018 Parallel Corpus Filtering Shared TaskILSP/ARC submission to the WMT 2018 Parallel Corpus Filtering Shared Task
Vassilis Papavassiliou | Sokratis Sofianopoulos | Prokopis Prokopidis | Stelios Piperidis

This paper describes the submission of the Institute for Language and Speech Processing / Athena Research and Innovation Center (ILSP / ARC) for the WMT 2018 Parallel Corpus Filtering shared task. We explore several properties of sentences and sentence pairs that our system explored in the context of the task with the purpose of clustering sentence pairs according to their appropriateness in training MT systems. We also discuss alternative methods for ranking the sentence pairs of the most appropriate clusters with the aim of generating the two datasets (of 10 and 100 million words as required in the task) that were evaluated. By summarizing the results of several experiments that were carried out by the organizers during the evaluation phase, our submission achieved an average BLEU score of 26.41, even though it does not make use of any language-specific resources like bilingual lexica, monolingual corpora, or MT output, while the average score of the best participant system was 27.91.

pdf bib
SYSTRAN Participation to the WMT2018 Shared Task on Parallel Corpus FilteringSYSTRAN Participation to the WMT2018 Shared Task on Parallel Corpus Filtering
MinhQuang Pham | Josep Crego | Jean Senellart

This paper describes the participation of SYSTRAN to the shared task on parallel corpus filtering at the Third Conference on Machine Translation (WMT 2018). We participate for the first time using a neural sentence similarity classifier which aims at predicting the relatedness of sentence pairs in a multilingual context. The paper describes the main characteristics of our approach and discusses the results obtained on the data sets published for the shared task.

pdf bib
The RWTH Aachen University Filtering System for the WMT 2018 Parallel Corpus Filtering TaskRWTH Aachen University Filtering System for the WMT 2018 Parallel Corpus Filtering Task
Nick Rossenbach | Jan Rosendahl | Yunsu Kim | Miguel Graça | Aman Gokrani | Hermann Ney

This paper describes the submission of RWTH Aachen University for the DeEn parallel corpus filtering task of the EMNLP 2018 Third Conference on Machine Translation (WMT 2018). We use several rule-based, heuristic methods to preselect sentence pairs. These sentence pairs are scored with count-based and neural systems as language and translation models. In addition to single sentence-pair scoring, we further implement a simple redundancy removing heuristic. Our best performing corpus filtering system relies on recurrent neural language models and translation models based on the transformer architecture. A model trained on 10 M randomly sampled tokens reaches a performance of 9.2 % BLEU on newstest2018. Using our filtering and ranking techniques we achieve 34.8 % BLEU.EMNLP 2018 Third Conference on Machine Translation (WMT 2018). We use several rule-based, heuristic methods to preselect sentence pairs. These sentence pairs are scored with count-based and neural systems as language and translation models. In addition to single sentence-pair scoring, we further implement a simple redundancy removing heuristic. Our best performing corpus filtering system relies on recurrent neural language models and translation models based on the transformer architecture. A model trained on 10M randomly sampled tokens reaches a performance of 9.2% BLEU on newstest2018. Using our filtering and ranking techniques we achieve 34.8% BLEU.

pdf bib
Prompsit’s submission to WMT 2018 Parallel Corpus Filtering shared taskWMT 2018 Parallel Corpus Filtering shared task
Víctor M. Sánchez-Cartagena | Marta Bañón | Sergio Ortiz-Rojas | Gema Ramírez

This paper describes Prompsit Language Engineering’s submissions to the WMT 2018 parallel corpus filtering shared task. Our four submissions were based on an automatic classifier for identifying pairs of sentences that are mutual translations. A set of hand-crafted hard rules for discarding sentences with evident flaws were applied before the classifier. We explored different strategies for achieving a training corpus with diverse vocabulary and fluent sentences : language model scoring, an active-learning-inspired data selection algorithm and n-gram saturation. Our submissions were very competitive in comparison with other participants on the 100 million word training corpus.