Workshop on Statistical Machine Translation (2018)


up

pdf (full)
bib (full)
Proceedings of the Third Conference on Machine Translation: Research Papers

pdf bib
Proceedings of the Third Conference on Machine Translation: Research Papers
Ondřej Bojar | Rajen Chatterjee | Christian Federmann | Mark Fishel | Yvette Graham | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Philipp Koehn | Christof Monz | Matteo Negri | Aurélie Névéol | Mariana Neves | Matt Post | Lucia Specia | Marco Turchi | Karin Verspoor

pdf bib
Scaling Neural Machine Translation
Myle Ott | Sergey Edunov | David Grangier | Michael Auli

Sequence to sequence learning models still require several days to reach state of the art performance on large benchmark datasets using a single machine. This paper shows that reduced precision and large batch training can speedup training by nearly 5x on a single 8-GPU machine with careful tuning and implementation. On WMT’14 English-German translation, we match the accuracy of Vaswani et al. (2017) in under 5 hours when training on 8 GPUs and we obtain a new state of the art of 29.3 BLEU after training for 85 minutes on 128 GPUs. We further improve these results to 29.8 BLEU by training on the much larger Paracrawl dataset. On the WMT’14 English-French task, we obtain a state-of-the-art BLEU of 43.2 in 8.5 hours on 128 GPUs.

pdf bib
Character-level Chinese-English Translation through ASCII EncodingChinese-English Translation through ASCII Encoding
Nikola I. Nikolov | Yuhuang Hu | Mi Xue Tan | Richard H.R. Hahnloser

Character-level Neural Machine Translation (NMT) models have recently achieved impressive results on many language pairs. They mainly do well for Indo-European language pairs, where the languages share the same writing system. However, for translating between Chinese and English, the gap between the two different writing systems poses a major challenge because of a lack of systematic correspondence between the individual linguistic units. In this paper, we enable character-level NMT for Chinese, by breaking down Chinese characters into linguistic units similar to that of Indo-European languages. We use the Wubi encoding scheme, which preserves the original shape and semantic information of the characters, while also being reversible. We show promising results from training Wubi-based models on the character- and subword-level with recurrent as well as convolutional models.

pdf bib
An Analysis of Attention Mechanisms : The Case of Word Sense Disambiguation in Neural Machine Translation
Gongbo Tang | Rico Sennrich | Joakim Nivre

Recent work has shown that the encoder-decoder attention mechanisms in neural machine translation (NMT) are different from the word alignment in statistical machine translation. In this paper, we focus on analyzing encoder-decoder attention mechanisms, in the case of word sense disambiguation (WSD) in NMT models. We hypothesize that attention mechanisms pay more attention to context tokens when translating ambiguous words. We explore the attention distribution patterns when translating ambiguous nouns. Counterintuitively, we find that attention mechanisms are likely to distribute more attention to the ambiguous noun itself rather than context tokens, in comparison to other nouns. We conclude that attention is not the main mechanism used by NMT models to incorporate contextual information for WSD. The experimental results suggest that NMT models learn to encode contextual information necessary for WSD in the encoder hidden states. For the attention mechanism in Transformer models, we reveal that the first few layers gradually learn to align source and target tokens and the last few layers learn to extract features from the related but unaligned context tokens.

pdf bib
Discourse-Related Language Contrasts in English-Croatian Human and Machine TranslationEnglish-Croatian Human and Machine Translation
Margita Šoštarić | Christian Hardmeier | Sara Stymne

We present an analysis of a number of coreference phenomena in English-Croatian human and machine translations. The aim is to shed light on the differences in the way these structurally different languages make use of discourse information and provide insights for discourse-aware machine translation system development. The phenomena are automatically identified in parallel data using annotation produced by parsers and word alignment tools, enabling us to pinpoint patterns of interest in both languages. We make the analysis more fine-grained by including three corpora pertaining to three different registers. In a second step, we create a test set with the challenging linguistic constructions and use it to evaluate the performance of three MT systems. We show that both SMT and NMT systems struggle with handling these discourse phenomena, even though NMT tends to perform somewhat better than SMT. By providing an overview of patterns frequently occurring in actual language use, as well as by pointing out the weaknesses of current MT systems that commonly mistranslate them, we hope to contribute to the effort of resolving the issue of discourse phenomena in MT applications.

pdf bib
Coreference and Coherence in Neural Machine Translation : A Study Using Oracle Experiments
Dario Stojanovski | Alexander Fraser

Cross-sentence context can provide valuable information in Machine Translation and is critical for translation of anaphoric pronouns and for providing consistent translations. In this paper, we devise simple oracle experiments targeting coreference and coherence. Oracles are an easy way to evaluate the effect of different discourse-level phenomena in NMT using BLEU and eliminate the necessity to manually define challenge sets for this purpose. We propose two context-aware NMT models and compare them against models working on a concatenation of consecutive sentences. Concatenation models perform better, but are computationally expensive. We show that NMT models taking advantage of context oracle signals can achieve considerable gains in BLEU, of up to 7.02 BLEU for coreference and 1.89 BLEU for coherence on subtitles translation. Access to strong signals allows us to make clear comparisons between context-aware models.

pdf bib
A Large-Scale Test Set for the Evaluation of Context-Aware Pronoun Translation in Neural Machine Translation
Mathias Müller | Annette Rios | Elena Voita | Rico Sennrich

The translation of pronouns presents a special challenge to machine translation to this day, since it often requires context outside the current sentence. Recent work on models that have access to information across sentence boundaries has seen only moderate improvements in terms of automatic evaluation metrics such as BLEU. However, metrics that quantify the overall translation quality are ill-equipped to measure gains from additional context. We argue that a different kind of evaluation is needed to assess how well models translate inter-sentential phenomena such as pronouns. This paper therefore presents a test suite of contrastive translations focused specifically on the translation of pronouns. Furthermore, we perform experiments with several context-aware models. We show that, while gains in BLEU are moderate for those systems, they outperform baselines by a large margin in terms of accuracy on our contrastive test set. Our experiments also show the effectiveness of parameter tying for multi-encoder architectures.

pdf bib
A neural interlingua for multilingual machine translation
Yichao Lu | Phillip Keung | Faisal Ladhak | Vikas Bhardwaj | Shaonan Zhang | Jason Sun

We incorporate an explicit neural interlingua into a multilingual encoder-decoder neural machine translation (NMT) architecture. We demonstrate that our model learns a language-independent representation by performing direct zero-shot translation (without using pivot translation), and by using the source sentence embeddings to create an English Yelp review classifier that, through the mediation of the neural interlingua, can also classify French and German reviews. Furthermore, we show that, despite using a smaller number of parameters than a pairwise collection of bilingual NMT models, our approach produces comparable BLEU scores for each language pair in WMT15.

pdf bib
Improving Neural Language Models with Weight Norm Initialization and Regularization
Christian Herold | Yingbo Gao | Hermann Ney

Embedding and projection matrices are commonly used in neural language models (NLM) as well as in other sequence processing networks that operate on large vocabularies. We examine such matrices in fine-tuned language models and observe that a NLM learns word vectors whose norms are related to the word frequencies. We show that by initializing the weight norms with scaled log word counts, together with other techniques, lower perplexities can be obtained in early epochs of training. We also introduce a weight norm regularization loss term, whose hyperparameters are tuned via a grid search. With this method, we are able to significantly improve perplexities on two word-level language modeling tasks (without dynamic evaluation): from 54.44 to 53.16 on Penn Treebank (PTB) and from 61.45 to 60.13 on WikiText-2 (WT2).

pdf bib
Contextual Neural Model for Translating Bilingual Multi-Speaker Conversations
Sameen Maruf | André F. T. Martins | Gholamreza Haffari

Recent works in neural machine translation have begun to explore document translation. However, translating online multi-speaker conversations is still an open problem. In this work, we propose the task of translating Bilingual Multi-Speaker Conversations, and explore neural architectures which exploit both source and target-side conversation histories for this task. To initiate an evaluation for this task, we introduce datasets extracted from Europarl v7 and OpenSubtitles2016. Our experiments on four language-pairs confirm the significance of leveraging conversation history, both in terms of BLEU and manual evaluation.

pdf bib
Freezing Subnetworks to Analyze Domain Adaptation in Neural Machine Translation
Brian Thompson | Huda Khayrallah | Antonios Anastasopoulos | Arya D. McCarthy | Kevin Duh | Rebecca Marvin | Paul McNamee | Jeremy Gwinnup | Tim Anderson | Philipp Koehn

To better understand the effectiveness of continued training, we analyze the major components of a neural machine translation system (the encoder, decoder, and each embedding space) and consider each component’s contribution to, and capacity for, domain adaptation. We find that freezing any single component during continued training has minimal impact on performance, and that performance is surprisingly good when a single component is adapted while holding the rest of the model fixed. We also find that continued training does not move the model very far from the out-of-domain model, compared to a sensitivity analysis metric, suggesting that the out-of-domain model can provide a good generic initialization for the new domain.

pdf bib
Denoising Neural Machine Translation Training with Trusted Data and Online Data Selection
Wei Wang | Taro Watanabe | Macduff Hughes | Tetsuji Nakagawa | Ciprian Chelba

Measuring domain relevance of data and identifying or selecting well-fit domain data for machine translation (MT) is a well-studied topic, but denoising is not yet. Denoising is concerned with a different type of data quality and tries to reduce the negative impact of data noise on MT training, in particular, neural MT (NMT) training. This paper generalizes methods for measuring and selecting data for domain MT and applies them to denoising NMT training. The proposed approach uses trusted data and a denoising curriculum realized by online data selection. Intrinsic and extrinsic evaluations of the approach show its significant effectiveness for NMT to train on data with severe noise.

pdf bib
Using Monolingual Data in Neural Machine Translation : a Systematic Study
Franck Burlot | François Yvon

Neural Machine Translation (MT) has radically changed the way systems are developed. A major difference with the previous generation (Phrase-Based MT) is the way monolingual target data, which often abounds, is used in these two paradigms. While Phrase-Based MT can seamlessly integrate very large language models trained on billions of sentences, the best option for Neural MT developers seems to be the generation of artificial parallel data through back-translation-a technique that fails to fully take advantage of existing datasets. In this paper, we conduct a systematic study of back-translation, comparing alternative uses of monolingual data, as well as multiple data generation procedures. Our findings confirm that back-translation is very effective and give new explanations as to why this is the case. We also introduce new data simulation techniques that are almost as effective, yet much cheaper to implement.

pdf bib
A Call for Clarity in Reporting BLEU ScoresBLEU Scores
Matt Post

The field of machine translation faces an under-recognized problem because of inconsistency in the reporting of scores from its dominant metric. Although people refer to the BLEU score, BLEU is in fact a parameterized metric whose values can vary wildly with changes to these parameters. These parameters are often not reported or are hard to find, and consequently, BLEU scores between papers can not be directly compared. I quantify this variation, finding differences as high as 1.8 between commonly used configurations. The main culprit is different tokenization and normalization schemes applied to the reference. Pointing to the success of the parsing community, I suggest machine translation researchers settle upon the BLEU scheme used by the annual Conference on Machine Translation (WMT), which does not allow for user-supplied reference processing, and provide a new tool, SACREBLEU, to facilitate this.

pdf bib
Exploring gap filling as a cheaper alternative to reading comprehension questionnaires when evaluating machine translation for gisting
Mikel L. Forcada | Carolina Scarton | Lucia Specia | Barry Haddow | Alexandra Birch

A popular application of machine translation (MT) is gisting : MT is consumed as is to make sense of text in a foreign language. Evaluation of the usefulness of MT for gisting is surprisingly uncommon. The classical method uses reading comprehension questionnaires (RCQ), in which informants are asked to answer professionally-written questions in their language about a foreign text that has been machine-translated into their language. Recently, gap-filling (GF), a form of cloze testing, has been proposed as a cheaper alternative to RCQ. In GF, certain words are removed from reference translations and readers are asked to fill the gaps left using the machine-translated text as a hint. This paper reports, for the first time, a comparative evaluation, using both RCQ and GF, of translations from multiple MT systems for the same foreign texts, and a systematic study on the effect of variables such as gap density, gap-selection strategies, and document context in GF. The main findings of the study are : (a) both RCQ and GF clearly identify MT to be useful ; (b) global RCQ and GF rankings for the MT systems are mostly in agreement ; (c) GF scores vary very widely across informants, making comparisons among MT systems hard, and (d) unlike RCQ, which is framed around documents, GF evaluation can be framed at the sentence level. These findings support the use of GF as a cheaper alternative to RCQ.gisting: MT is consumed as is to make sense of text in a foreign language. Evaluation of the usefulness of MT for gisting is surprisingly uncommon. The classical method uses reading comprehension questionnaires (RCQ), in which informants are asked to answer professionally-written questions in their language about a foreign text that has been machine-translated into their language. Recently, gap-filling (GF), a form of cloze testing, has been proposed as a cheaper alternative to RCQ. In GF, certain words are removed from reference translations and readers are asked to fill the gaps left using the machine-translated text as a hint. This paper reports, for the first time, a comparative evaluation, using both RCQ and GF, of translations from multiple MT systems for the same foreign texts, and a systematic study on the effect of variables such as gap density, gap-selection strategies, and document context in GF. The main findings of the study are: (a) both RCQ and GF clearly identify MT to be useful; (b) global RCQ and GF rankings for the MT systems are mostly in agreement; (c) GF scores vary very widely across informants, making comparisons among MT systems hard, and (d) unlike RCQ, which is framed around documents, GF evaluation can be framed at the sentence level. These findings support the use of GF as a cheaper alternative to RCQ.

pdf bib
Simple Fusion : Return of the Language Model
Felix Stahlberg | James Cross | Veselin Stoyanov

Neural Machine Translation (NMT) typically leverages monolingual data in training through backtranslation. We investigate an alternative simple method to use monolingual data for NMT training : We combine the scores of a pre-trained and fixed language model (LM) with the scores of a translation model (TM) while the TM is trained from scratch. To achieve that, we train the translation model to predict the residual probability of the training data added to the prediction of the LM. This enables the TM to focus its capacity on modeling the source sentence since it can rely on the LM for fluency. We show that our method outperforms previous approaches to integrate LMs into NMT while the architecture is simpler as it does not require gating networks to balance TM and LM. We observe gains of between +0.24 and +2.36 BLEU on all four test sets (English-Turkish, Turkish-English, Estonian-English, Xhosa-English) on top of ensembles without LM. We compare our method with alternative ways to utilize monolingual data such as backtranslation, shallow fusion, and cold fusion.

pdf bib
Massively Parallel Cross-Lingual Learning in Low-Resource Target Language Translation
Zhong Zhou | Matthias Sperber | Alexander Waibel

We work on translation from rich-resource languages to low-resource languages. The main challenges we identify are the lack of low-resource language data, effective methods for cross-lingual transfer, and the variable-binding problem that is common in neural systems. We build a translation system that addresses these challenges using eight European language families as our test ground. Firstly, we add the source and the target family labels and study intra-family and inter-family influences for effective cross-lingual transfer. We achieve an improvement of +9.9 in BLEU score for English-Swedish translation using eight families compared to the single-family multi-source multi-target baseline. Moreover, we find that training on two neighboring families closest to the low-resource language is often enough. Secondly, we construct an ablation study and find that reasonably good results can be achieved even with considerably less target data. Thirdly, we address the variable-binding problem by building an order-preserving named entity translation model. We obtain 60.6 % accuracy in qualitative evaluation where our translations are akin to human translations in a preliminary study.

pdf bib
Trivial Transfer Learning for Low-Resource Neural Machine Translation
Tom Kocmi | Ondřej Bojar

Transfer learning has been proven as an effective technique for neural machine translation under low-resource conditions. Existing methods require a common target language, language relatedness, or specific training tricks and regimes. We present a simple transfer learning method, where we first train a parent model for a high-resource language pair and then continue the training on a low-resource pair only by replacing the training corpus. This child model performs significantly better than the baseline trained for low-resource pair only. We are the first to show this for targeting different languages, and we observe the improvements even for unrelated languages with different alphabets.

up

pdf (full)
bib (full)
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

bib
Proceedings of the Third Conference on Machine Translation: Shared Task Papers
Ondřej Bojar | Rajen Chatterjee | Christian Federmann | Mark Fishel | Yvette Graham | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Philipp Koehn | Christof Monz | Matteo Negri | Aurélie Névéol | Mariana Neves | Matt Post | Lucia Specia | Marco Turchi | Karin Verspoor

pdf bib
Findings of the WMT 2018 Biomedical Translation Shared Task : Evaluation on Medline test setsWMT 2018 Biomedical Translation Shared Task: Evaluation on Medline test sets
Mariana Neves | Antonio Jimeno Yepes | Aurélie Névéol | Cristian Grozea | Amy Siu | Madeleine Kittner | Karin Verspoor

Machine translation enables the automatic translation of textual documents between languages and can facilitate access to information only available in a given language for non-speakers of this language, e.g. research results presented in scientific publications. In this paper, we provide an overview of the Biomedical Translation shared task in the Workshop on Machine Translation (WMT) 2018, which specifically examined the performance of machine translation systems for biomedical texts. This year, we provided test sets of scientific publications from two sources (EDP and Medline) and for six language pairs (English with each of Chinese, French, German, Portuguese, Romanian and Spanish). We describe the development of the various test sets, the submissions that we received and the evaluations that we carried out. We obtained a total of 39 runs from six teams and some of this year’s BLEU scores were somewhat higher that last year’s, especially for teams that made use of biomedical resources or state-of-the-art MT algorithms (e.g. Transformer). Finally, our manual evaluation scored automatic translations higher than the reference translations for German and Spanish.

pdf bib
An Empirical Study of Machine Translation for the Shared Task of WMT18WMT18
Chao Bei | Hao Zong | Yiming Wang | Baoyong Fan | Shiqi Li | Conghu Yuan

This paper describes the Global Tone Communication Co., Ltd.’s submission of the WMT18 shared news translation task. We participated in the English-to-Chinese direction and get the best BLEU (43.8) scores among all the participants. The submitted system focus on data clearing and techniques to build a competitive model for this task. Unlike other participants, the submitted system are mainly relied on the data filtering to obtain the best BLEU score. We do data filtering not only for provided sentences but also for the back translated sentences. The techniques we apply for data filtering include filtering by rules, language models and translation models. We also conduct several experiments to validate the effectiveness of training techniques. According to our experiments, the Annealing Adam optimizing function and ensemble decoding are the most effective techniques for the model training.

pdf bib
The TALP-UPC Machine Translation Systems for WMT18 News Shared Translation TaskTALP-UPC Machine Translation Systems for WMT18 News Shared Translation Task
Noe Casas | Carlos Escolano | Marta R. Costa-jussà | José A. R. Fonollosa

In this article we describe the TALP-UPC research group participation in the WMT18 news shared translation task for Finnish-English and Estonian-English within the multi-lingual subtrack. All of our primary submissions implement an attention-based Neural Machine Translation architecture. Given that Finnish and Estonian belong to the same language family and are similar, we use as training data the combination of the datasets of both language pairs to paliate the data scarceness of each individual pair. We also report the translation quality of systems trained on individual language pair data to serve as baseline and comparison reference.

pdf bib
The AFRL WMT18 Systems : Ensembling, Continuation and CombinationAFRL WMT18 Systems: Ensembling, Continuation and Combination
Jeremy Gwinnup | Tim Anderson | Grant Erdmann | Katherine Young

This paper describes the Air Force Research Laboratory (AFRL) machine translation systems and the improvements that were developed during the WMT18 evaluation campaign. This year, we examined the developments and additions to popular neural machine translation toolkits and measure improvements in performance on the RussianEnglish language pair.

pdf bib
The University of Edinburgh’s Submissions to the WMT18 News Translation TaskUniversity of Edinburgh’s Submissions to the WMT18 News Translation Task
Barry Haddow | Nikolay Bogoychev | Denis Emelin | Ulrich Germann | Roman Grundkiewicz | Kenneth Heafield | Antonio Valerio Miceli Barone | Rico Sennrich

The University of Edinburgh made submissions to all 14 language pairs in the news translation task, with strong performances in most pairs. We introduce new RNN-variant, mixed RNN / Transformer ensembles, data selection and weighting, and extensions to back-translation.

pdf bib
CUNI Submissions in WMT18CUNI Submissions in WMT18
Tom Kocmi | Roman Sudarikov | Ondřej Bojar

We participated in the WMT 2018 shared news translation task in three language pairs : English-Estonian, English-Finnish, and English-Czech. Our main focus was the low-resource language pair of Estonian and English for which we utilized Finnish parallel data in a simple method. We first train a parent model for the high-resource language pair followed by adaptation on the related low-resource language pair. This approach brings a substantial performance boost over the baseline system trained only on Estonian-English parallel data. Our systems are based on the Transformer architecture. For the English to Czech translation, we have evaluated our last year models of hybrid phrase-based approach and neural machine translation mainly for comparison purposes.

pdf bib
JUCBNMT at WMT2018 News Translation Task : Character Based Neural Machine Translation of Finnish to EnglishJUCBNMT at WMT2018 News Translation Task: Character Based Neural Machine Translation of Finnish to English
Sainik Kumar Mahata | Dipankar Das | Sivaji Bandyopadhyay

In the current work, we present a description of the system submitted to WMT 2018 News Translation Shared task. The system was created to translate news text from Finnish to English. The system used a Character Based Neural Machine Translation model to accomplish the given task. The current paper documents the preprocessing steps, the description of the submitted system and the results produced using the same. Our system garnered a BLEU score of 12.9.

pdf bib
PROMT Systems for WMT 2018 Shared Translation TaskPROMT Systems for WMT 2018 Shared Translation Task
Alexander Molchanov

This paper describes the PROMT submissions for the WMT 2018 Shared News Translation Task. This year we participated only in the English-Russian language pair. We built two primary neural networks-based systems : 1) a pure Marian-based neural system and 2) a hybrid system which incorporates OpenNMT-based neural post-editing component into our RBMT engine. We also submitted pure rule-based translation (RBMT) for contrast. We show competitive results with both primary submissions which significantly outperform the RBMT baseline.

pdf bib
The Karlsruhe Institute of Technology Systems for the News Translation Task in WMT 2018WMT 2018
Ngoc-Quan Pham | Jan Niehues | Alexander Waibel

We present our experiments in the scope of the news translation task in WMT 2018, in directions : EnglishGerman. The core of our systems is the encoder-decoder based neural machine translation models using the transformer architecture. We enhanced the model with a deeper architecture. By using techniques to limit the memory consumption, we were able to train models that are 4 times larger on one GPU and improve the performance by 1.2 BLEU points. Furthermore, we performed sentence selection for the newly available ParaCrawl corpus. Thereby, we could improve the effectiveness of the corpus by 0.5 BLEU points.

pdf bib
CUNI Transformer Neural MT System for WMT18CUNI Transformer Neural MT System for WMT18
Martin Popel

We describe our NMT system submitted to the WMT2018 shared task in news translation. Our system is based on the Transformer model (Vaswani et al., 2017). We use an improved technique of backtranslation, where we iterate the process of translating monolingual data in one direction and training an NMT model for the opposite direction using synthetic parallel data. We apply a simple but effective filtering of the synthetic data. We pre-process the input sentences using coreference resolution in order to disambiguate the gender of pro-dropped personal pronouns. Finally, we apply two simple post-processing substitutions on the translated output. Our system is significantly (p 0.05) better than all other English-Czech and Czech-English systems in WMT2018.

pdf bib
The RWTH Aachen University Supervised Machine Translation Systems for WMT 2018RWTH Aachen University Supervised Machine Translation Systems for WMT 2018
Julian Schamper | Jan Rosendahl | Parnia Bahar | Yunsu Kim | Arne Nix | Hermann Ney

This paper describes the statistical machine translation systems developed at RWTH Aachen University for the GermanEnglish, EnglishTurkish and ChineseEnglish translation tasks of the EMNLP 2018 Third Conference on Machine Translation (WMT 2018). We use ensembles of neural machine translation systems based on the Transformer architecture. Our main focus is on the GermanEnglish task where we to all automatic scored first with respect metrics provided by the organizers. We identify data selection, fine-tuning, batch size and model dimension as important hyperparameters. In total we improve by 6.8 % BLEU over our last year’s submission and by 4.8 % BLEU over the winning system of the 2017 GermanEnglish task. In EnglishTurkish task, we show 3.6 % BLEU improvement over the last year’s winning system. We further report results on the ChineseEnglish task where we improve 2.2 % BLEU on average over our baseline systems but stay behind the 2018 winning systems.

pdf bib
The University of Cambridge’s Machine Translation Systems for WMT18University of Cambridge’s Machine Translation Systems for WMT18
Felix Stahlberg | Adrià de Gispert | Bill Byrne

The University of Cambridge submission to the WMT18 news translation task focuses on the combination of diverse models of translation. We compare recurrent, convolutional, and self-attention-based neural models on German-English, English-German, and Chinese-English. Our final system combines all neural models together with a phrase-based SMT system in an MBR-based scheme. We report small but consistent gains on top of strong Transformer ensembles.

pdf bib
The LMU Munich Unsupervised Machine Translation SystemsLMU Munich Unsupervised Machine Translation Systems
Dario Stojanovski | Viktor Hangya | Matthias Huck | Alexander Fraser

We describe LMU Munich’s unsupervised machine translation systems for EnglishGerman translation. These systems were used to participate in the WMT18 news translation shared task and more specifically, for the unsupervised learning sub-track. The systems are trained on English and German monolingual data only and exploit and combine previously proposed techniques such as using word-by-word translated data based on bilingual word embeddings, denoising and on-the-fly backtranslation.

pdf bib
Tencent Neural Machine Translation Systems for WMT18WMT18
Mingxuan Wang | Li Gong | Wenhuan Zhu | Jun Xie | Chao Bian

We participated in the WMT 2018 shared news translation task on EnglishChinese language pair. Our systems are based on attentional sequence-to-sequence models with some form of recursion and self-attention. Some data augmentation methods are also introduced to improve the translation performance. The best translation result is obtained with ensemble and reranking techniques. Our ChineseEnglish system achieved the highest cased BLEU score among all 16 submitted systems, and our EnglishChinese system ranked the third out of 18 submitted systems.

pdf bib
The University of Maryland’s Chinese-English Neural Machine Translation Systems at WMT18University of Maryland’s Chinese-English Neural Machine Translation Systems at WMT18
Weijia Xu | Marine Carpuat

This paper describes the University of Maryland’s submission to the WMT 2018 ChineseEnglish news translation tasks. Our systems are BPE-based self-attentional Transformer networks with parallel and backtranslated monolingual training data. Using ensembling and reranking, we improve over the Transformer baseline by +1.4 BLEU for ChineseEnglish and +3.97 BLEU for EnglishChinese on newstest2017. Our best systems reach BLEU scores of 24.4 for ChineseEnglish and 39.0 for EnglishChinese on newstest2018.newstest2017. Our best systems reach BLEU scores of 24.4 for Chinese→English and 39.0 for English→Chinese on newstest2018.

pdf bib
EvalD Reference-Less Discourse Evaluation for WMT18EvalD Reference-Less Discourse Evaluation for WMT18
Ondřej Bojar | Jiří Mírovský | Kateřina Rysová | Magdaléna Rysová

We present the results of automatic evaluation of discourse in machine translation (MT) outputs using the EVALD tool. EVALD was originally designed and trained to assess the quality of human writing, for native speakers and foreign-language learners. MT has seen a tremendous leap in translation quality at the level of sentences and it is thus interesting to see if the human-level evaluation is becoming relevant.human writing, for native speakers and foreign-language learners. MT has seen a tremendous leap in translation quality at the level of sentences and it is thus interesting to see if the human-level evaluation is becoming relevant.

pdf bib
Fine-grained evaluation of German-English Machine Translation based on a Test SuiteGerman-English Machine Translation based on a Test Suite
Vivien Macketanz | Eleftherios Avramidis | Aljoscha Burchardt | Hans Uszkoreit

We present an analysis of 16 state-of-the-art MT systems on German-English based on a linguistically-motivated test suite. The test suite has been devised manually by a team of language professionals in order to cover a broad variety of linguistic phenomena that MT often fails to translate properly. It contains 5,000 test sentences covering 106 linguistic phenomena in 14 categories, with an increased focus on verb tenses, aspects and moods. The MT outputs are evaluated in a semi-automatic way through regular expressions that focus only on the part of the sentence that is relevant to each phenomenon. Through our analysis, we are able to compare systems based on their performance on these categories. Additionally, we reveal strengths and weaknesses of particular systems and we identify grammatical phenomena where the overall performance of MT is relatively low.

pdf bib
The Word Sense Disambiguation Test Suite at WMT18WMT18
Annette Rios | Mathias Müller | Rico Sennrich

We present a task to measure an MT system’s capability to translate ambiguous words with their correct sense according to the given context. The task is based on the GermanEnglish Word Sense Disambiguation (WSD) test set ContraWSD (Rios Gonzales et al., 2017), but it has been filtered to reduce noise, and the evaluation has been adapted to assess MT output directly rather than scoring existing translations. We evaluate all GermanEnglish submissions to the WMT’18 shared translation task, plus a number of submissions from previous years, and find that performance on the task has markedly improved compared to the 2016 WMT submissions (81%93 % accuracy on the WSD task). We also find that the unsupervised submissions to the task have a low WSD capability, and predominantly translate ambiguous source words with the same sense.

pdf bib
The AFRL-Ohio State WMT18 Multimodal System : Combining Visual with TraditionalAFRL-Ohio State WMT18 Multimodal System: Combining Visual with Traditional
Jeremy Gwinnup | Joshua Sandvick | Michael Hutt | Grant Erdmann | John Duselis | James Davis

AFRL-Ohio State extends its usage of visual domain-driven machine translation for use as a peer with traditional machine translation systems. As a peer, it is enveloped into a system combination of neural and statistical MT systems to present a composite translation.

pdf bib
CUNI System for the WMT18 Multimodal Translation TaskCUNI System for the WMT18 Multimodal Translation Task
Jindřich Helcl | Jindřich Libovický | Dušan Variš

We present our submission to the WMT18 Multimodal Translation Task. The main feature of our submission is applying a self-attentive network instead of a recurrent neural network. We evaluate two methods of incorporating the visual features in the model : first, we include the image representation as another input to the network ; second, we train the model to predict the visual features and use it as an auxiliary objective. For our submission, we acquired both textual and multimodal additional data. Both of the proposed methods yield significant improvements over recurrent networks and self-attentive textual baselines.

pdf bib
Sheffield Submissions for WMT18 Multimodal Translation Shared TaskSheffield Submissions for WMT18 Multimodal Translation Shared Task
Chiraag Lala | Pranava Swaroop Madhyastha | Carolina Scarton | Lucia Specia

This paper describes the University of Sheffield’s submissions to the WMT18 Multimodal Machine Translation shared task. We participated in both tasks 1 and 1b. For task 1, we build on a standard sequence to sequence attention-based neural machine translation system (NMT) and investigate the utility of multimodal re-ranking approaches. More specifically, n-best translation candidates from this system are re-ranked using novel multimodal cross-lingual word sense disambiguation models. For task 1b, we explore three approaches : (i) re-ranking based on cross-lingual word sense disambiguation (as for task 1), (ii) re-ranking based on consensus of NMT n-best lists from German-Czech, French-Czech and English-Czech systems, and (iii) data augmentation by generating English source data through machine translation from French to English and from German to English followed by hypothesis selection using a multimodal-reranker.

pdf bib
Translation of Biomedical Documents with Focus on Spanish-EnglishSpanish-English
Mirela-Stefania Duma | Wolfgang Menzel

For the WMT 2018 shared task of translating documents pertaining to the Biomedical domain, we developed a scoring formula that uses an unsophisticated and effective method of weighting term frequencies and was integrated in a data selection pipeline. The method was applied on five language pairs and it performed best on Portuguese-English, where a BLEU score of 41.84 placed it third out of seven runs submitted by three institutions. In this paper, we describe our method and results with a special focus on Spanish-English where we compare it against a state-of-the-art method. Our contribution to the task lies in introducing a fast, unsupervised method for selecting domain-specific data for training models which obtain good results using only 10 % of the general domain data.

pdf bib
Ensemble of Translators with Automatic Selection of the Best Translation the submission of FOKUS to the WMT 18 biomedical translation taskFOKUS to the WMT 18 biomedical translation task –
Cristian Grozea

This paper describes the system of Fraunhofer FOKUS for the WMT 2018 biomedical translation task. Our approach, described here, was to automatically select the most promising translation from a set of candidates produced with NMT (Transformer) models. We selected the highest fidelity translation of each sentence by using a dictionary, stemming and a set of heuristics. Our method is simple, can use any machine translators, and requires no further training in addition to that already employed to build the NMT models. The downside is that the score did not increase over the best in ensemble, but was quite close to it (difference about 0.5 BLEU).

pdf bib
LMU Munich’s Neural Machine Translation Systems at WMT 2018LMU Munich’s Neural Machine Translation Systems at WMT 2018
Matthias Huck | Dario Stojanovski | Viktor Hangya | Alexander Fraser

We present the LMU Munich machine translation systems for the EnglishGerman language pair. We have built neural machine translation systems for both translation directions (EnglishGerman and GermanEnglish) and for two different domains (the biomedical domain and the news domain). The systems were used for our participation in the WMT18 biomedical translation task and in the shared task on machine translation of news. The main focus of our recent system development efforts has been on achieving improvements in the biomedical domain over last year’s strong biomedical translation engine for EnglishGerman (Huck et al., 2017a). Considerable progress has been made in the latter task, which we report on in this paper.

pdf bib
Hunter NMT System for WMT18 Biomedical Translation Task : Transfer Learning in Neural Machine TranslationNMT System for WMT18 Biomedical Translation Task: Transfer Learning in Neural Machine Translation
Abdul Khan | Subhadarshi Panda | Jia Xu | Lampros Flokas

This paper describes the submission of Hunter Neural Machine Translation (NMT) to the WMT’18 Biomedical translation task from English to French. The discrepancy between training and test data distribution brings a challenge to translate text in new domains. Beyond the previous work of combining in-domain with out-of-domain models, we found accuracy and efficiency gain in combining different in-domain models. We conduct extensive experiments on NMT with transfer learning. We train on different in-domain Biomedical datasets one after another. That means parameters of the previous training serve as the initialization of the next one. Together with a pre-trained out-of-domain News model, we enhanced translation quality with 3.73 BLEU points over the baseline. Furthermore, we applied ensemble learning on training models of intermediate epochs and achieved an improvement of 4.02 BLEU points over the baseline. Overall, our system is 11.29 BLEU points above the best system of last year on the EDP 2017 test set.transfer learning. We train on different in-domain Biomedical datasets one after another. That means parameters of the previous training serve as the initialization of the next one. Together with a pre-trained out-of-domain News model, we enhanced translation quality with 3.73 BLEU points over the baseline. Furthermore, we applied ensemble learning on training models of intermediate epochs and achieved an improvement of 4.02 BLEU points over the baseline. Overall, our system is 11.29 BLEU points above the best system of last year on the EDP 2017 test set.

pdf bib
UFRGS Participation on the WMT Biomedical Translation Shared TaskUFRGS Participation on the WMT Biomedical Translation Shared Task
Felipe Soares | Karin Becker

This paper describes the machine translation systems developed by the Universidade Federal do Rio Grande do Sul (UFRGS) team for the biomedical translation shared task. Our systems are based on statistical machine translation and neural machine translation, using the Moses and OpenNMT toolkits, respectively. We participated in four translation directions for the English / Spanish and English / Portuguese language pairs. To create our training data, we concatenated several parallel corpora, both from in-domain and out-of-domain sources, as well as terminological resources from UMLS. Our systems achieved the best BLEU scores according to the official shared task evaluation.

pdf bib
Neural Machine Translation with the Transformer and Multi-Source Romance Languages for the Biomedical WMT 2018 taskRomance Languages for the Biomedical WMT 2018 task
Brian Tubay | Marta R. Costa-jussà

The Transformer architecture has become the state-of-the-art in Machine Translation. This model, which relies on attention-based mechanisms, has outperformed previous neural machine translation architectures in several tasks. In this system description paper, we report details of training neural machine translation with multi-source Romance languages with the Transformer model and in the evaluation frame of the biomedical WMT 2018 task. Using multi-source languages from the same family allows improvements of over 6 BLEU points.

pdf bib
ITER : Improving Translation Edit Rate through Optimizable Edit CostsITER: Improving Translation Edit Rate through Optimizable Edit Costs
Joybrata Panja | Sudip Kumar Naskar

The paper presents our participation in the WMT 2018 Metrics Shared Task. We propose an improved version of Translation Edit / Error Rate (TER). In addition to including the basic edit operations in TER, namely-insertion, deletion, substitution and shift, our metric also allows stem matching, optimizable edit costs and better normalization so as to correlate better with human judgement scores. The proposed metric shows much higher correlation with human judgments than TER.

pdf bib
RUSE : Regressor Using Sentence Embeddings for Automatic Machine Translation EvaluationRUSE: Regressor Using Sentence Embeddings for Automatic Machine Translation Evaluation
Hiroki Shimanaka | Tomoyuki Kajiwara | Mamoru Komachi

We introduce the RUSE metric for the WMT18 metrics shared task. Sentence embeddings can capture global information that can not be captured by local features based on character or word N-grams. Although training sentence embeddings using small-scale translation datasets with manual evaluation is difficult, sentence embeddings trained from large-scale data in other tasks can improve the automatic evaluation of machine translation. We use a multi-layer perceptron regressor based on three types of sentence embeddings. The experimental results of the WMT16 and WMT17 datasets show that the RUSE metric achieves a state-of-the-art performance in both segment- and system-level metrics tasks with embedding features only.

pdf bib
Neural Machine Translation for English-TamilEnglish-Tamil
Himanshu Choudhary | Aditya Kumar Pathak | Rajiv Ratan Saha | Ponnurangam Kumaraguru

A huge amount of valuable resources is available on the web in English, which are often translated into local languages to facilitate knowledge sharing among local people who are not much familiar with English. However, translating such content manually is very tedious, costly, and time-consuming process. To this end, machine translation is an efficient approach to translate text without any human involvement. Neural machine translation (NMT) is one of the most recent and effective translation technique amongst all existing machine translation systems. In this paper, we apply NMT for English-Tamil language pair. We propose a novel neural machine translation technique using word-embedding along with Byte-Pair-Encoding (BPE) to develop an efficient translation system that overcomes the OOV (Out Of Vocabulary) problem for languages which do not have much translations available online. We use the BLEU score for evaluating the system performance. Experimental results confirm that our proposed MIDAS translator (8.33 BLEU score) outperforms Google translator (3.75 BLEU score).

pdf bib
The Benefit of Pseudo-Reference Translations in Quality Estimation of MT OutputMT Output
Melania Duma | Wolfgang Menzel

In this paper, a novel approach to Quality Estimation is introduced, which extends the method in (Duma and Menzel, 2017) by also considering pseudo-reference translations as data sources to the tree and sequence kernels used before. Two variants of the system were submitted to the sentence level WMT18 Quality Estimation Task for the English-German language pair. They have been ranked 4th and 6th out of 13 systems in the SMT track, while in the NMT track ranks 4 and 5 out of 11 submissions have been reached.

pdf bib
Supervised and Unsupervised Minimalist Quality Estimators : Vicomtech’s Participation in the WMT 2018 Quality Estimation TaskWMT 2018 Quality Estimation Task
Thierry Etchegoyhen | Eva Martínez Garcia | Andoni Azpeitia

We describe Vicomtech’s participation in the WMT 2018 shared task on quality estimation, for which we submitted minimalist quality estimators. The core of our approach is based on two simple features : lexical translation overlaps and language model cross-entropy scores. These features are exploited in two system variants : uMQE is an unsupervised system, where the final quality score is obtained by averaging individual feature scores ; sMQE is a supervised variant, where the final score is estimated by a Support Vector Regressor trained on the available annotated datasets. The main goal of our minimalist approach to quality estimation is to provide reliable estimators that require minimal deployment effort, few resources, and, in the case of uMQE, do not depend on costly data annotation or post-editing. Our approach was applied to all language pairs in sentence quality estimation, obtaining competitive results across the board.

pdf bib
Sheffield Submissions for the WMT18 Quality Estimation Shared TaskSheffield Submissions for the WMT18 Quality Estimation Shared Task
Julia Ive | Carolina Scarton | Frédéric Blain | Lucia Specia

In this paper we present the University of Sheffield submissions for the WMT18 Quality Estimation shared task. We discuss our submissions to all four sub-tasks, where ours is the only team to participate in all language pairs and variations (37 combinations). Our systems show competitive results and outperform the baseline in nearly all cases.

pdf bib
UAlacant machine translation quality estimation at WMT 2018 : a simple approach using phrase tables and feed-forward neural networksUAlacant machine translation quality estimation at WMT 2018: a simple approach using phrase tables and feed-forward neural networks
Felipe Sánchez-Martínez | Miquel Esplà-Gomis | Mikel L. Forcada

We describe the Universitat d’Alacant submissions to the word- and sentence-level machine translation (MT) quality estimation (QE) shared task at WMT 2018. Our approach to word-level MT QE builds on previous work to mark the words in the machine-translated sentence as OK or BAD, and is extended to determine if a word or sequence of words need to be inserted in the gap after each word. Our sentence-level submission simply uses the edit operations predicted by the word-level approach to approximate TER. The method presented ranked first in the sub-task of identifying insertions in gaps for three out of the six datasets, and second in the rest of them.OK or BAD, and is extended to determine if a word or sequence of words need to be inserted in the gap after each word. Our sentence-level submission simply uses the edit operations predicted by the word-level approach to approximate TER. The method presented ranked first in the sub-task of identifying insertions in gaps for three out of the six datasets, and second in the rest of them.

pdf bib
Alibaba Submission for WMT18 Quality Estimation TaskAlibaba Submission for WMT18 Quality Estimation Task
Jiayi Wang | Kai Fan | Bo Li | Fengming Zhou | Boxing Chen | Yangbin Shi | Luo Si

The goal of WMT 2018 Shared Task on Translation Quality Estimation is to investigate automatic methods for estimating the quality of machine translation results without reference translations. This paper presents the QE Brain system, which proposes the neural Bilingual Expert model as a feature extractor based on conditional target language model with a bidirectional transformer and then processes the semantic representations of source and the translation output with a Bi-LSTM predictive model for automatic quality estimation. The system has been applied to the sentence-level scoring and ranking tasks as well as the word-level tasks for finding errors for each word in translations. An extensive set of experimental results have shown that our system outperformed the best results in WMT 2017 Quality Estimation tasks and obtained top results in WMT 2018.

pdf bib
Quality Estimation with Force-Decoded Attention and Cross-lingual Embeddings
Elizaveta Yankovskaya | Andre Tättar | Mark Fishel

This paper describes the submissions of the team from the University of Tartu for the sentence-level Quality Estimation shared task of WMT18. The proposed models use features based on attention weights of a neural machine translation system and cross-lingual phrase embeddings as input features of a regression model. Two of the proposed models require only a neural machine translation system with an attention mechanism with no additional resources. Results show that combining neural networks and baseline features leads to significant improvements over the baseline features alone.

pdf bib
MS-UEdin Submission to the WMT2018 APE Shared Task : Dual-Source Transformer for Automatic Post-EditingMS-UEdin Submission to the WMT2018 APE Shared Task: Dual-Source Transformer for Automatic Post-Editing
Marcin Junczys-Dowmunt | Roman Grundkiewicz

This paper describes the Microsoft and University of Edinburgh submission to the Automatic Post-editing shared task at WMT2018. Based on training data and systems from the WMT2017 shared task, we re-implement our own models from the last shared task and introduce improvements based on extensive parameter sharing. Next we experiment with our implementation of dual-source transformer models and data selection for the IT domain. Our submissions decisively wins the SMT post-editing sub-task establishing the new state-of-the-art and is a very close second (or equal, 16.46 vs 16.50 TER) in the NMT sub-task. Based on the rather weak results in the NMT sub-task, we hypothesize that neural-on-neural APE might not be actually useful.

pdf bib
A Transformer-Based Multi-Source Automatic Post-Editing System
Santanu Pal | Nico Herbig | Antonio Krüger | Josef van Genabith

This paper presents our EnglishGerman Automatic Post-Editing (APE) system submitted to the APE Task organized at WMT 2018 (Chatterjee et al., 2018). The proposed model is an extension of the transformer architecture : two separate self-attention-based encoders encode the machine translation output (mt) and the source (src), followed by a joint encoder that attends over a combination of these two encoded sequences (encsrc and encmt) for generating the post-edited sentence. We compare this multi-source architecture (i.e, src, mt pe) to a monolingual transformer (i.e., mt pe) model and an ensemble combining the multi-source src, mt pe and single-source mt pe models. For both the PBSMT and the NMT task, the ensemble yields the best results, followed by the multi-source model and last the single-source approach. Our best model, the ensemble, achieves a BLEU score of 66.16 and 74.22 for the PBSMT and NMT task, respectively.

pdf bib
DFKI-MLT System Description for the WMT18 Automatic Post-editing TaskDFKI-MLT System Description for the WMT18 Automatic Post-editing Task
Daria Pylypenko | Raphael Rubino

This paper presents the Automatic Post-editing (APE) systems submitted by the DFKI-MLT group to the WMT’18 APE shared task. Three monolingual neural sequence-to-sequence APE systems were trained using target-language data only : one using an attentional recurrent neural network architecture and two using the attention-only (transformer) architecture. The training data was composed of machine translated (MT) output used as source to the APE model aligned with their manually post-edited version or reference translation as target. We made use of the provided training sets only and trained APE models applicable to phrase-based and neural MT outputs. Results show better performances reached by the attention-only model over the recurrent one, significant improvement over the baseline when post-editing phrase-based MT output but degradation when applied to neural MT output.transformer) architecture. The training data was composed of machine translated (MT) output used as source to the APE model aligned with their manually post-edited version or reference translation as target. We made use of the provided training sets only and trained APE models applicable to phrase-based and neural MT outputs. Results show better performances reached by the attention-only model over the recurrent one, significant improvement over the baseline when post-editing phrase-based MT output but degradation when applied to neural MT output.

pdf bib
The Speechmatics Parallel Corpus Filtering System for WMT18WMT18
Tom Ash | Remi Francis | Will Williams

Our entry to the parallel corpus filtering task uses a two-step strategy. The first step uses a series of pragmatic hard ‘rules’ to remove the worst example sentences. This first step reduces the effective corpus size down from the initial 1 billion to 160 million tokens. The second step uses four different heuristics weighted to produce a score that is then used for further filtering down to 100 or 10 million tokens. Our final system produces competitive results without requiring excessive fine tuning to the exact task or language pair. The first step in isolation provides a very fast filter that gives most of the gains of the final system.

pdf bib
An Unsupervised System for Parallel Corpus Filtering
Viktor Hangya | Alexander Fraser

In this paper we describe LMU Munich’s submission for the WMT 2018 Parallel Corpus Filtering shared task which addresses the problem of cleaning noisy parallel corpora. The task of mining and cleaning parallel sentences is important for improving the quality of machine translation systems, especially for low-resource languages. We tackle this problem in a fully unsupervised fashion relying on bilingual word embeddings created without any bilingual signal. After pre-filtering noisy data we rank sentence pairs by calculating bilingual sentence-level similarities and then remove redundant data by employing monolingual similarity as well. Our unsupervised system achieved good performance during the official evaluation of the shared task, scoring only a few BLEU points behind the best systems, while not requiring any parallel training data.WMT 2018 Parallel Corpus Filtering shared task which addresses the problem of cleaning noisy parallel corpora. The task of mining and cleaning parallel sentences is important for improving the quality of machine translation systems, especially for low-resource languages. We tackle this problem in a fully unsupervised fashion relying on bilingual word embeddings created without any bilingual signal. After pre-filtering noisy data we rank sentence pairs by calculating bilingual sentence-level similarities and then remove redundant data by employing monolingual similarity as well. Our unsupervised system achieved good performance during the official evaluation of the shared task, scoring only a few BLEU points behind the best systems, while not requiring any parallel training data.

pdf bib
Dual Conditional Cross-Entropy Filtering of Noisy Parallel Corpora
Marcin Junczys-Dowmunt

In this work we introduce dual conditional cross-entropy filtering for noisy parallel data. For each sentence pair of the noisy parallel corpus we compute cross-entropy scores according to two inverse translation models trained on clean data. We penalize divergent cross-entropies and weigh the penalty by the cross-entropy average of both models. Sorting or thresholding according to these scores results in better subsets of parallel data. We achieve higher BLEU scores with models trained on parallel data filtered only from Paracrawl than with models trained on clean WMT data. We further evaluate our method in the context of the WMT2018 shared task on parallel corpus filtering and achieve the overall highest ranking scores of the shared task, scoring top in three out of four subtasks.

pdf bib
Measuring sentence parallelism using Mahalanobis distances : The NRC unsupervised submissions to the WMT18 Parallel Corpus Filtering shared taskNRC unsupervised submissions to the WMT18 Parallel Corpus Filtering shared task
Patrick Littell | Samuel Larkin | Darlene Stewart | Michel Simard | Cyril Goutte | Chi-kiu Lo

The WMT18 shared task on parallel corpus filtering (Koehn et al., 2018b) challenged teams to score sentence pairs from a large high-recall, low-precision web-scraped parallel corpus (Koehn et al., 2018a). Participants could use existing sample corpora (e.g. past WMT data) as a supervisory signal to learn what a clean corpus looks like. However, in lower-resource situations it often happens that the target corpus of the language is the only sample of parallel text in that language. We therefore made several unsupervised entries, setting ourselves an additional constraint that we not utilize the additional clean parallel corpora. One such entry fairly consistently scored in the top ten systems in the 100M-word conditions, and for one tasktranslating the European Medicines Agency corpus (Tiedemann, 2009)scored among the best systems even in the 10M-word conditions.only sample of parallel text in that language. We therefore made several unsupervised entries, setting ourselves an additional constraint that we not utilize the additional clean parallel corpora. One such entry fairly consistently scored in the top ten systems in the 100M-word conditions, and for one task—translating the European Medicines Agency corpus (Tiedemann, 2009)—scored among the best systems even in the 10M-word conditions.

pdf bib
Accurate semantic textual similarity for cleaning noisy parallel corpora using semantic machine translation evaluation metric : The NRC supervised submissions to the Parallel Corpus Filtering taskNRC supervised submissions to the Parallel Corpus Filtering task
Chi-kiu Lo | Michel Simard | Darlene Stewart | Samuel Larkin | Cyril Goutte | Patrick Littell

We present our semantic textual similarity approach in filtering a noisy web crawled parallel corpus using YiSia novel semantic machine translation evaluation metric. The systems mainly based on this supervised approach perform well in the WMT18 Parallel Corpus Filtering shared task (4th place in 100-million-word evaluation, 8th place in 10-million-word evaluation, and 6th place overall, out of 48 submissions). In fact, our best performing systemNRC-yisi-bicov is one of the only four submissions ranked top 10 in both evaluations. Our submitted systems also include some initial filtering steps for scaling down the size of the test corpus and a final redundancy removal step for better semantic and token coverage of the filtered corpus. In this paper, we also describe our unsuccessful attempt in automatically synthesizing a noisy parallel development corpus for tuning the weights to combine different parallelism and fluency features.

pdf bib
Alibaba Submission to the WMT18 Parallel Corpus Filtering TaskAlibaba Submission to the WMT18 Parallel Corpus Filtering Task
Jun Lu | Xiaoyu Lv | Yangbin Shi | Boxing Chen

This paper describes the Alibaba Machine Translation Group submissions to the WMT 2018 Shared Task on Parallel Corpus Filtering. While evaluating the quality of the parallel corpus, the three characteristics of the corpus are investigated, i.e. 1) the bilingual / translation quality, 2) the monolingual quality and 3) the corpus diversity. Both rule-based and model-based methods are adapted to score the parallel sentence pairs. The final parallel corpus filtering system is reliable, easy to build and adapt to other language pairs.

pdf bib
UTFPR at WMT 2018 : Minimalistic Supervised Corpora Filtering for Machine TranslationUTFPR at WMT 2018: Minimalistic Supervised Corpora Filtering for Machine Translation
Gustavo Paetzold

We present the UTFPR systems at the WMT 2018 parallel corpus filtering task. Our supervised approach discerns between good and bad translations by training classic binary classification models over an artificially produced binary classification dataset derived from a high-quality translation set, and a minimalistic set of 6 semantic distance features that rely only on easy-to-gather resources. We rank translations by their probability for the good label. Our results show that logistic regression pairs best with our approach, yielding more consistent results throughout the different settings evaluated.

pdf bib
The ILSP / ARC submission to the WMT 2018 Parallel Corpus Filtering Shared TaskILSP/ARC submission to the WMT 2018 Parallel Corpus Filtering Shared Task
Vassilis Papavassiliou | Sokratis Sofianopoulos | Prokopis Prokopidis | Stelios Piperidis

This paper describes the submission of the Institute for Language and Speech Processing / Athena Research and Innovation Center (ILSP / ARC) for the WMT 2018 Parallel Corpus Filtering shared task. We explore several properties of sentences and sentence pairs that our system explored in the context of the task with the purpose of clustering sentence pairs according to their appropriateness in training MT systems. We also discuss alternative methods for ranking the sentence pairs of the most appropriate clusters with the aim of generating the two datasets (of 10 and 100 million words as required in the task) that were evaluated. By summarizing the results of several experiments that were carried out by the organizers during the evaluation phase, our submission achieved an average BLEU score of 26.41, even though it does not make use of any language-specific resources like bilingual lexica, monolingual corpora, or MT output, while the average score of the best participant system was 27.91.

pdf bib
SYSTRAN Participation to the WMT2018 Shared Task on Parallel Corpus FilteringSYSTRAN Participation to the WMT2018 Shared Task on Parallel Corpus Filtering
MinhQuang Pham | Josep Crego | Jean Senellart

This paper describes the participation of SYSTRAN to the shared task on parallel corpus filtering at the Third Conference on Machine Translation (WMT 2018). We participate for the first time using a neural sentence similarity classifier which aims at predicting the relatedness of sentence pairs in a multilingual context. The paper describes the main characteristics of our approach and discusses the results obtained on the data sets published for the shared task.

pdf bib
The RWTH Aachen University Filtering System for the WMT 2018 Parallel Corpus Filtering TaskRWTH Aachen University Filtering System for the WMT 2018 Parallel Corpus Filtering Task
Nick Rossenbach | Jan Rosendahl | Yunsu Kim | Miguel Graça | Aman Gokrani | Hermann Ney

This paper describes the submission of RWTH Aachen University for the DeEn parallel corpus filtering task of the EMNLP 2018 Third Conference on Machine Translation (WMT 2018). We use several rule-based, heuristic methods to preselect sentence pairs. These sentence pairs are scored with count-based and neural systems as language and translation models. In addition to single sentence-pair scoring, we further implement a simple redundancy removing heuristic. Our best performing corpus filtering system relies on recurrent neural language models and translation models based on the transformer architecture. A model trained on 10 M randomly sampled tokens reaches a performance of 9.2 % BLEU on newstest2018. Using our filtering and ranking techniques we achieve 34.8 % BLEU.EMNLP 2018 Third Conference on Machine Translation (WMT 2018). We use several rule-based, heuristic methods to preselect sentence pairs. These sentence pairs are scored with count-based and neural systems as language and translation models. In addition to single sentence-pair scoring, we further implement a simple redundancy removing heuristic. Our best performing corpus filtering system relies on recurrent neural language models and translation models based on the transformer architecture. A model trained on 10M randomly sampled tokens reaches a performance of 9.2% BLEU on newstest2018. Using our filtering and ranking techniques we achieve 34.8% BLEU.

pdf bib
Prompsit’s submission to WMT 2018 Parallel Corpus Filtering shared taskWMT 2018 Parallel Corpus Filtering shared task
Víctor M. Sánchez-Cartagena | Marta Bañón | Sergio Ortiz-Rojas | Gema Ramírez

This paper describes Prompsit Language Engineering’s submissions to the WMT 2018 parallel corpus filtering shared task. Our four submissions were based on an automatic classifier for identifying pairs of sentences that are mutual translations. A set of hand-crafted hard rules for discarding sentences with evident flaws were applied before the classifier. We explored different strategies for achieving a training corpus with diverse vocabulary and fluent sentences : language model scoring, an active-learning-inspired data selection algorithm and n-gram saturation. Our submissions were very competitive in comparison with other participants on the 100 million word training corpus.