Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)

Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, André Martins, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Matt Post, Marco Turchi, Karin Verspoor (Editors)


Anthology ID:
W19-54
Month:
August
Year:
2019
Address:
Florence, Italy
Venues:
ACL | WMT | WS
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
URL:
https://aclanthology.org/W19-54
PDF:
https://aclanthology.org/W19-54.pdf

Findings of the WMT 2019 Shared Task on Parallel Corpus Filtering for Low-Resource Conditions
Philipp Koehn | Francisco Guzmán | Vishrav Chaudhary | Juan Pino

Following the WMT 2018 Shared Task on Parallel Corpus Filtering, we posed the challenge of assigning sentence-level quality scores to very noisy corpora of sentence pairs crawled from the web, with the goal of sub-selecting the highest-quality 2% and 10% of the data for training machine translation systems. This year, the task tackled the low-resource conditions of Nepali-English and Sinhala-English. Eleven participants from companies, national research labs, and universities took part in this task.
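
A minimal sketch of the sub-selection step, assuming per-pair quality scores have already been produced; the greedy word-count budget is an illustrative simplification of the official subset-extraction procedure:

```python
def subselect(pairs, scores, fraction=0.02):
    """Keep the highest-scoring pairs until a word budget is reached.

    pairs: list of (source, target) sentence strings.
    scores: parallel list of quality scores (higher = better).
    fraction: share of the corpus (by target word count) to keep, e.g. 2%.
    """
    total_words = sum(len(tgt.split()) for _, tgt in pairs)
    budget = fraction * total_words
    kept, used = [], 0
    # Greedily take sentence pairs in descending score order.
    for score, (src, tgt) in sorted(zip(scores, pairs), key=lambda x: -x[0]):
        if used >= budget:
            break
        kept.append((src, tgt))
        used += len(tgt.split())
    return kept
```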

RTM Stacking Results for Machine Translation Performance Prediction
Ergun Biçici

We obtain new results using referential translation machines (RTMs) with an increased number of learning models in the set of results that are stacked to obtain a better mixture-of-experts prediction. We combine features extracted from the word-level predictions with sentence- or document-level features, which significantly improves results on the training sets but decreases results on the test sets.
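
A minimal sketch of the stacking idea, using scikit-learn's StackingRegressor with arbitrary base learners as a stand-in; the paper's actual RTM features and model set are not reproduced here:

```python
import numpy as np
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Toy data standing in for RTM features and quality labels.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 10)), rng.uniform(size=200)

# Base learners' predictions are combined by a meta-learner, giving a
# simple mixture-of-experts-style prediction.
stack = StackingRegressor(
    estimators=[("ridge", Ridge()), ("svr", SVR()),
                ("tree", DecisionTreeRegressor(max_depth=4))],
    final_estimator=Ridge(),
)
stack.fit(X, y)
print(stack.predict(X[:3]))
```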

QE BERT: Bilingual BERT Using Multi-task Learning for Neural Quality Estimation
Hyun Kim | Joon-Ho Lim | Hyun-Ki Kim | Seung-Hoon Na

For translation quality estimation at the word and sentence levels, this paper presents a novel approach based on BERT, which has recently achieved impressive results on various natural language processing tasks. Our proposed model re-purposes BERT for translation quality estimation and uses multi-task learning for the sentence-level task and word-level subtasks (i.e., source word, target word, and target gap). Experimental results on the WMT19 Quality Estimation shared task show that our systems achieve competitive results and provide significant improvements over the baseline.
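
A minimal PyTorch sketch of multi-task heads for the subtasks named above, on top of an abstracted bilingual encoder; the head layout and loss weighting are illustrative assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class MultiTaskQEHeads(nn.Module):
    """Shared encoder output feeds one head per QE subtask."""

    def __init__(self, hidden_size=768):
        super().__init__()
        self.sent_head = nn.Linear(hidden_size, 1)  # sentence-level score
        self.word_head = nn.Linear(hidden_size, 2)  # OK/BAD per word
        self.gap_head = nn.Linear(hidden_size, 2)   # OK/BAD per target gap

    def forward(self, hidden):  # hidden: (batch, seq_len, hidden_size)
        sent = torch.sigmoid(self.sent_head(hidden[:, 0]))  # first token
        return sent, self.word_head(hidden), self.gap_head(hidden)

# Toy forward pass with random "encoder" states; in practice `hidden`
# would come from a bilingual BERT run over the source/target pair.
heads = MultiTaskQEHeads()
sent, word, gap = heads(torch.randn(2, 16, 768))
# Multi-task training would sum the per-head losses (weights assumed).
```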

MIPT System for Word-Level Quality Estimation
Mikhail Mosyagin | Varvara Logacheva

We explore different model architectures for the WMT 19 shared task on word-level quality estimation of automatic translation. We start with a model similar to Shef-bRNN, which we modify by using conditional random fields for sequence labelling. Additionally, we use a different approach for labelling gaps and source words. We further develop this model by including features from different sources such as BERT, baseline features for the task and transformer encoders. We evaluate the performance of our models on the English-German dataset for the corresponding shared task.

NJU Submissions for the WMT19 Quality Estimation Shared Task
Hou Qi

In this paper, we describe the submissions of the team from Nanjing University for the WMT19 sentence-level Quality Estimation (QE) shared task on the English-German language pair. We develop two approaches based on a two-stage neural QE model consisting of a feature extractor and a quality estimator. More specifically, one of the proposed approaches employs translation knowledge between the two languages from the two translation directions, while the other employs extra monolingual knowledge from both source and target sides, obtained by pre-training deep self-attention networks. To efficiently train these two-stage models, a joint learning training method is applied. Experiments show that the ensemble of the above two models achieves the best results on the benchmark dataset of the WMT17 sentence-level QE shared task and obtains competitive results in WMT19, ranking 3rd out of 10 submissions.

Quality Estimation and Translation Metrics via Pre-trained Word and Sentence Embeddings
Elizaveta Yankovskaya | Andre Tättar | Mark Fishel

We propose the use of pre-trained embeddings as features of a regression model for sentence-level quality estimation of machine translation. In our work, we combine freely available BERT and LASER multilingual embeddings to train a neural regression model. In the second proposed method, we use as input features not only the pre-trained embeddings but also the log-probability of any machine translation (MT) system. Both methods are applied to several language pairs and are evaluated both as a classical quality estimation system (predicting the HTER score) and as an MT metric (predicting human judgements of translation quality).
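
A minimal sketch of the feature-based regression setup, with random arrays standing in for precomputed BERT and LASER sentence embeddings and a ridge regressor standing in for the neural model:

```python
import numpy as np
from sklearn.linear_model import Ridge  # stand-in for the neural regressor

rng = np.random.default_rng(0)
n, bert_dim, laser_dim = 100, 768, 1024  # typical dimensions of the models

# Placeholder features: in practice, embeddings of the source sentence and
# of the MT output, plus the MT system's log-probability for that output.
bert = rng.normal(size=(n, 2 * bert_dim))    # source + MT output
laser = rng.normal(size=(n, 2 * laser_dim))
log_prob = rng.normal(size=(n, 1))
X = np.hstack([bert, laser, log_prob])
y = rng.uniform(0.0, 1.0, size=n)            # HTER labels in [0, 1]

model = Ridge().fit(X, y)                    # predict HTER from features
```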

SOURCE: SOURce-Conditional Elmo-style Model for Machine Translation Quality Estimation
Junpei Zhou | Zhisong Zhang | Zecong Hu

Quality estimation (QE) of machine translation (MT) systems is a task of growing importance. It reduces the cost of post-editing, allowing machine-translated text to be used on formal occasions. In this work, we describe our submission to the WMT 2019 sentence-level QE task. We mainly explore the utilization of pre-trained translation models in QE and adopt a bi-directional translation-like strategy. The strategy is similar to ELMo, but additionally conditions on source sentences. Experiments on the WMT QE dataset show that our strategy, which makes the pre-training slightly harder, can bring improvements for QE. In the WMT 2019 QE task, our system ranked second on the En-De NMT dataset and third on the En-Ru NMT dataset.

Effort-Aware Neural Automatic Post-Editing
Amirhossein Tebbifakhr | Matteo Negri | Marco Turchi

For this round of the WMT 2019 APE shared task, our submission focuses on addressing the over-correction problem in APE. Over-correction occurs when the APE system tends to rephrase an already correct MT output, and the resulting sentence is penalized by a reference-based evaluation against human post-edits. Our intuition is that this problem can be prevented by informing the system about the predicted quality of the MT output or, in other terms, the expected amount of needed corrections. For this purpose, following the common approach in multilingual NMT, we prepend a special token to the beginning of both the source text and the MT output indicating the required amount of post-editing. Following the best submissions to the WMT 2018 APE shared task, our backbone architecture is based on a multi-source Transformer that encodes both the MT output and the corresponding source text. We participated in both the English-German and English-Russian subtasks. In the first subtask, our best submission improved the original MT output quality by up to +0.98 BLEU and -0.47 TER. In the second subtask, where the higher quality of the MT output increases the risk of over-correction, none of our submitted runs was able to improve the MT output.
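
A minimal sketch of the effort-token trick: bucket the expected amount of post-editing (for example a predicted TER) and prepend a matching special token to both inputs. The token names and bucket edges are illustrative assumptions:

```python
def effort_token(predicted_ter):
    """Map a predicted edit rate to a coarse effort bucket (edges assumed)."""
    if predicted_ter < 0.1:
        return "<pe_none>"
    if predicted_ter < 0.3:
        return "<pe_light>"
    return "<pe_heavy>"

def tag_inputs(src, mt, predicted_ter):
    """Prepend the effort token to both the source and the MT output."""
    tok = effort_token(predicted_ter)
    return f"{tok} {src}", f"{tok} {mt}"

src_tagged, mt_tagged = tag_inputs("the cat sat", "die Katze sass", 0.05)
print(src_tagged)  # -> "<pe_none> the cat sat"
```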

UdS Submission for the WMT 19 Automatic Post-Editing Task
Hongfei Xu | Qiuhui Liu | Josef van Genabith

In this paper, we describe our submission to the English-German APE shared task at WMT 2019. We utilize and adapt to APE an NMT architecture originally developed for exploiting context information, implement it in our own transformer model, and explore joint training of the APE task with a de-noising encoder.

Terminology-Aware Segmentation and Domain Feature for the WMT19 Biomedical Translation Task
Casimiro Pio Carrino | Bardia Rafieian | Marta R. Costa-jussà | José A. R. Fonollosa

In this work, we describe the TALP-UPC systems submitted for the WMT19 Biomedical Translation Task. Our proposed strategy is NMT model-independent and relies on only one ingredient: a biomedical terminology list. We first extracted such a terminology list by labelling biomedical words in our training dataset using the BabelNet API. Then, we designed a data-preparation strategy to insert the term information at the token level. Finally, we trained the Transformer model with this term-informed data. Our best submitted system ranked 2nd and 3rd for the Spanish-English and English-Spanish translation directions, respectively.
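
A minimal sketch of token-level term annotation: wrap words found in a terminology list with marker tokens before training. The marker format and the tiny term list are illustrative assumptions, not the paper's exact scheme:

```python
# Terms as obtained, e.g., by labelling training data with the BabelNet API.
BIOMEDICAL_TERMS = {"insulin", "glucose", "hepatitis"}

def annotate(sentence, terms=BIOMEDICAL_TERMS):
    """Surround in-list tokens with term markers the model can learn from."""
    out = []
    for token in sentence.split():
        if token.lower() in terms:
            out.append(f"<term> {token} </term>")
        else:
            out.append(token)
    return " ".join(out)

print(annotate("Insulin regulates glucose levels"))
# -> "<term> Insulin </term> regulates <term> glucose </term> levels"
```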

Huawei’s NMT Systems for the WMT 2019 Biomedical Translation Task
Wei Peng | Jianfeng Liu | Liangyou Li | Qun Liu

This paper describes Huawei’s neural machine translation systems for the WMT 2019 biomedical translation shared task. We trained and fine-tuned our systems on a combination of out-of-domain and in-domain parallel corpora for six translation directions covering the English-Chinese, English-French, and English-German language pairs. Our submitted systems achieve the best BLEU scores on the English-French and English-German language pairs according to the official evaluation results; in the English-Chinese translation task, our systems ranked second. The enhanced performance is attributed to more in-domain training data and more sophisticated models. The development of translation models and of transfer learning (or domain adaptation) methods significantly contributed to our progress on the task.

UCAM Biomedical Translation at WMT19: Transfer Learning Multi-domain Ensembles
Danielle Saunders | Felix Stahlberg | Bill Byrne

The 2019 WMT Biomedical translation task involved translating Medline abstracts. We approached this using transfer learning to obtain a series of strong neural models on distinct domains, combining them into multi-domain ensembles. We further experimented with an adaptive language-model ensemble weighting scheme. Our submission achieved the best submitted results in both directions of English-Spanish.
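
A minimal sketch of adaptive ensemble weighting: per input sentence, weight each domain model by how well a matching domain language model scores that sentence. The softmax-style weighting and the toy callables are illustrative assumptions, not the paper's exact scheme:

```python
import numpy as np

def adaptive_weights(sentence, domain_lms):
    """domain_lms: one callable per domain, returning a log-probability."""
    log_scores = np.array([lm(sentence) for lm in domain_lms])
    w = np.exp(log_scores - log_scores.max())  # numerically stable softmax
    return w / w.sum()

def ensemble_predict(sentence, domain_models, domain_lms):
    """Mix the domain models' next-token distributions by adaptive weight."""
    w = adaptive_weights(sentence, domain_lms)
    probs = np.stack([m(sentence) for m in domain_models])
    return (w[:, None] * probs).sum(axis=0)

# Toy demo: two domains, a two-symbol vocabulary.
lms = [lambda s: -5.0, lambda s: -2.0]
models = [lambda s: np.array([0.7, 0.3]), lambda s: np.array([0.4, 0.6])]
print(ensemble_predict("patients received insulin", models, lms))
```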

Machine Translation from an Intercomprehension Perspective
Yu Chen | Tania Avgustinova

Within the first shared task on machine translation between similar languages, we present our first attempts at Czech-to-Polish machine translation from an intercomprehension perspective. We propose methods based on the mutual intelligibility of the two languages, taking advantage of their orthographic and phonological similarity, in the hope of improving over our baselines. The translation results are evaluated using BLEU; on this metric, none of our proposals could outperform the baselines on the final test set. The current setups are rather preliminary, and there are several potential improvements we can try in the future.

Utilizing Monolingual Data in NMT for Similar Languages: Submission to Similar Language Translation Task
Jyotsana Khatri | Pushpak Bhattacharyya

This paper describes our submission to the Shared Task on Similar Language Translation at the Fourth Conference on Machine Translation (WMT 2019). We submitted three systems for the Hindi-Nepali direction, in which we examined the performance of an RNN-based NMT system; a semi-supervised NMT system, where monolingual data of both languages is utilized using the architecture by ; and a system trained with extra synthetic sentences generated by copying source and target sentences, without using any additional monolingual data.
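
A minimal sketch of the copy-based augmentation mentioned above: each side of the bitext is paired with a copy of itself as a synthetic training pair, which is plausible for closely related languages. Purely illustrative:

```python
def copy_synthetic(bitext):
    """Augment (src, tgt) pairs with identity copies of each side."""
    synthetic = []
    for src, tgt in bitext:
        synthetic.append((src, src))  # source copied to the target side
        synthetic.append((tgt, tgt))  # target copied to the source side
    return bitext + synthetic

print(copy_synthetic([("source sentence", "target sentence")]))
```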

Neural Machine Translation: Hindi-Nepali
Sahinur Rahman Laskar | Partha Pakray | Sivaji Bandyopadhyay

With the extensive use of Machine Translation (MT) technology, there is growing interest in directly translating between pairs of similar languages, where the main challenge is to overcome the limited availability of parallel data and still produce precise MT output. The current work relies on Neural Machine Translation (NMT) with an attention mechanism for the similar language translation of the WMT19 shared task in the context of the Hindi-Nepali pair. The NMT systems were trained on the Hindi-Nepali parallel corpus, then tested and analyzed on Hindi-Nepali translation. The official results declared at the WMT19 shared task show that our NMT system obtained a Bilingual Evaluation Understudy (BLEU) score of 24.6 for the primary configuration in Nepali-to-Hindi translation. We also achieved BLEU scores of 53.7 (Hindi to Nepali) and 49.1 (Nepali to Hindi) with the contrastive system type.

Panlingua-KMI MT System for Similar Language Translation Task at WMT 2019
Atul Kr. Ojha | Ritesh Kumar | Akanksha Bansal | Priya Rani

The present paper describes the development of the Panlingua-KMI Machine Translation (MT) systems for the Hindi-Nepali language pair, designed as part of the Similar Language Translation Task at the WMT 2019 Shared Task. The Panlingua-KMI team conducted a series of experiments to explore both phrase-based statistical (PBSMT) and neural (NMT) methods. Among the 11 MT systems prepared under this task, 6 PBSMT systems were prepared for Nepali-Hindi, 1 PBSMT system for Hindi-Nepali, and 2 NMT systems were developed for Nepali-Hindi. The results show that PBSMT can be an effective method for developing MT systems for closely related languages. Our Hindi-Nepali PBSMT system was ranked 2nd among the 13 systems submitted for the pair, and our Nepali-Hindi PBSMT system was ranked 4th among the 12 systems submitted for the task.

UDS-DFKI Submission to the WMT2019 Czech-Polish Similar Language Translation Shared Task
Santanu Pal | Marcos Zampieri | Josef van Genabith

In this paper we present the UDS-DFKI system submitted to the Similar Language Translation shared task at WMT 2019. The first edition of this shared task featured data from three pairs of similar languages: Czech and Polish, Hindi and Nepali, and Portuguese and Spanish. Participants could choose to participate in any of these three tracks and submit system outputs in any translation direction. We report the results obtained by our system in translating from Czech to Polish and comment on the impact of out-of-domain test data on the performance of our system. UDS-DFKI achieved competitive performance, ranking second among ten teams in Czech-to-Polish translation.

Neural Machine Translation of Low-Resource and Similar Languages with Backtranslation
Michael Przystupa | Muhammad Abdul-Mageed

We present our contribution to the WMT19 Similar Language Translation shared task. We investigate the utility of neural machine translation on three low-resource, similar language pairs: Spanish-Portuguese, Czech-Polish, and Hindi-Nepali. Since state-of-the-art neural machine translation systems still require large amounts of bitext, which we do not have for the pairs we consider, we focus primarily on incorporating monolingual data into our models via backtranslation. In our analysis, we found Transformer models to work best on Spanish-Portuguese and Czech-Polish translation, whereas LSTMs with global attention worked best on Hindi-Nepali translation.
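
A minimal sketch of the backtranslation recipe: train a reverse (target-to-source) model on the available bitext, translate target-side monolingual data back into the source language, and mix the synthetic pairs into the training data. The train_model and translate hooks are hypothetical stand-ins for an NMT toolkit:

```python
def backtranslation_augment(bitext, mono_tgt, train_model, translate):
    """Return bitext augmented with backtranslated synthetic pairs.

    bitext: list of (src, tgt) pairs; mono_tgt: target-side sentences.
    train_model / translate: hypothetical hooks into an NMT toolkit.
    """
    # 1. Train a reverse model on the flipped bitext (tgt -> src).
    reverse_model = train_model([(tgt, src) for src, tgt in bitext])
    # 2. Backtranslate the monolingual target data into synthetic sources.
    synthetic = [(translate(reverse_model, tgt), tgt) for tgt in mono_tgt]
    # 3. Mix real and synthetic pairs for forward-model training.
    return bitext + synthetic
```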

Dual Monolingual Cross-Entropy Delta Filtering of Noisy Parallel Data
Amittai Axelrod | Anish Kumar | Steve Sloto

We introduce a purely monolingual approach to filtering parallel data from a noisy corpus in a low-resource scenario. Our work is inspired by Junczys-Dowmunt (2018), but we relax the requirements to allow for cases where no parallel data is available. Our primary contribution is a dual monolingual cross-entropy delta criterion, modified from Cynical data selection (Axelrod, 2017), which is competitive (within 1.8 BLEU) with the best bilingual filtering method when used to train SMT systems. Our approach is featherweight and runs end-to-end on a standard laptop in three hours.
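
A minimal sketch of a dual monolingual cross-entropy delta score: on each side, the delta is the sentence's cross-entropy under an in-domain language model minus that under a general-domain one. How the two sides are combined here (sum plus a mismatch penalty, lower is better) is an assumption, not the paper's exact criterion:

```python
def delta(sentence, lm_in_domain, lm_general):
    """Cross-entropy delta; negative when the sentence looks in-domain."""
    return lm_in_domain(sentence) - lm_general(sentence)

def dual_score(src, tgt, lms):
    """lms: dict of callables returning per-word cross-entropy (assumed)."""
    d_src = delta(src, lms["src_in"], lms["src_gen"])
    d_tgt = delta(tgt, lms["tgt_in"], lms["tgt_gen"])
    # Combination is an assumption: prefer in-domain-looking sides that
    # agree with each other; lower-scoring pairs are kept first.
    return d_src + d_tgt + abs(d_src - d_tgt)

# Toy demo with constant "language models".
lms = {"src_in": lambda s: 3.1, "src_gen": lambda s: 4.0,
       "tgt_in": lambda s: 2.9, "tgt_gen": lambda s: 3.8}
print(dual_score("source sentence", "target sentence", lms))
```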

NRC Parallel Corpus Filtering System for WMT 2019
Gabriel Bernier-Colborne | Chi-kiu Lo

We describe the National Research Council Canada team’s submissions to the parallel corpus filtering task at the Fourth Conference on Machine Translation.

Quality and Coverage: The AFRL Submission to the WMT19 Parallel Corpus Filtering for Low-Resource Conditions Task
Grant Erdmann | Jeremy Gwinnup

The WMT19 Parallel Corpus Filtering for Low-Resource Conditions Task aims to test various methods of filtering noisy parallel corpora to make them useful for training machine translation systems. This year the noisy corpora are for the relatively low-resource language pairs of Nepali-English and Sinhala-English. This paper describes the Air Force Research Laboratory (AFRL) submissions, including preprocessing methods and scoring metrics. Numerical results indicate a benefit over the baseline and the relative benefits of different options.

Webinterpret Submission to the WMT2019 Shared Task on Parallel Corpus Filtering
Jesús González-Rubio

This document describes the participation of Webinterpret in the shared task on parallel corpus filtering at the Fourth Conference on Machine Translation (WMT 2019). Here, we describe the main characteristics of our approach and discuss the results obtained on the data sets published for the shared task.

Filtering of Noisy Parallel Corpora Based on Hypothesis Generation
Zuzanna Parcheta | Germán Sanchis-Trilles | Francisco Casacuberta

The noisy parallel corpus filtering task at WMT2019 challenges participants to create filtering methods that are useful for training machine translation systems. In this work, we introduce a noisy parallel corpus filtering system based on generating hypotheses by means of a translation model. We train translation models for both language pairs, Nepali-English and Sinhala-English, using the provided parallel corpora. We select the training subset for three language pairs (Nepali, Sinhala, and Hindi to English) jointly, using bilingual cross-entropy selection, to create the best possible translation model for both language pairs. Once the translation models are trained, we translate the noisy corpora and generate a hypothesis for each sentence pair. We compute the smoothed BLEU score between the target sentence and the generated hypothesis. In addition, we apply several rules to discard very noisy or inadequate sentences that could lower the translation score. These heuristics are based on sentence length, source-target similarity, and source-language detection. We compare our results with the baseline published on the shared task website, which uses the Zipporah model and over which we achieve significant improvements in one of the conditions of the shared task. The designed filtering system is domain-independent, and all experiments are conducted using neural machine translation.
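
A minimal sketch of the hypothesis-based filter combined with simple heuristics; the BLEU threshold, length bounds, and the translate hook are illustrative assumptions:

```python
import sacrebleu  # sentence_bleu applies smoothing by default

def keep_pair(src, tgt, translate, min_bleu=15.0, max_len_ratio=2.0):
    """Keep a noisy (src, tgt) pair only if it passes heuristics and BLEU."""
    if not src.strip() or not tgt.strip():
        return False                              # empty-side heuristic
    ratio = len(src.split()) / max(1, len(tgt.split()))
    if ratio > max_len_ratio or ratio < 1.0 / max_len_ratio:
        return False                              # length-ratio heuristic
    hypothesis = translate(src)                   # seed translation model
    bleu = sacrebleu.sentence_bleu(hypothesis, [tgt]).score
    return bleu >= min_bleu

# Toy demo: an identity "translator" trivially accepts a copied pair.
print(keep_pair("hello world", "hello world", lambda s: s))
```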

The University of Helsinki Submission to the WMT19 Parallel Corpus Filtering Task
Raúl Vázquez | Umut Sulubacak | Jörg Tiedemann

This paper describes the University of Helsinki Language Technology group’s participation in the WMT 2019 parallel corpus filtering task. Our scores were produced using a two-step strategy. First, we individually applied a series of filters to remove the ‘bad’ quality sentences. Then, we produced scores for each sentence by weighting the resulting features with a classification model. This methodology allowed us to build a simple and reliable system that is easily adaptable to other language pairs.
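
A minimal sketch of the two-step idea: hard filters first, then a classifier that turns per-pair features into a quality score. The feature set, filter rules, and toy training labels are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def passes_filters(src, tgt):
    """Step 1: hard filters for obviously bad pairs (rules assumed)."""
    if not src.strip() or not tgt.strip():
        return False
    if src == tgt:                  # likely an untranslated copy
        return False
    return 0.5 < len(src.split()) / max(1, len(tgt.split())) < 2.0

def features(src, tgt):
    """Step 2 inputs: simple per-pair features (illustrative)."""
    return [len(src.split()), len(tgt.split()),
            len(set(src.split()) & set(tgt.split()))]

# Toy labelled data: 1 = clean pair, 0 = noisy pair.
labelled = [("a b c", "x y z", 1), ("a b", "a b", 0)]
X = np.array([features(s, t) for s, t, _ in labelled])
y = np.array([lab for _, _, lab in labelled])
clf = LogisticRegression().fit(X, y)

pairs = [("a b c d", "w x y z"), ("copy me", "copy me")]
scores = {p: clf.predict_proba([features(*p)])[0, 1]
          for p in pairs if passes_filters(*p)}
print(scores)  # the copied pair is dropped by the hard filters
```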