Machine Translation Summit (2021)



pdf (full)
bib (full)
Proceedings of Machine Translation Summit XVIII: Research Track

pdf bib
Proceedings of Machine Translation Summit XVIII: Research Track
Kevin Duh | Francisco Guzmán

pdf bib
Learning Curricula for Multilingual Neural Machine Translation Training
Gaurav Kumar | Philipp Koehn | Sanjeev Khudanpur

Low-resource Multilingual Neural Machine Translation (MNMT) is typically tasked with improving the translation performance on one or more language pairs with the aid of high-resource language pairs. In this paper, we propose two simple search-based curricula (orderings of the multilingual training data) which help improve translation performance in conjunction with existing techniques such as fine-tuning. Additionally, we attempt to learn a curriculum for MNMT from scratch jointly with the training of the translation system using contextual multi-armed bandits. We show on the FLORES low-resource translation dataset that these learned curricula can provide better starting points for fine-tuning and improve the overall performance of the translation system.
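The learned-curriculum idea can be pictured as a bandit that decides, at each step, which language pair supplies the next training batch, and is rewarded when dev-set loss drops. The sketch below is a context-free epsilon-greedy stand-in for the contextual bandits used in the paper; the names, reward definition, and training loop are illustrative assumptions, not the authors' implementation:

```python
import random

class CurriculumBandit:
    """Epsilon-greedy bandit choosing which language pair to sample next."""

    def __init__(self, language_pairs, epsilon=0.1):
        self.pairs = language_pairs
        self.epsilon = epsilon
        self.counts = {p: 0 for p in language_pairs}
        self.values = {p: 0.0 for p in language_pairs}  # running mean reward

    def choose(self):
        if random.random() < self.epsilon:
            return random.choice(self.pairs)        # explore
        return max(self.pairs, key=lambda p: self.values[p])  # exploit

    def update(self, pair, reward):
        self.counts[pair] += 1
        self.values[pair] += (reward - self.values[pair]) / self.counts[pair]

bandit = CurriculumBandit(["en-ne", "en-si", "en-km"])  # hypothetical pairs
prev_loss = 10.0
for step in range(100):
    pair = bandit.choose()
    # train_one_batch(pair) would update the NMT model here (placeholder)
    new_loss = prev_loss * 0.99            # stand-in for a measured dev loss
    bandit.update(pair, reward=prev_loss - new_loss)
    prev_loss = new_loss
```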

pdf bib
Transformers for Low-Resource Languages: Is Féidir Linn!
Seamus Lankford | Haithem Afli | Andy Way

The Transformer model is the state-of-the-art in Machine Translation. However, in general, neural translation models often underperform on language pairs with insufficient training data. As a consequence, relatively few experiments have been carried out using this architecture on low-resource language pairs. In this study, hyperparameter optimization of Transformer models in translating the low-resource English-Irish language pair is evaluated. We demonstrate that choosing appropriate parameters leads to considerable performance improvements. Most importantly, the correct choice of subword model is shown to be the biggest driver of translation performance. SentencePiece models using both unigram and BPE approaches were appraised. Variations on model architectures included modifying the number of layers, testing various regularization techniques, and evaluating the optimal number of heads for attention. A generic 55k DGT corpus and an in-domain 88k public admin corpus were used for evaluation. A Transformer-optimized model demonstrated a BLEU score improvement of 7.8 points when compared with a baseline RNN model. Improvements were observed across a range of metrics, including TER, indicating a substantially reduced post-editing effort for Transformer-optimized models with 16k BPE subword models. Benchmarked against Google Translate, our translation engines demonstrated significant improvements. The question of whether or not Transformers can be used effectively in a low-resource setting of English-Irish translation has been addressed. Is féidir linn – yes we can.

pdf bib
The Effect of Domain and Diacritics in Yorùbá–English Neural Machine Translation
David Adelani | Dana Ruiter | Jesujoba Alabi | Damilola Adebonojo | Adesina Ayeni | Mofe Adeyemi | Ayodele Esther Awokoya | Cristina España-Bonet

Massively multilingual machine translation (MT) has shown impressive capabilities, including zero- and few-shot translation between low-resource language pairs. However, these models are often evaluated on high-resource languages with the assumption that they generalize to low-resource ones. The difficulty of evaluating MT models on low-resource pairs is often due to a lack of standardized evaluation datasets. In this paper, we present MENYO-20k, the first multi-domain parallel corpus with an especially curated orthography for Yorùbá–English, with standardized train-test splits for benchmarking. We provide several neural MT benchmarks and compare them to the performance of popular pre-trained (massively multilingual) MT models, both for the heterogeneous test set and its subdomains. Since these pre-trained models use huge amounts of data of uncertain quality, we also analyze the effect of diacritics, a major characteristic of Yorùbá, in the training data. We investigate how and when this training condition affects the final quality of a translation and its understandability. Our models outperform massively multilingual models such as Google (+8.7 BLEU) and Facebook M2M (+9.1) when translating to Yorùbá, setting a high-quality benchmark for future research.

pdf bib
Like Chalk and Cheese? On the Effects of Translationese in MT Training
Samuel Larkin | Michel Simard | Rebecca Knowles

We revisit the topic of translation direction in the data used for training neural machine translation systems, focusing on a real-world scenario with known translation direction and imbalances in translation direction: the Canadian Hansard. According to automatic metrics, we observe that using parallel data that was produced in the matching translation direction (authentic source, translationese target) improves translation quality. In cases of data imbalance in terms of translation direction, we find that tagging of translation direction can close the performance gap. We perform a human evaluation that differs slightly from the automatic metrics, but nevertheless confirms that, for this French-English dataset that is known to contain high-quality translations, authentic or tagged mixed source improves over translationese source for training.
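Direction tagging of the kind evaluated here is commonly implemented by prepending a reserved token to each source sentence, letting a single model learn from both authentic and translationese sources. A minimal sketch; the tag strings are illustrative assumptions, not the paper's:

```python
# Hypothetical direction tags; the actual tokens used in the paper may differ.
TAGS = {"authentic": "<orig>", "translationese": "<trans>"}

def tag_source(sentence: str, direction: str) -> str:
    """Prepend a translation-direction tag to a training source sentence."""
    return f"{TAGS[direction]} {sentence}"

print(tag_source("Le député a posé la question .", "authentic"))
print(tag_source("La réponse a été traduite .", "translationese"))
```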

pdf bib
Investigating Softmax Tempering for Training Neural Machine Translation Models
Raj Dabre | Atsushi Fujita

Neural machine translation (NMT) models are typically trained using a softmax cross-entropy loss where the softmax distribution is compared against the gold labels. In low-resource scenarios, NMT models tend to perform poorly because model training quickly converges to a point where the softmax distribution computed using logits approaches the gold label distribution. Although label smoothing is a well-known solution to address this issue, we further propose to divide the logits by a temperature coefficient greater than one, forcing the softmax distribution to be smoother during training. This makes it harder for the model to quickly over-fit. In our experiments on 11 language pairs in the low-resource Asian Language Treebank dataset, we observed significant improvements in translation quality. Our analysis focuses on finding the right balance of label smoothing and softmax tempering, which indicates that they are orthogonal methods. Finally, a study of softmax entropies and gradients reveals the impact of our method on the internal behavior of our NMT models.
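Concretely, tempering replaces softmax(z) with softmax(z / T) for a fixed T > 1 during training, which flattens the distribution the cross-entropy loss sees. A minimal numpy illustration (the logits and temperatures are made up):

```python
import numpy as np

def tempered_softmax(logits, T=2.0):
    """Softmax over logits divided by a temperature coefficient T."""
    z = logits / T
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([4.0, 1.0, 0.5, 0.1])
print(tempered_softmax(logits, T=1.0))  # T=1: standard, peaked softmax
print(tempered_softmax(logits, T=2.0))  # T>1: smoother training distribution
```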

pdf bib
Scrambled Translation Problem: A Problem of Denoising UNMT
Tamali Banerjee | Rudra V Murthy | Pushpak Bhattacharyya

In this paper, we identify an interesting kind of error in the output of Unsupervised Neural Machine Translation (UNMT) systems like Undreamt. We refer to this error type as the scrambled translation problem. We observe that UNMT models which use word-shuffle noise (as in the case of Undreamt) can generate correct words, but fail to stitch them together to form phrases. As a result, words of the translated sentence look scrambled, resulting in decreased BLEU. We hypothesise that the reason behind the scrambled translation problem is the 'shuffling noise' which is introduced in every input sentence as a denoising strategy. To test our hypothesis, we experiment by retraining UNMT models with a simple retraining strategy: we stop the training of the denoising UNMT model after a pre-decided number of iterations and resume the training for the remaining iterations (a number which is also pre-decided) using the original sentence as input, without adding any noise. Our proposed solution achieves significant performance improvement over UNMT models trained conventionally. We demonstrate these performance gains on four language pairs, viz., English-French, English-German, English-Spanish, and Hindi-Punjabi. Our qualitative and quantitative analysis shows that the retraining strategy helps achieve better alignment, as observed in attention heatmaps, and better phrasal translation, leading to statistically significant improvements in BLEU scores.
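The retraining schedule can be sketched as a switch on the input pipeline: locally shuffled input for a pre-decided number of steps, clean input for the remainder. A minimal sketch using the local word-shuffle noise common in UNMT; the step counts and function names are illustrative, not the paper's:

```python
import random

def shuffle_noise(tokens, k=3):
    """Local shuffle: each token moves at most about k positions."""
    keyed = sorted(enumerate(tokens), key=lambda p: p[0] + random.uniform(0, k))
    return [tok for _, tok in keyed]

def make_input(tokens, step, noisy_steps=50000):
    """Phase 1 (denoising): shuffled input; phase 2 (retraining): clean input."""
    return shuffle_noise(tokens) if step < noisy_steps else list(tokens)

sent = "the cat sat on the mat".split()
print(make_input(sent, step=10))      # noised input (phase 1)
print(make_input(sent, step=60000))   # clean input (phase 2)
```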

pdf bib
On nature and causes of observed MT errors
Maja Popović

This work describes an analysis of the nature and causes of MT errors observed by different evaluators under the guidance of different quality criteria: adequacy, comprehension, and an unspecified generic mixture of adequacy and fluency. We report results for three language pairs, two domains, and eleven MT systems. Our findings indicate that, despite the fact that some of the identified phenomena depend on domain and/or language, the following set of phenomena can be considered as generally challenging for modern MT systems: rephrasing groups of words, translation of ambiguous source words, translating noun phrases, and mistranslations. Furthermore, we show that the quality criterion also has an impact on error perception. Our findings indicate that comprehension and adequacy can be assessed simultaneously by different evaluators, so that comprehension, as an important quality criterion, can be included more often in human evaluations.

pdf bib
Studying The Impact Of Document-level Context On Simultaneous Neural Machine Translation
Raj Dabre | Aizhan Imankulova | Masahiro Kaneko

In a real-time simultaneous translation setting, neural machine translation (NMT) models start generating target-language tokens from incomplete source-language sentences, making them harder to translate and leading to poor translation quality. Previous research has shown that document-level NMT, comprising sentence and context encoders and a decoder, leverages context from neighboring sentences and helps improve translation quality. In simultaneous translation settings, the context from previous sentences should be even more critical. To this end, in this paper, we propose wait-k simultaneous document-level NMT, where we keep the context encoder as it is and replace the source sentence encoder and target-language decoder with their wait-k equivalents. We experiment with low- and high-resource settings using the ALT and OpenSubtitles2018 corpora, where we observe minor improvements in translation quality. We then analyze the translations obtained using our models, focusing on sentences that should benefit from the context, and find that the model does, in fact, benefit from context but is unable to leverage it effectively, especially in a low-resource setting. This shows that there is a need for further innovation in the way useful context is identified and leveraged.
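The wait-k policy referenced here delays generation so that the i-th target token is emitted only after the first i + k source tokens have been read. A minimal sketch of that read/write schedule, independent of any model (function and variable names are illustrative):

```python
def waitk_schedule(src_len, tgt_len, k=3):
    """Return the READ/WRITE action sequence of a wait-k decoder."""
    actions = []
    read, written = 0, 0
    while written < tgt_len:
        if read < min(written + k, src_len):
            read += 1
            actions.append("READ")    # consume one more source token
        else:
            written += 1
            actions.append("WRITE")   # emit one target token
    return actions

# Wait 3 source tokens, then alternate until the source runs out:
print(waitk_schedule(src_len=6, tgt_len=6, k=3))
```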

pdf bib
Attainable Text-to-Text Machine Translation vs. Translation: Issues Beyond Linguistic Processing
Atsushi Fujita

Existing approaches for machine translation (MT) mostly translate given text in the source language into the target language, without explicitly referring to information indispensable for producing a proper translation. This includes not only information in textual elements and modalities other than the text itself in the same document, but also extra-document and non-linguistic information, such as norms and skopos. To design better translation production workflows, we need to distinguish translation issues that could be resolved by the existing text-to-text approaches from those beyond them. To this end, we conducted an analytic assessment of MT outputs, taking an English-to-Japanese news translation task as a case study. First, examples of translation issues and their revisions were collected by a two-stage post-editing (PE) method: performing minimal PE to obtain a translation attainable based on the given textual information, and further performing full PE to obtain a truly acceptable translation, referring to any information if necessary. Then, the collected revision examples were manually analyzed. We revealed the dominant issues and the information indispensable for resolving them, such as fine-grained style specifications, terminology, domain-specific knowledge, and reference documents, delineating a clear distinction between translation and what text-to-text MT can ultimately attain.

pdf bib
Modeling Target-side Inflection in Placeholder Translation
Ryokan Ri | Toshiaki Nakazawa | Yoshimasa Tsuruoka

Placeholder translation systems enable the users to specify how a specific phrase is translated in the output sentence. The system is trained to output special placeholder tokens, and the user-specified term is injected into the output through context-free replacement of the placeholder token. However, this approach could result in ungrammatical sentences, because it is often the case that the specified term needs to be inflected according to the context of the output, which is unknown before the translation. To address this problem, we propose a novel method of placeholder translation that can inflect specified terms according to the grammatical construction of the output sentence. We extend the seq2seq architecture with a character-level decoder that takes the lemma of a user-specified term and the words generated by the word-level decoder, and outputs the correct inflected form of the lemma. We evaluate our approach on a Japanese-to-English translation task in the scientific writing domain, and show that our model can incorporate specified terms in the correct form more successfully than other comparable models.
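As background, conventional placeholder translation replaces the placeholder token with the user's term verbatim after decoding, which is exactly where the inflection problem arises. A minimal sketch of that baseline behavior (token and term names are made up):

```python
def inject_terms(decoded_tokens, term_by_placeholder):
    """Context-free replacement: swap each placeholder for the user's term."""
    return [term_by_placeholder.get(tok, tok) for tok in decoded_tokens]

decoded = "two <term0> were added to the mixture".split()
# The injected lemma stays uninflected even though the context needs a plural,
# which is the failure mode the paper's character-level decoder addresses:
print(" ".join(inject_terms(decoded, {"<term0>": "catalyst"})))
```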

pdf bib
Neural Machine Translation with Inflected Lexicon
Artur Nowakowski | Krzysztof Jassem

The paper presents experiments in neural machine translation with lexical constraints into a morphologically rich language. In particular, we introduce a method, based on constrained decoding, which handles the inflected forms of lexical entries and does not require any modification to the training data or model architecture. To evaluate its effectiveness, we carry out experiments in two different scenarios: general and domain-specific. We compare our method with baseline translation, i.e., translation without lexical constraints, in terms of translation speed and translation quality. To evaluate how well the method handles the constraints, we propose new evaluation metrics which take into account the presence, placement, duplication, and inflectional correctness of lexical terms in the output sentence.
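To illustrate the kind of checks such metrics might perform, here is a deliberately simplified sketch of presence and duplication checks; the paper's actual metric definitions (including placement and inflectional correctness) are richer than this:

```python
def constraint_stats(output_tokens, required_terms):
    """Check whether each required term appears, and whether it is duplicated."""
    stats = {}
    for term in required_terms:
        count = sum(1 for tok in output_tokens if tok == term)
        stats[term] = {"present": count >= 1, "duplicated": count > 1}
    return stats

# Toy Polish-like output: "macie" appears twice, "pies" is missing entirely.
out = "kot siedzi na macie na macie".split()
print(constraint_stats(out, ["kot", "macie", "pies"]))
```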


pdf (full)
bib (full)
Proceedings of the 1st Workshop on Automatic Spoken Language Translation in Real-World Settings (ASLTRW)

pdf bib
Proceedings of the 1st Workshop on Automatic Spoken Language Translation in Real-World Settings (ASLTRW)
Marco Turchi | Claudio Fantinuoli

pdf bib
Seed Words Based Data Selection for Language Model Adaptation
Roberto Gretter | Marco Matassoni | Daniele Falavigna

We address the problem of language model customization in applications where the ASR component needs to manage domain-specific terminology; although current state-of-the-art speech recognition technology provides excellent results for generic domains, the adaptation to specialized dictionaries or glossaries is still an open issue. In this work we present an approach for automatically selecting sentences, from a text corpus, that match, both semantically and morphologically, a glossary of terms (words or composite words) furnished by the user. The final goal is to rapidly adapt the language model of a hybrid ASR system with a limited amount of in-domain text data in order to successfully cope with the linguistic domain at hand; the vocabulary of the baseline model is expanded and tailored, reducing the resulting OOV rate. Data selection strategies based on shallow morphological seeds and semantic similarity via word2vec are introduced and discussed; the experimental setting consists of a simultaneous interpreting scenario, where ASRs in three languages are designed to recognize the domain-specific terms (i.e., dentistry). Results using different metrics (OOV rate, WER, precision and recall) show the effectiveness of the proposed techniques.
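As a sketch of the semantic selection strategy, one can rank corpus sentences by their best cosine similarity between any in-vocabulary word and the glossary seed terms. The embedding table below is a made-up stand-in for real word2vec vectors, and the scoring rule is an illustrative assumption rather than the paper's exact recipe:

```python
import numpy as np

# Tiny fake embedding table; in practice these come from a trained word2vec model.
vectors = {
    "implant": np.array([0.90, 0.10, 0.00]),
    "crown":   np.array([0.80, 0.30, 0.10]),
    "dental":  np.array([0.85, 0.20, 0.05]),
    "weather": np.array([0.00, 0.20, 0.90]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def sentence_score(sentence, seeds):
    """Best similarity between any sentence word and any glossary seed."""
    words = [w for w in sentence.lower().split() if w in vectors]
    if not words:
        return 0.0
    return max(cosine(vectors[w], vectors[s]) for w in words for s in seeds)

corpus = ["The dental implant was placed", "The weather is mild today"]
seeds = ["implant", "crown"]  # user-furnished glossary terms
print(sorted(corpus, key=lambda s: sentence_score(s, seeds), reverse=True))
```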


pdf (full)
bib (full)
Proceedings of the 1st International Workshop on Automatic Translation for Signed and Spoken Languages (AT4SSL)

pdf bib
Proceedings of the 1st International Workshop on Automatic Translation for Signed and Spoken Languages (AT4SSL)
Dimitar Shterionov

pdf bib
Sign and Search: Sign Search Functionality for Sign Language Lexica
Manolis Fragkiadakis | Peter van der Putten

Sign language lexica are a useful resource for researchers and people learning sign languages. Current implementations allow a user to search for a sign either by its gloss or by selecting its primary features such as handshape and location. This study focuses on exploring a reverse search functionality, where a user signs a query sign in front of a webcam and retrieves a set of matching signs. By extracting different body joint combinations (upper body, dominant hand's arm and wrist) using the pose estimation framework OpenPose, we compare four techniques (PCA, UMAP, DTW and Euclidean distance) as distance metrics between 20 query signs, each performed by eight participants, on a 1,200-sign lexicon. The results show that UMAP and DTW can predict a matching sign with 80% and 71% accuracy respectively among the top-20 retrieved signs, using the movement of the dominant hand's arm. Using DTW and adding more sign instances from other participants to the lexicon, the accuracy can be raised to 90% at the top-10 ranking. Our results suggest that our methodology can be used, with no training, on any sign language lexicon regardless of its size.
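For readers unfamiliar with DTW as a retrieval distance: it aligns two pose sequences of different lengths before accumulating frame-wise distances, which is what makes it suitable for comparing signs performed at different speeds. A minimal numpy sketch (array shapes are illustrative; OpenPose itself is not invoked):

```python
import numpy as np

def dtw(seq_a, seq_b):
    """Dynamic-time-warping distance between two keypoint sequences."""
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])  # frame distance
            cost[i, j] = d + min(cost[i - 1, j],       # insertion
                                 cost[i, j - 1],       # deletion
                                 cost[i - 1, j - 1])   # match
    return cost[n, m]

query = np.random.rand(40, 8)      # 40 frames, 8 joint coordinates (made up)
candidate = np.random.rand(55, 8)  # a lexicon entry of different length
print(dtw(query, candidate))
```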

pdf bib
The Myth of Signing Avatars
John C. McDonald | Rosalee Wolfe | Eleni Efthimiou | Evita Fotinea | Frankie Picron | Davy Van Landuyt | Tina Sioen | Annelies Braffort | Michael Filhol | Sarah Ebling | Thomas Hanke | Verena Krausneker

Development of automatic translation between signed and spoken languages has lagged behind the development of automatic translation between spoken languages, but it is a common misperception that extending machine translation techniques to include signed languages should be a straightforward process. A contributing factor is the lack of an acceptable method for displaying sign language apart from interpreters on video. This position paper examines the challenges of displaying a signed language as a target in automatic translation, analyses the underlying causes and suggests strategies to develop display technologies that are acceptable to sign language communities.

pdf bib
AVASAG: A German Sign Language Translation System for Public Services (short paper)
Fabrizio Nunnari | Judith Bauerdiek | Lucas Bernhard | Cristina España-Bonet | Corinna Jäger | Amelie Unger | Kristoffer Waldow | Sonja Wecker | Elisabeth André | Stephan Busemann | Christian Dold | Arnulph Fuhrmann | Patrick Gebhard | Yasser Hamidullah | Marcel Hauck | Yvonne Kossel | Martin Misiak | Dieter Wallach | Alexander Stricker

This paper presents an overview of AVASAG, an ongoing applied-research project developing a text-to-sign-language translation system for public services. We describe the scientific innovation points (geometry-based SL description, 3D animation and video corpus, simplified annotation scheme, motion capture strategy) and the overall translation pipeline.

pdf bib
Approaching Sign Language Gloss Translation as a Low-Resource Machine Translation Task
Xuan Zhang | Kevin Duh

A cascaded Sign Language Translation system first maps sign videos to gloss annotations and then translates the glosses into a spoken language. This work focuses on the second-stage gloss translation component, which is challenging due to the scarcity of publicly available parallel data. We approach gloss translation as a low-resource machine translation task and investigate two popular methods for improving translation quality: hyperparameter search and backtranslation. We discuss the potentials and pitfalls of these methods based on experiments on the RWTH-PHOENIX-Weather 2014T dataset.
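Backtranslation here means running monolingual spoken-language text through a reverse, text-to-gloss model to create synthetic gloss-text training pairs. A minimal sketch under that assumption; `reverse_model` and the toy stand-in below are hypothetical, not the paper's systems:

```python
def backtranslate(mono_text_sentences, reverse_model):
    """Create synthetic (gloss, text) training pairs from monolingual text."""
    return [(reverse_model(sent), sent) for sent in mono_text_sentences]

# Fake text-to-gloss model so the example runs (uppercased content words);
# a real reverse model would be trained on the parallel gloss data.
toy_reverse = lambda s: " ".join(w.upper() for w in s.split() if len(w) > 3)

mono = ["morgen wird es im norden regnen"]  # weather-domain German, made up
print(backtranslate(mono, toy_reverse))
```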

pdf bib
Automatic generation of a 3D sign language avatar on AR glasses given 2D videos of human signers
Lan Thao Nguyen | Florian Schicktanz | Aeneas Stankowski | Eleftherios Avramidis

In this paper we present a prototypical implementation of a pipeline that allows the automatic generation of a German Sign Language avatar from 2D video material. The presentation is accompanied by the source code. We record human pose movements during signing with computer vision models. The joint coordinates of hands and arms are imported as landmarks to control the skeleton of our avatar. From the anatomically independent landmarks, we create another skeleton based on the avatar's skeletal bone architecture to calculate the bone rotation data. This data is then used to control our human 3D avatar. The avatar is displayed on AR glasses and can be placed virtually in the room, in a way that it can be perceived simultaneously with the verbal speaker. In further work, it is to be enhanced with speech recognition and machine translation methods to serve as a sign language interpreter. The prototype has been shown to people of the deaf and hard-of-hearing community to assess its comprehensibility. Problems emerged with the transferred hand rotations: hand gestures were hard to recognize on the avatar due to deformations like twisted finger meshes.

pdf bib
Defining meaningful units. Challenges in sign segmentation and segment-meaning mapping (short paper)
Mirella De Sisto | Dimitar Shterionov | Irene Murtagh | Myriam Vermeerbergen | Lorraine Leeson

This paper addresses the tasks of sign segmentation and segment-meaning mapping in the context of sign language (SL) recognition. It aims to give an overview of the linguistic properties of SL, such as coarticulation and simultaneity, which make these tasks complex. A better understanding of SL structure is the necessary ground for the design and development of SL recognition and segmentation methodologies, which are fundamental for machine translation of these languages. Based on this preliminary exploration, a proposal for mapping segments to meaning in the form of an agglomerate of lexical and non-lexical information is introduced.


pdf (full)
bib (full)
Proceedings of Machine Translation Summit XVIII: Users and Providers Track

pdf bib
Proceedings of Machine Translation Summit XVIII: Users and Providers Track
Janice Campbell | Ben Huyck | Stephen Larocca | Jay Marciano | Konstantin Savenkov | Alex Yanishevsky

bib
Roundtable: Digital Marketing Globalization at NetApp: A Case Study of Digital Transformation utilizing Neural Machine Translation
Edith Bendermacher

bib
Roundtable: Neural Machine Translation at Ford Motor Company
Nestor Rychtyckyj

bib
Roundtable: Salesforce NMT System: A Year Later
Raffaella Buschiazzo

bib
Roundtable: Autodesk: Neural Machine Translation – Localization and beyond
Emanuele Dias

bib
From Research to Production: Fine-Grained Analysis of Terminology Integration
Toms Bergmanis | Mārcis Pinnis | Paula Reichenberg

Dynamic terminology integration in neural machine translation (NMT) is a sought-after feature of computer-aided translation tools among language service providers and small to medium businesses. Despite the recent surge in research on terminology integration in NMT, it is still seldom or inadequately supported in commercial machine translation solutions. In this presentation, we will share our experience of developing and deploying terminology integration capabilities for NMT systems in production. We will look at the three core tasks of terminology integration: terminology management, terminology identification, and translation with terminology. This talk will be insightful for NMT system developers, translators, terminologists, and anyone interested in translation projects.

bib
A Review for Large Volumes of Post-edited Data
Silvio Picinini

Interested in being more confident about the quality of your post-edited data? This is a session to learn how to create a Longitudinal Review that looks at specific aspects of quality in a systematic way, for the entire content and not just for a sample. Are you a project manager for a multilingual project? The Longitudinal Review can give insights to help project management, even if you are not a speaker of the target language. And it can help you detect issues that a Sample Review may not detect. Please come learn more about this new way to look at review.

bib
Accelerated Human NMT Evaluation Approaches for NMT Workflow Integration
James Phillips

Attendees to this session will get a clear view into how neural machine translation is leveraged in a large-scale real-life scenario to make substantial cost savings in comparison to conventional approaches without compromising quality. This will include an overview of how quality is measured, when and why quality estimation is applied, what preparations are required to do so, and what attempts are made to minimize the amount of human effort involved. We will also outline what worked well and what pitfalls are to be avoided, giving pointers to others who may be considering similar strategies.

bib
MT Human Evaluation – Insights & Approaches
Paula Manzur

This session is designed to help companies and people in the business of translation evaluate MT output, and to show how human translator feedback can be tweaked to make the process more objective and accurate. You will hear recommendations, insights, and takeaways on how to improve the procedure for human evaluation. When this is achieved, we can understand whether the human evaluation study and the machine metric results cohere. And we can think about what the future of translators looks like – the final “human touch” and automated MT review.

bib
A Rising Tide Lifts All Boats? Quality Correlation between Human Translation and Machine Assisted Translation
Evelyn Yang Garland | Rony Gao

Does the human who produces the best translation without Machine Translation (MT) also produce the best translation with the assistance of MT? Our empirical study has found a strong correlation between the quality of pure human translation (HT) and that of machine-assisted translation (MAT) produced by the same translator (Pearson correlation coefficient 0.85, p=0.007). Data from the study also indicates a more concentrated distribution of the MAT quality scores than that of the HT scores. Additional insights will also be discussed during the presentation. This study has two prominent features: the participation of professional translators (mostly ATA members, English-into-Chinese) as subjects, and the rigorous quality evaluation by multiple professional translators (all ATA certified) using ATA’s time-tested certification exam grading metrics. Despite a major limitation in sample size, our findings provide a strong indication of correlation between HT and MAT quality, adding to the body of evidence in support of further studies on larger scales.
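The headline statistic is an ordinary Pearson correlation over per-translator quality scores. A sketch of the same computation on invented numbers (the study's own grades are ATA exam-style scores, not these):

```python
from scipy.stats import pearsonr

# Hypothetical per-translator quality scores; entirely made up for illustration.
ht_scores  = [72, 80, 65, 90, 77, 83, 70, 88]   # pure human translation
mat_scores = [78, 84, 70, 92, 80, 86, 75, 90]   # machine-assisted translation

r, p = pearsonr(ht_scores, mat_scores)
print(f"Pearson r = {r:.2f}, p = {p:.3f}")
```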

bib
Bad to the Bone: Predicting the Impact of Source on MT
Alex Yanishevsky

It’s a well-known truism that poorly written source has a profound negative effect on the quality of machine translation, drastically reduces the productivity of post-editors and impacts turnaround times. But what is bad, and how bad is bad? Conversely, what are the features emblematic of good content, and how good is good? The impact of source on MT is crucial, since a lot of content is written by non-native authors, created by technical specialists for a non-technical audience, and may not adhere to brand tone and voice. AI can be employed to identify these errors and predict ‘at-risk’ content prior to localization in a multitude of languages. The presentation will show how source files, and even individual sentences within those source files, can be analyzed for markers of complexity and readability that make them more likely to cause mistranslations and omissions in machine translation and subsequent post-editing. Potential solutions will be explored, such as rewriting the source to be in line with acceptable threshold criteria for each product and/or domain, re-routing to other machine translation engines better suited for the task at hand, and building AI-based predictive models.

bib
Using Raw MT to make essential information available for a diverse range of potential customers
Sabine Peng

This presentation will share how we use raw machine translation to reach more potential customers. The attendees will learn about our raw machine translation strategies and workflow, how to select languages and products through data analysis, and how to evaluate the overall quality of documentation with raw machine translation. The attendees will also learn about the direction we are going, that is, collecting user feedback and optimizing raw machine translation, so as to build a complete and sustainable closed loop.

bib
A Common Machine Translation Post-Editing Training Protocol by GALA
Viveta Gene | Lucía Guerrero

bib
Preserving high MT quality for content with inline tags
Konstantin Savenkov | Grigory Sapunov | Pavel Stepachev

Attendees will learn how we use machine translation to provide targeted, high MT quality for content with inline tags. We offer a new and innovative approach to inserting tags into the translated text in a way that reliably preserves their quality. This process can achieve better MT quality and lower costs, as it is MT-independent and can be used for all languages, MT engines, and use cases.

bib
Early-stage development of the SignON application and open framework – challenges and opportunities
Dimitar Shterionov | John J O’Flaherty | Edward Keane | Connor O’Reilly | Marcello Paolo Scipioni | Marco Giovanelli | Matteo Villa

SignON is an EU Horizon 2020 Research and Innovation project that is developing a smartphone application and an open framework to facilitate translation between different European sign, spoken and text languages. The framework will incorporate state-of-the-art sign language recognition and presentation, speech processing technologies and, at its core, multi-modal, cross-language machine translation. The framework, dedicated to the computationally heavy tasks and distributed on the cloud, powers the application – a lightweight app running on a standard mobile device. The application and framework are being researched, designed and developed through a co-creation, user-centric approach with the European deaf and hard of hearing communities. In this session, the speakers will detail their progress, challenges and lessons learned in the early-stage development of the application and framework. They will also present their Agile DevOps approach and the next steps in the evolution of the SignON project.

bib
Deploying MT Quality Estimation on a large scale: Lessons learned and open questions
Aleš Tamchyna

This talk will focus on Memsource’s experience implementing MT Quality Estimation on a large scale within a translation management system. We will cover the whole development journey: from our early experimentation and the challenges we faced adapting academic models for a real-world setting, all the way through to the practical implementation. Since the launch of this feature, we’ve accumulated a significant amount of experience and feedback, which has informed our subsequent development. Lastly, we will discuss several open questions regarding the future role of quality estimation in translation.

bib
Neural Translation for European Union (NTEU)
Mercedes García-Martínez | Laurent Bié | Aleix Cerdà | Amando Estela | Manuel Herranz | Rihards Krišlauks | Maite Melero | Tony O’Dowd | Sinead O’Gorman | Mārcis Pinnis | Artūrs Stafanovič | Riccardo Superbo | Artūrs Vasiļevskis

The Neural Translation for the European Union (NTEU) engine farm enables direct machine translation for all 24 official languages of the European Union without the necessity of using a high-resourced language as a pivot. This amounts to a total of 552 translation engines for all combinations of the 24 languages. We have collected parallel data for all the language combinations and shared it publicly on elrc-share.eu. The translation engines have been customized to the domain, for the use of the European public administrations. The delivered engines will be published in the European Language Grid. In addition to the usual automatic metrics, all the engines have been evaluated by humans based on the direct assessment methodology. For this purpose, we built an open-source platform called MTET. The evaluation shows that most of the engines reach high quality and get better scores compared to an external machine translation service in a blind evaluation setup.

bib
A Data-Centric Approach to Real-World Custom NMT for Arabic
Rebecca Jonsson | Ruba Jaikat | Abdallah Nasir | Nour Al-Khdour | Sara Alisis

In this presentation, we will present our approach to taking Custom NMT to the next level by building tailor-made NMT to fit the needs of businesses seeking to scale in the Arabic-speaking world. In close collaboration with customers in the MENA region and with a deep understanding of their data, we work on building a variety of NMT models that accommodate the unique challenges of the Arabic language. This session will provide insights into the challenges of acquiring, analyzing, and processing customer data in various sectors, as well as into how to best make use of this data to build high-quality Custom NMT models in English-Arabic. Feedback from usage of these models in production will be provided. Furthermore, we will show how to use our translation management system to make the most of the custom NMT by leveraging the models, fine-tuning them, and continuing to improve them over time.

bib
Building MT systems in low resourced languages for Public Sector users in Croatia, Iceland, Ireland, and Norway
Róisín Moran | Carla Parra Escartín | Akshai Ramesh | Páraic Sheridan | Jane Dunne | Federico Gaspari | Sheila Castilho | Natalia Resende | Andy Way

When developing Machine Translation engines, low-resourced language pairs tend to be in a disadvantaged position: less available data means that developing robust MT models can be more challenging. The EU-funded PRINCIPLE project aims at overcoming this challenge for four low-resourced European languages: Norwegian, Croatian, Irish and Icelandic. This presentation will give an overview of the project, with a focus on the set of Public Sector users and their use cases for which we have developed MT solutions. We will discuss the range of language resources that have been gathered through contributions from public sector collaborators, and present the extensive evaluations that have been undertaken, including significant user evaluation of MT systems across all of the public sector participants in each of the four countries involved.

bib
Using speech technology in the translation process workflow in international organizations: A quantitative and qualitative study
Pierrette Bouillon | Jeevanthi Liyanapathirana

In international organizations, the growing demand for translations has increased the need for post-editing. Different studies show that automatic speech recognition systems have the potential to increase the productivity of the translation process as well as its quality. In this talk, we will explore the possibilities of using speech in the translation process by conducting a post-editing experiment with three professional translators in an international organization. Our experiment consisted of comparing three translation methods: speaking the translation with MT as an inspiration (respeaking, RES), post-editing the MT suggestions by typing (PE), and editing the MT suggestion using speech (SPE). BLEU and HTER scores were used to compare the three methods. Our study shows that translators did more edits under condition RES, whereas in SPE the resulting translations were closer to the reference according to the BLEU score and required fewer edits. Time taken to translate was the least in SPE, followed by the PE and RES methods, and the translators preferred using speech to typing. These results show the potential of speech when it is coupled with post-editing. To the best of our knowledge, this is the first quantitative study conducted on using post-editing and speech together in large-scale international organizations.

bib
cushLEPOR uses LABSE distilled knowledge to improve correlation with human translation evaluations
Gleb Erofeev | Irina Sorokina | Lifeng Han | Serge Gladkoff

Automatic MT evaluation metrics are indispensable for MT research. Augmented metrics such as hLEPOR include broader evaluation factors (recall and position difference penalty) in addition to the factors used in BLEU (sentence length, precision), and have demonstrated higher accuracy. However, the obstacles preventing the wide use of hLEPOR were the lack of an easily portable Python package and the empirical weighting parameters, which were tuned by manual work. This project addresses the above issues by offering a Python implementation of hLEPOR and automatic tuning of its parameters. We use existing translation memories (TM) as a reference set and distillation modeling with LaBSE (Language-Agnostic BERT Sentence Embedding) to calibrate the parameters for custom hLEPOR (cushLEPOR). cushLEPOR maximizes the correlation between hLEPOR and the distilling model’s similarity score towards the reference. It can be used quickly and precisely to evaluate MT output from different engines, without the need for manual weight tuning for optimization. In this session you will learn how to tune hLEPOR to obtain an automatic custom-tuned cushLEPOR metric far more precise than BLEU. The method does not require costly human evaluations, an existing TM is taken as the reference translation set, and cushLEPOR is created to select the best MT engine for the reference data set.
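The calibration can be pictured as a search over metric weights that maximizes correlation with sentence-embedding similarities on a reference set. The sketch below substitutes a deliberately simplified toy metric for hLEPOR and invented numbers for LaBSE similarities; only the shape of the tuning loop is faithful to the approach described:

```python
import numpy as np

def toy_hlepor(hyp, ref, w_prec, w_rec):
    """Toy stand-in for hLEPOR: weighted precision, recall, length penalty."""
    hyp_toks, ref_toks = hyp.split(), ref.split()
    overlap = len(set(hyp_toks) & set(ref_toks))
    precision = overlap / max(len(set(hyp_toks)), 1)
    recall = overlap / max(len(set(ref_toks)), 1)
    length_pen = min(len(hyp_toks), len(ref_toks)) / max(len(hyp_toks), len(ref_toks))
    return w_prec * precision + w_rec * recall + (1 - w_prec - w_rec) * length_pen

pairs = [
    ("the cat sat", "the cat sat down"),
    ("a dog ran", "the dog was running fast"),
    ("hello world", "hello there big world"),
    ("good morning", "good morning everyone"),
]
labse_sim = np.array([0.92, 0.55, 0.80, 0.90])  # pretend LaBSE similarities

def corr(weights):
    scores = [toy_hlepor(h, r, *weights) for h, r in pairs]
    return np.corrcoef(scores, labse_sim)[0, 1]

# Grid-search the weights for maximum Pearson correlation with the
# embedding similarities (the real project automates this tuning step).
grid = [(a, b) for a in np.linspace(0.1, 0.8, 8)
               for b in np.linspace(0.1, 0.8, 8) if a + b < 1.0]
best = max(grid, key=corr)
print("tuned weights:", best, "correlation:", round(corr(best), 3))
```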

bib
A Synthesis of Human and Machine: Correlating “New” Automatic Evaluation Metrics with Human Assessments
Mara Nunziatini | Andrea Alfieri

The session will provide an overview of some of the new Machine Translation metrics available on the market, analyze if and how these new metrics correlate at a segment level with the results of Adequacy and Fluency Human Assessments, and how they compare against TER scores and Levenshtein Distance – two of our currently preferred metrics – as well as against each other. The information in this session will help to get a better understanding of their strengths and weaknesses and make informed decisions when it comes to forecasting MT production.

bib
Lab vs. Production: Two Approaches to Productivity Evaluation for MTPE for LSP
Elena Murgolo

In the paper we propose both kind of tests as viable post-editing productivity evaluation solutions as they both deliver a clear overview of the difference in speed between HT and PE of the translators involved. The decision on whether to use the first approach or the second can be based on a number of factors, such as: availability of actual orders in the domain and language combination to be tested; time; availability of Post-editors in the domain and in the language combination to be tested. The aim of this paper will be to show that both methodologies can be useful in different settings for a preliminary evaluation of possible productivity gain with MTPE.