Rico Sennrich


2021

pdf bib
Exploring the Importance of Source Text in Automatic Post-Editing for Context-Aware Machine Translation
Chaojun Wang | Christian Hardmeier | Rico Sennrich
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)

Accurate translation requires document-level information, which is ignored by sentence-level machine translation. Recent work has demonstrated that document-level consistency can be improved with automatic post-editing (APE) using only target-language (TL) information. We study an extended APE model that additionally integrates source context. A human evaluation of fluency and adequacy in English–Russian translation reveals that the model with access to source context significantly outperforms monolingual APE in terms of adequacy, an effect largely ignored by automatic evaluation metrics. Our results show that TL-only modelling increases fluency without improving adequacy, demonstrating the need for conditioning on source text for automatic post-editing. They also highlight blind spots in automatic methods for targeted evaluation and demonstrate the need for human assessment to evaluate document-level translation quality reliably.

pdf bib
Understanding the Properties of Minimum Bayes Risk Decoding in Neural Machine Translation
Mathias Müller | Rico Sennrich
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Neural Machine Translation (NMT) currently exhibits biases such as producing translations that are too short and overgenerating frequent words, and shows poor robustness to copy noise in training data or domain shift. Recent work has tied these shortcomings to beam search, the de facto standard inference algorithm in NMT, and Eikema & Aziz (2020) propose to use Minimum Bayes Risk (MBR) decoding on unbiased samples instead. In this paper, we empirically investigate the properties of MBR decoding on a number of previously reported biases and failure cases of beam search. We find that MBR still exhibits a length and token frequency bias, owing to the MT metrics used as utility functions, but that MBR also increases robustness against copy noise in the training data and domain shift.
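
As an illustration of the decoding rule studied here, the following is a minimal sketch of sample-based MBR decoding: candidates drawn from the model serve both as hypotheses and as pseudo-references, and the candidate with the highest expected utility is selected. The token-overlap utility is a hypothetical stand-in for the MT metrics (e.g. BLEU or ChrF) actually used as utility functions in the paper.

from collections import Counter

def utility(hyp: str, ref: str) -> float:
    # Stand-in utility: token-level F1 overlap. The paper uses MT metrics
    # as utility functions, which introduce the biases discussed above.
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)

def mbr_decode(samples: list) -> str:
    # Choose the candidate with the highest expected utility, where the
    # expectation is estimated over the same pool of model samples.
    def expected_utility(candidate: str) -> float:
        return sum(utility(candidate, s) for s in samples) / len(samples)
    return max(samples, key=expected_utility)

# Toy usage with made-up strings standing in for unbiased model samples:
samples = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "a cat is sitting on the mat",
]
print(mbr_decode(samples))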

pdf bib
Analyzing the Source and Target Contributions to Predictions in Neural Machine Translation
Elena Voita | Rico Sennrich | Ivan Titov
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In Neural Machine Translation (and, more generally, conditional language modeling), the generation of a target token is influenced by two types of context: the source and the prefix of the target sequence. While many attempts to understand the internal workings of NMT models have been made, none of them explicitly evaluates relative source and target contributions to a generation decision. We argue that this relative contribution can be evaluated by adopting a variant of Layerwise Relevance Propagation (LRP). Its underlying ‘conservation principle’ makes relevance propagation unique: differently from other methods, it evaluates not an abstract quantity reflecting token importance, but the proportion of each token’s influence. We extend LRP to the Transformer and conduct an analysis of NMT models which explicitly evaluates the source and target relative contributions to the generation process. We analyze changes in these contributions when conditioning on different types of prefixes, when varying the training objective or the amount of training data, and during the training process. We find that models trained with more data tend to rely on source information more and to have sharper token contributions; the training process is non-monotonic, with several stages of different nature.
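
The ‘conservation principle’ can be written down compactly. The notation below is ours, not the paper’s: R_t(·) denotes the relevance assigned at generation step t to a source token x_i or a target-prefix token y_j.

% Total relevance is conserved at every step, so source and target-prefix
% contributions are proportions that sum to one:
\sum_{i=1}^{|x|} R_t(x_i) \;+\; \sum_{j=1}^{t-1} R_t(y_j) \;=\; 1,
\qquad
\text{source contribution at step } t \;=\; \sum_{i=1}^{|x|} R_t(x_i).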

pdf bib
Beyond Sentence-Level End-to-End Speech Translation: Context Helps
Biao Zhang | Ivan Titov | Barry Haddow | Rico Sennrich
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Document-level contextual information has shown benefits to text-based machine translation, but whether and how context helps end-to-end (E2E) speech translation (ST) is still under-studied. We fill this gap through extensive experiments using a simple concatenation-based context-aware ST model, paired with adaptive feature selection on speech encodings for computational efficiency. We investigate several decoding approaches, and introduce in-model ensemble decoding which jointly performs document- and sentence-level translation using the same model. Our results on the MuST-C benchmark with Transformer demonstrate the effectiveness of context for E2E ST. Compared to sentence-level ST, context-aware ST obtains better translation quality (+0.18-2.61 BLEU), improves pronoun and homophone translation, shows better robustness to (artificial) audio segmentation errors, and reduces latency and flicker to deliver higher quality for simultaneous translation.

pdf bib
Language Modeling, Lexical Translation, Reordering: The Training Process of NMT through the Lens of Classical SMT
Elena Voita | Rico Sennrich | Ivan Titov
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Unlike traditional statistical MT, which decomposes the translation task into distinct, separately learned components, neural machine translation uses a single neural network to model the entire translation process. Despite neural machine translation being the de facto standard, it is still not clear how NMT models acquire different competences over the course of training, and how this mirrors the different models in traditional SMT. In this work, we look at the competences related to three core SMT components and find that during training, NMT first focuses on learning target-side language modeling, then improves translation quality approaching word-by-word translation, and finally learns more complicated reordering patterns. We show that this behavior holds for several models and language pairs. Additionally, we explain how such an understanding of the training process can be useful in practice and, as an example, show how it can be used to improve vanilla non-autoregressive neural machine translation by guiding teacher model selection.

pdf bib
Wino-X: Multilingual Winograd Schemas for Commonsense Reasoning and Coreference Resolution
Denis Emelin | Rico Sennrich
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Winograd schemas are a well-established tool for evaluating coreference resolution (CoR) and commonsense reasoning (CSR) capabilities of computational models. So far, schemas remained largely confined to English, limiting their utility in multilingual settings. This work presents Wino-X, a parallel dataset of German, French, and Russian schemas, aligned with their English counterparts. We use this resource to investigate whether neural machine translation (NMT) models can perform CoR that requires commonsense knowledge and whether multilingual language models (MLLMs) are capable of CSR across multiple languages. Our findings show Wino-X to be exceptionally challenging for NMT systems that are prone to undesirable biases and unable to detect disambiguating information. We quantify biases using established statistical methods and define ways to address both of these issues. We furthermore present evidence of active cross-lingual knowledge transfer in MLLMs, whereby fine-tuning models on English schemas yields CSR improvements in other languages.

pdf bib
On the Limits of Minimal Pairs in Contrastive Evaluation
Jannis Vamvas | Rico Sennrich
Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP

Minimal sentence pairs are frequently used to analyze the behavior of language models. It is often assumed that model behavior on contrastive pairs is predictive of model behavior at large. We argue that two conditions are necessary for this assumption to hold: First, a tested hypothesis should be well-motivated, since experiments show that contrastive evaluation can lead to false positives. Secondly, test data should be chosen such as to minimize distributional discrepancy between evaluation time and deployment time. For a good approximation of deployment-time decoding, we recommend that minimal pairs are created based on machine-generated text, as opposed to human-written references. We present a contrastive evaluation suite for English–German MT that implements this recommendation.

2020

pdf bib
Detecting Word Sense Disambiguation Biases in Machine Translation for Model-Agnostic Adversarial Attacks
Denis Emelin | Ivan Titov | Rico Sennrich
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Word sense disambiguation is a well-known source of translation errors in NMT. We posit that some of the incorrect disambiguation choices are due to models’ over-reliance on dataset artifacts found in training data, specifically superficial word co-occurrences, rather than a deeper understanding of the source text. We introduce a method for the prediction of disambiguation errors based on statistical data properties, demonstrating its effectiveness across several domains and model types. Moreover, we develop a simple adversarial attack strategy that minimally perturbs sentences in order to elicit disambiguation errors to further probe the robustness of translation models. Our findings indicate that disambiguation robustness varies substantially between domains and that different models trained on the same data are vulnerable to different attacks.

pdf bib
In Neural Machine Translation, What Does Transfer Learning Transfer?
Alham Fikri Aji | Nikolay Bogoychev | Kenneth Heafield | Rico Sennrich
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Transfer learning improves quality for low-resource machine translation, but it is unclear what exactly it transfers. We perform several ablation studies that limit information transfer, then measure the quality impact across three language pairs to gain a black-box understanding of transfer learning. Word embeddings play an important role in transfer learning, particularly if they are properly aligned. Although transfer learning can be performed without embeddings, results are sub-optimal. In contrast, transferring only the embeddings but nothing else yields catastrophic results. We then investigate diagonal alignments with auto-encoders over real languages and randomly generated sequences, finding even randomly generated sequences as parents yield noticeable but smaller gains. Finally, transfer learning can eliminate the need for a warm-up phase when training transformer models in high resource language pairs.

pdf bib
Adaptive Feature Selection for End-to-End Speech Translation
Biao Zhang | Ivan Titov | Barry Haddow | Rico Sennrich
Findings of the Association for Computational Linguistics: EMNLP 2020

Information in speech signals is not evenly distributed, making it an additional challenge for end-to-end (E2E) speech translation (ST) to learn to focus on informative features. In this paper, we propose adaptive feature selection (AFS) for encoder-decoder based E2E ST. We first pre-train an ASR encoder and apply AFS to dynamically estimate the importance of each encoded speech feature to ASR. An ST encoder, stacked on top of the ASR encoder, then receives the filtered features from the (frozen) ASR encoder. We take L0Drop (Zhang et al., 2020) as the backbone for AFS, and adapt it to sparsify speech features with respect to both temporal and feature dimensions. Results on LibriSpeech En–Fr and MuST-C benchmarks show that AFS facilitates learning of ST by pruning out ~84% of temporal features, yielding an average translation gain of ~1.3-1.6 BLEU and a decoding speedup of ~1.4x. In particular, AFS reduces the performance gap compared to the cascade baseline, and outperforms it on LibriSpeech En–Fr with a BLEU score of 18.56 (without data augmentation).
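
A rough PyTorch sketch of the gating idea: per-frame and per-dimension gates sparsify the frozen ASR encoder states before they are passed to the ST encoder. The plain sigmoid gates and mean-based sparsity surrogate below are simplifications of our own; the paper uses L0Drop, a stochastic relaxation of L0 regularization.

import torch
import torch.nn as nn

class AdaptiveFeatureSelectionSketch(nn.Module):
    # Illustrative simplification of AFS-style gating, not the paper's exact model.
    def __init__(self, dim: int):
        super().__init__()
        self.temporal_gate = nn.Linear(dim, 1)               # one gate per encoded speech frame
        self.feature_gate = nn.Parameter(torch.zeros(dim))   # one gate per feature dimension

    def forward(self, asr_states: torch.Tensor):
        # asr_states: (batch, time, dim) outputs of the frozen, pre-trained ASR encoder
        g_t = torch.sigmoid(self.temporal_gate(asr_states))  # (batch, time, 1)
        g_f = torch.sigmoid(self.feature_gate)               # (dim,)
        gated = asr_states * g_t * g_f                       # sparsified features fed to the ST encoder
        sparsity_penalty = g_t.mean() + g_f.mean()           # crude surrogate for the L0 term
        return gated, sparsity_penalty

# Usage sketch with random tensors standing in for ASR encoder states:
afs = AdaptiveFeatureSelectionSketch(dim=512)
gated, penalty = afs(torch.randn(2, 100, 512))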

pdf bib
Fast Interleaved Bidirectional Sequence Generation
Biao Zhang | Ivan Titov | Rico Sennrich
Proceedings of the Fifth Conference on Machine Translation

Independence assumptions during sequence generation can speed up inference, but parallel generation of highly inter-dependent tokens comes at a cost in quality. Instead of assuming independence between neighbouring tokens (semi-autoregressive decoding, SA), we take inspiration from bidirectional sequence generation and introduce a decoder that generates target words from the left-to-right and right-to-left directions simultaneously. We show that we can easily convert a standard architecture for unidirectional decoding into a bidirectional decoder by simply interleaving the two directions and adapting the word positions and self-attention masks. Our interleaved bidirectional decoder (IBDecoder) retains the model simplicity and training efficiency of the standard Transformer, and on five machine translation tasks and two document summarization tasks, achieves a decoding speedup of ~2x compared to autoregressive decoding with comparable quality. Notably, it outperforms left-to-right SA because the independence assumptions in IBDecoder are more felicitous. To achieve even higher speedups, we explore hybrid models where we either simultaneously predict multiple neighbouring tokens per direction, or perform multi-directional decoding by partitioning the target sequence. These methods achieve speedups of 4x to 11x across different tasks at the cost of 1 BLEU or 0.5 ROUGE (on average).
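
The interleaving itself is simple to illustrate. The sketch below reorders a target sequence so that a unidirectional decoder can emit tokens from both ends in alternation; it only shows the reordering, while the adapted word positions and self-attention masks of the actual model are not modeled here.

def interleave_target(tokens: list) -> list:
    # Reorder y1 ... yN as y1, yN, y2, yN-1, ... so the two directions
    # meet in the middle.
    out, left, right = [], 0, len(tokens) - 1
    while left <= right:
        out.append(tokens[left])
        if right != left:
            out.append(tokens[right])
        left, right = left + 1, right - 1
    return out

def restore_order(interleaved: list) -> list:
    # Invert the interleaving to recover the natural left-to-right order.
    left, right = [], []
    for i, tok in enumerate(interleaved):
        (left if i % 2 == 0 else right).append(tok)
    return left + right[::-1]

tokens = ["a", "b", "c", "d", "e"]
assert interleave_target(tokens) == ["a", "e", "b", "d", "c"]
assert restore_order(interleave_target(tokens)) == tokens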

pdf bib
Removing European Language Barriers with Innovative Machine Translation Technology
Dario Franceschini | Chiara Canton | Ivan Simonini | Armin Schweinfurth | Adelheid Glott | Sebastian Stüker | Thai-Son Nguyen | Felix Schneider | Thanh-Le Ha | Alex Waibel | Barry Haddow | Philip Williams | Rico Sennrich | Ondřej Bojar | Sangeet Sagar | Dominik Macháček | Otakar Smrž
Proceedings of the 1st International Workshop on Language Technology Platforms

This paper presents our progress towards deploying a versatile communication platform for highly multilingual live speech translation and live subtitling of conferences and remote meetings. The platform has been designed with a focus on very low latency and high flexibility while allowing research prototypes of speech and text processing tools to be easily connected, regardless of where they physically run. We outline our architecture solution and also briefly compare it with the ELG platform. Technical details are provided on the most important components, and we summarize the test deployment events we have run so far.

2019

pdf bib
The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives
Elena Voita | Rico Sennrich | Ivan Titov
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We seek to understand how the representations of individual tokens and the structure of the learned feature space evolve between layers in deep neural networks under different learning objectives. We chose the Transformer for our analysis, as it has been shown effective on various tasks, including machine translation (MT), standard left-to-right language modeling (LM) and masked language modeling (MLM). Previous work used black-box probing tasks to show that the representations learned by the Transformer differ significantly depending on the objective. In this work, we use canonical correlation analysis and mutual information estimators to study how information flows across Transformer layers and observe that the choice of the objective determines this process. For example, as we go from bottom to top layers, information about the past in left-to-right language models vanishes and predictions about the future get formed. In contrast, for MLM, representations initially acquire information about the context around the token, partially forgetting the token identity and producing a more generalized token representation. The token identity then gets recreated at the top MLM layers.

pdf bib
Widening the Representation Bottleneck in Neural Machine Translation with Lexical Shortcuts
Denis Emelin | Ivan Titov | Rico Sennrich
Proceedings of the Fourth Conference on Machine Translation (Volume 1: Research Papers)

The transformer is a state-of-the-art neural translation model that uses attention to iteratively refine lexical representations with information drawn from the surrounding context. Lexical features are fed into the first layer and propagated through a deep network of hidden layers. We argue that the need to represent and propagate lexical features in each layer limits the model’s capacity for learning and representing other information relevant to the task. To alleviate this bottleneck, we introduce gated shortcut connections between the embedding layer and each subsequent layer within the encoder and decoder. This enables the model to access relevant lexical content dynamically, without expending limited resources on storing it within intermediate states. We show that the proposed modification yields consistent improvements over a baseline transformer on standard WMT translation tasks in 5 translation directions (0.9 BLEU on average) and reduces the amount of lexical information passed along the hidden layers. We furthermore evaluate different ways to integrate lexical connections into the transformer architecture and present ablation experiments exploring the effect of proposed shortcuts on model behavior.
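
The shortcut can be pictured as a per-position gate that mixes lexical content from the embedding layer back into each layer’s input. The PyTorch parameterization below is our reading of the abstract, a sketch rather than the paper’s exact formulation.

import torch
import torch.nn as nn

class LexicalShortcutSketch(nn.Module):
    # Gated shortcut between the embedding layer and a subsequent
    # encoder/decoder layer (illustrative only).
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, hidden: torch.Tensor, embeddings: torch.Tensor) -> torch.Tensor:
        # hidden:     (batch, length, dim) states entering a given layer
        # embeddings: (batch, length, dim) token embeddings from the embedding layer
        g = torch.sigmoid(self.gate(torch.cat([hidden, embeddings], dim=-1)))
        # The gate decides, per position and feature, how much lexical
        # content to re-inject into the hidden state.
        return g * embeddings + (1.0 - g) * hidden

# Usage sketch:
shortcut = LexicalShortcutSketch(dim=512)
out = shortcut(torch.randn(2, 10, 512), torch.randn(2, 10, 512))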

pdf bib
Proceedings of Machine Translation Summit XVII: Research Track
Mikel Forcada | Andy Way | Barry Haddow | Rico Sennrich
Proceedings of Machine Translation Summit XVII: Research Track

2018

pdf bib
Sentence Compression for Arbitrary Languages via Multilingual Pivoting
Jonathan Mallinson | Rico Sennrich | Mirella Lapata
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

In this paper we advocate the use of bilingual corpora, which are abundantly available, for training sentence compression models. Our approach borrows much of its machinery from neural machine translation and leverages bilingual pivoting: compressions are obtained by translating a source string into a foreign language and then back-translating it into the source while controlling the translation length. Our model can be trained for any language as long as a bilingual corpus is available and performs arbitrary rewrites without access to compression-specific data. We release Moss, a new parallel Multilingual Compression dataset for English, German, and French which can be used to evaluate compression models across languages and genres.

pdf bib
Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures
Gongbo Tang | Mathias Müller | Annette Rios | Rico Sennrich
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Recently, non-recurrent architectures (convolutional, self-attentional) have outperformed RNNs in neural machine translation. CNNs and self-attentional networks can connect distant words via shorter network paths than RNNs, and it has been speculated that this improves their ability to model long-range dependencies. However, this theoretical argument has not been tested empirically, nor have alternative explanations for their strong performance been explored in-depth. We hypothesize that the strong performance of CNNs and self-attentional networks could also be due to their ability to extract semantic features from the source text, and we evaluate RNNs, CNNs and self-attention networks on two tasks: subject-verb agreement (where capturing long-range dependencies is required) and word sense disambiguation (where semantic feature extraction is required). Our experimental results show that: 1) self-attentional networks and CNNs do not outperform RNNs in modeling subject-verb agreement over long distances; 2) self-attentional networks perform distinctly better than RNNs and CNNs on word sense disambiguation.

pdf bib
Has Machine Translation Achieved Human Parity? A Case for Document-level Evaluation
Samuel Läubli | Rico Sennrich | Martin Volk
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Recent research suggests that neural machine translation achieves parity with professional human translation on the WMT Chinese–English news translation task. We empirically test this claim with alternative evaluation protocols, contrasting the evaluation of single sentences and entire documents. In a pairwise ranking experiment, human raters assessing adequacy and fluency show a stronger preference for human over machine translation when evaluating documents as compared to isolated sentences. Our findings emphasise the need to shift towards document-level evaluation as machine translation improves to the degree that errors which are hard or impossible to spot at the sentence level become decisive in discriminating the quality of different translation outputs.

pdf bib
An Analysis of Attention Mechanisms: The Case of Word Sense Disambiguation in Neural Machine Translation
Gongbo Tang | Rico Sennrich | Joakim Nivre
Proceedings of the Third Conference on Machine Translation: Research Papers

Recent work has shown that the encoder-decoder attention mechanisms in neural machine translation (NMT) are different from the word alignment in statistical machine translation. In this paper, we focus on analyzing encoder-decoder attention mechanisms, in the case of word sense disambiguation (WSD) in NMT models. We hypothesize that attention mechanisms pay more attention to context tokens when translating ambiguous words. We explore the attention distribution patterns when translating ambiguous nouns. Counterintuitively, we find that attention mechanisms are likely to distribute more attention to the ambiguous noun itself rather than context tokens, in comparison to other nouns. We conclude that attention is not the main mechanism used by NMT models to incorporate contextual information for WSD. The experimental results suggest that NMT models learn to encode contextual information necessary for WSD in the encoder hidden states. For the attention mechanism in Transformer models, we reveal that the first few layers gradually learn to align source and target tokens and the last few layers learn to extract features from the related but unaligned context tokens.

pdf bib
A Large-Scale Test Set for the Evaluation of Context-Aware Pronoun Translation in Neural Machine Translation
Mathias Müller | Annette Rios | Elena Voita | Rico Sennrich
Proceedings of the Third Conference on Machine Translation: Research Papers

The translation of pronouns presents a special challenge to machine translation to this day, since it often requires context outside the current sentence. Recent work on models that have access to information across sentence boundaries has seen only moderate improvements in terms of automatic evaluation metrics such as BLEU. However, metrics that quantify the overall translation quality are ill-equipped to measure gains from additional context. We argue that a different kind of evaluation is needed to assess how well models translate inter-sentential phenomena such as pronouns. This paper therefore presents a test suite of contrastive translations focused specifically on the translation of pronouns. Furthermore, we perform experiments with several context-aware models. We show that, while gains in BLEU are moderate for those systems, they outperform baselines by a large margin in terms of accuracy on our contrastive test set. Our experiments also show the effectiveness of parameter tying for multi-encoder architectures.

pdf bib
The University of Edinburgh’s Submissions to the WMT18 News Translation Task
Barry Haddow | Nikolay Bogoychev | Denis Emelin | Ulrich Germann | Roman Grundkiewicz | Kenneth Heafield | Antonio Valerio Miceli Barone | Rico Sennrich
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

The University of Edinburgh made submissions to all 14 language pairs in the news translation task, with strong performances in most pairs. We introduce new RNN variants, mixed RNN/Transformer ensembles, data selection and weighting, and extensions to back-translation.

pdf bib
The Word Sense Disambiguation Test Suite at WMT18
Annette Rios | Mathias Müller | Rico Sennrich
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

We present a task to measure an MT system’s capability to translate ambiguous words with their correct sense according to the given context. The task is based on the German–English Word Sense Disambiguation (WSD) test set ContraWSD (Rios Gonzales et al., 2017), but it has been filtered to reduce noise, and the evaluation has been adapted to assess MT output directly rather than scoring existing translations. We evaluate all German–English submissions to the WMT’18 shared translation task, plus a number of submissions from previous years, and find that performance on the task has markedly improved compared to the 2016 WMT submissions (from 81% to 93% accuracy on the WSD task). We also find that the unsupervised submissions to the task have a low WSD capability, and predominantly translate ambiguous source words with the same sense.

pdf bib
Context-Aware Neural Machine Translation Learns Anaphora Resolution
Elena Voita | Pavel Serdyukov | Rico Sennrich | Ivan Titov
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Standard machine translation systems process sentences in isolation and hence ignore extra-sentential information, even though extended context can both prevent mistakes in ambiguous cases and improve translation coherence. We introduce a context-aware neural machine translation model designed in such a way that the flow of information from the extended context to the translation model can be controlled and analyzed. We experiment with an English-Russian subtitles dataset, and observe that much of what is captured by our model deals with improving pronoun translation. We measure correspondences between induced attention distributions and coreference relations and observe that the model implicitly captures anaphora. This is consistent with gains for sentences where pronouns need to be gendered in translation. Besides improvements in anaphoric cases, the model also improves in overall BLEU, both over its context-agnostic version (+0.7) and over simple concatenation of the context and source sentences (+0.6).

pdf bib
Samsung and University of Edinburgh’s System for the IWSLT 2018 Low Resource MT Task
Philip Williams | Marcin Chochowski | Pawel Przybysz | Rico Sennrich | Barry Haddow | Alexandra Birch
Proceedings of the 15th International Conference on Spoken Language Translation

This paper describes the joint submission to the IWSLT 2018 Low Resource MT task by Samsung R&D Institute, Poland, and the University of Edinburgh. We focused on supplementing the very limited in-domain Basque-English training data with out-of-domain data, with synthetic data, and with data for other language pairs. We also experimented with a variety of model architectures and features, which included the development of extensions to the Nematus toolkit. Our submission was ultimately produced by a system combination in which we reranked translations from our strongest individual system using multiple weaker systems.

2017

pdf bib
The Samsung and University of Edinburgh’s submission to IWSLT17
Pawel Przybysz | Marcin Chochowski | Rico Sennrich | Barry Haddow | Alexandra Birch
Proceedings of the 14th International Conference on Spoken Language Translation

This paper describes the joint submission of Samsung Research and Development, Warsaw, Poland, and the University of Edinburgh team to the IWSLT MT task for TED talks. We took part in two translation directions, en-de and de-en. We also participated in the en-de and de-en lectures SLT task. The models have been trained with an attentional encoder-decoder, using the BiDeep model in Nematus. We filtered the training data to reduce the problem of noisy data, and used back-translated monolingual data for domain adaptation. We demonstrate the effectiveness of the different techniques that we applied via ablation studies. Our submission system outperforms our baseline, and last year’s University of Edinburgh submission to IWSLT, by more than 5 BLEU.

pdf bib
Regularization techniques for fine-tuning in neural machine translation
Antonio Valerio Miceli Barone | Barry Haddow | Ulrich Germann | Rico Sennrich
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We investigate techniques for supervised domain adaptation for neural machine translation, where an existing model trained on a large out-of-domain dataset is adapted to a small in-domain dataset. In this scenario, overfitting is a major challenge. We investigate a number of techniques to reduce overfitting and improve transfer learning, including regularization techniques such as dropout and L2-regularization towards an out-of-domain prior. In addition, we introduce tuneout, a novel regularization technique inspired by dropout. We apply these techniques, alone and in combination, to neural machine translation, obtaining improvements on IWSLT datasets for English–German and English–Russian. We also investigate the amounts of in-domain training data needed for domain adaptation in NMT, and find a logarithmic relationship between the amount of training data and gain in BLEU score.
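
L2-regularization towards an out-of-domain prior is easy to sketch in PyTorch: the fine-tuned parameters are pulled back towards the parameters of the out-of-domain model. The weight lam and the training-loop names in the comment are hypothetical placeholders, not values from the paper.

import torch

def l2_towards_prior(model: torch.nn.Module, prior_params: dict, lam: float):
    # Penalize deviation of the fine-tuned (in-domain) parameters from the
    # out-of-domain model's parameters, which act as a prior.
    penalty = 0.0
    for name, param in model.named_parameters():
        penalty = penalty + ((param - prior_params[name].detach()) ** 2).sum()
    return lam * penalty

# Fine-tuning loop sketch (cross_entropy, batch, out_of_domain_params are placeholders):
#   loss = cross_entropy(model(batch.src), batch.tgt) \
#        + l2_towards_prior(model, out_of_domain_params, lam=1e-4)
#   loss.backward(); optimizer.step()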

pdf bib
Image Pivoting for Learning Multilingual Multimodal Representations
Spandana Gella | Rico Sennrich | Frank Keller | Mirella Lapata
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

In this paper we propose a model to learn multimodal multilingual representations for matching images and sentences in different languages, with the aim of advancing multilingual versions of image search and image understanding. Our model learns a common representation for images and their descriptions in two different languages (which need not be parallel) by considering the image as a pivot between two languages. We introduce a new pairwise ranking loss function which can handle both symmetric and asymmetric similarity between the two modalities. We evaluate our models on image-description ranking for German and English, and on semantic textual similarity of image descriptions in English. In both cases we achieve state-of-the-art performance.
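
A symmetric margin-based ranking loss over a batch of matched image/sentence embeddings can serve as a sketch of the idea; the paper’s loss additionally handles asymmetric similarity between the two modalities, which this simplification does not.

import torch
import torch.nn.functional as F

def pairwise_ranking_loss(img: torch.Tensor, sent: torch.Tensor, margin: float = 0.2):
    # img, sent: (batch, dim) embeddings of matched image/description pairs.
    img, sent = F.normalize(img, dim=-1), F.normalize(sent, dim=-1)
    sim = img @ sent.t()                    # (batch, batch) cosine similarities
    pos = sim.diag().unsqueeze(1)           # similarity of the matched pairs
    # Hinge on every mismatched pair, in both retrieval directions.
    cost_s = torch.clamp(margin + sim - pos, min=0)       # image -> description
    cost_i = torch.clamp(margin + sim.t() - pos, min=0)   # description -> image
    mask = torch.eye(sim.size(0), dtype=torch.bool)
    return cost_s.masked_fill(mask, 0).sum() + cost_i.masked_fill(mask, 0).sum()

# Usage sketch with random embeddings:
loss = pairwise_ranking_loss(torch.randn(8, 256), torch.randn(8, 256))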

pdf bib
Paraphrasing Revisited with Neural Machine Translation
Jonathan Mallinson | Rico Sennrich | Mirella Lapata
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Recognizing and generating paraphrases is an important component in many natural language processing applications. A well-established technique for automatically extracting paraphrases leverages bilingual corpora to find meaning-equivalent phrases in a single language by pivoting over a shared translation in another language. In this paper we revisit bilingual pivoting in the context of neural machine translation and present a paraphrasing model based purely on neural networks. Our model represents paraphrases in a continuous space, estimates the degree of semantic relatedness between text segments of arbitrary length, and generates candidate paraphrases for any source input. Experimental results across tasks and datasets show that neural paraphrases outperform those obtained with conventional phrase-based pivoting approaches.

pdf bib
How Grammatical is Character-level Neural Machine Translation? Assessing MT Quality with Contrastive Translation Pairs
Rico Sennrich
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

Analysing translation quality with regard to specific linguistic phenomena has historically been difficult and time-consuming. Neural machine translation has the attractive property that it can produce scores for arbitrary translations, and we propose a novel method to assess how well NMT systems model specific linguistic phenomena such as agreement over long distances, the production of novel words, and the faithful translation of polarity. The core idea is that we measure whether a reference translation is more probable under an NMT model than a contrastive translation which introduces a specific type of error. We present LingEval97, a large-scale data set of 97,000 contrastive translation pairs based on the WMT English-German translation task, with errors automatically created with simple rules. We report results for a number of systems, and find that recently introduced character-level NMT systems perform better at transliteration than models with byte-pair encoding (BPE) segmentation, but perform more poorly at morphosyntactic agreement and at translating discontiguous units of meaning.
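
The contrastive scoring protocol can be sketched in a few lines: a system is credited whenever it assigns the reference a higher score than the minimally perturbed variant. The scoring hook and the toy scorer below are hypothetical placeholders, not a real NMT model.

def contrastive_accuracy(model_score, pairs) -> float:
    # `model_score(src, tgt)` is a hypothetical hook returning log P(tgt | src)
    # under the NMT system being evaluated; `pairs` yields tuples of
    # (source, reference, contrastive variant with one automatically inserted error).
    correct = total = 0
    for src, ref, contrastive in pairs:
        correct += model_score(src, ref) > model_score(src, contrastive)
        total += 1
    return correct / total if total else 0.0

# Toy usage with a placeholder scorer that simply penalizes the agreement error:
pairs = [("Das Haus ist klein.", "The house is small.", "The house are small.")]
print(contrastive_accuracy(lambda s, t: -t.count("are"), pairs))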

pdf bib
Practical Neural Machine Translation
Rico Sennrich | Barry Haddow
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Tutorial Abstracts

Neural Machine Translation (NMT) has achieved new breakthroughs in machine translation in recent years. It has dominated recent shared translation tasks in machine translation research, and is also being quickly adopted in industry. The technical differences between NMT and the previously dominant phrase-based statistical approach require that practitioners learn new best practices for building MT systems, ranging from different hardware requirements, new techniques for handling rare words and monolingual data, to new opportunities in continued learning and domain adaptation. This tutorial is aimed at researchers and users of machine translation interested in working with NMT. The tutorial will cover a basic theoretical introduction to NMT, discuss the components of state-of-the-art systems, and provide practical advice for building NMT systems.