Marco Turchi


2021

pdf bib
Proceedings of the 1st Workshop on Automatic Spoken Language Translation in Real-World Settings (ASLTRW)
Marco Turchi | Claudio Fantinuoli
Proceedings of the 1st Workshop on Automatic Spoken Language Translation in Real-World Settings (ASLTRW)

pdf bib
Cascade versus Direct Speech Translation : Do the Differences Still Make a Difference?
Luisa Bentivogli | Mauro Cettolo | Marco Gaido | Alina Karakanta | Alberto Martinelli | Matteo Negri | Marco Turchi
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Five years after the first published proofs of concept, direct approaches to speech translation (ST) are now competing with traditional cascade solutions. In light of this steady progress, can we claim that the performance gap between the two is closed? Starting from this question, we present a systematic comparison between state-of-the-art systems representative of the two paradigms. Focusing on three language directions (English-German / Italian / Spanish), we conduct automatic and manual evaluations, exploiting high-quality professional post-edits and annotations. Our multi-faceted analysis on one of the few publicly available ST benchmarks attests for the first time that : i) the gap between the two paradigms is now closed, and ii) the subtle differences observed in their behavior are not sufficient for humans neither to distinguish them nor to prefer one over the other.

pdf bib
CTC-based Compression for Direct Speech TranslationCTC-based Compression for Direct Speech Translation
Marco Gaido | Mauro Cettolo | Matteo Negri | Marco Turchi
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Previous studies demonstrated that a dynamic phone-informed compression of the input audio is beneficial for speech translation (ST). However, they required a dedicated model for phone recognition and did not test this solution for direct ST, in which a single model translates the input audio into the target language without intermediate representations. In this work, we propose the first method able to perform a dynamic compression of the input in direct ST models. In particular, we exploit the Connectionist Temporal Classification (CTC) to compress the input sequence according to its phonetic characteristics. Our experiments demonstrate that our solution brings a 1.3-1.5 BLEU improvement over a strong baseline on two language pairs (English-Italian and English-German), contextually reducing the memory footprint by more than 10 %.

pdf bib
Tutorial Proposal : End-to-End Speech Translation
Jan Niehues | Elizabeth Salesky | Marco Turchi | Matteo Negri
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Tutorial Abstracts

Speech translation is the translation of speech in one language typically to text in another, traditionally accomplished through a combination of automatic speech recognition and machine translation. Speech translation has attracted interest for many years, but the recent successful applications of deep learning to both individual tasks have enabled new opportunities through joint modeling, in what we today call ‘end-to-end speech translation.’ In this tutorial we will introduce the techniques used in cutting-edge research on speech translation. Starting from the traditional cascaded approach, we will given an overview on data sources and model architectures to achieve state-of-the art performance with end-to-end speech translation for both high- and low-resource languages. In addition, we will discuss methods to evaluate analyze the proposed solutions, as well as the challenges faced when applying speech translation models for real-world applications.

pdf bib
Speechformer : Reducing Information Loss in Direct Speech Translation
Sara Papi | Marco Gaido | Matteo Negri | Marco Turchi
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Transformer-based models have gained increasing popularity achieving state-of-the-art performance in many research fields including speech translation. However, Transformer’s quadratic complexity with respect to the input sequence length prevents its adoption as is with audio signals, which are typically represented by long sequences. Current solutions resort to an initial sub-optimal compression based on a fixed sampling of raw audio features. Therefore, potentially useful linguistic information is not accessible to higher-level layers in the architecture. To solve this issue, we propose Speechformer, an architecture that, thanks to reduced memory usage in the attention layers, avoids the initial lossy compression and aggregates information only at a higher level according to more informed linguistic criteria. Experiments on three language pairs (ende / es / nl) show the efficacy of our solution, with gains of up to 0.8 BLEU on the standard MuST-C corpus and of up to 4.0 BLEU in a low resource scenario.

pdf bib
FINDINGS OF THE IWSLT 2021 EVALUATION CAMPAIGNFINDINGS OF THE IWSLT 2021 EVALUATION CAMPAIGN
Antonios Anastasopoulos | Ondřej Bojar | Jacob Bremerman | Roldano Cattoni | Maha Elbayad | Marcello Federico | Xutai Ma | Satoshi Nakamura | Matteo Negri | Jan Niehues | Juan Pino | Elizabeth Salesky | Sebastian Stüker | Katsuhito Sudoh | Marco Turchi | Alexander Waibel | Changhan Wang | Matthew Wiesner
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

The evaluation campaign of the International Conference on Spoken Language Translation (IWSLT 2021) featured this year four shared tasks : (i) Simultaneous speech translation, (ii) Offline speech translation, (iii) Multilingual speech translation, (iv) Low-resource speech translation. A total of 22 teams participated in at least one of the tasks. This paper describes each shared task, data and evaluation metrics, and reports results of the received submissions.

pdf bib
Dealing with training and test segmentation mismatch : FBK@IWSLT2021FBK@IWSLT2021
Sara Papi | Marco Gaido | Matteo Negri | Marco Turchi
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

This paper describes FBK’s system submission to the IWSLT 2021 Offline Speech Translation task. We participated with a direct model, which is a Transformer-based architecture trained to translate English speech audio data into German texts. The training pipeline is characterized by knowledge distillation and a two-step fine-tuning procedure. Both knowledge distillation and the first fine-tuning step are carried out on manually segmented real and synthetic data, the latter being generated with an MT system trained on the available corpora. Differently, the second fine-tuning step is carried out on a random segmentation of the MuST-C v2 En-De dataset. Its main goal is to reduce the performance drops occurring when a speech translation model trained on manually segmented data (i.e. an ideal, sentence-like segmentation) is evaluated on automatically segmented audio (i.e. actual, more realistic testing conditions). For the same purpose, a custom hybrid segmentation procedure that accounts for both audio content (pauses) and for the length of the produced segments is applied to the test data before passing them to the system. At inference time, we compared this procedure with a baseline segmentation method based on Voice Activity Detection (VAD). Our results indicate the effectiveness of the proposed hybrid approach, shown by a reduction of the gap with manual segmentation from 8.3 to 1.4 BLEU points.

pdf bib
Between Flexibility and Consistency : Joint Generation of Captions and Subtitles
Alina Karakanta | Marco Gaido | Matteo Negri | Marco Turchi
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

Speech translation (ST) has lately received growing interest for the generation of subtitles without the need for an intermediate source language transcription and timing (i.e. captions). However, the joint generation of source captions and target subtitles does not only bring potential output quality advantages when the two decoding processes inform each other, but it is also often required in multilingual scenarios. In this work, we focus on ST models which generate consistent captions-subtitles in terms of structure and lexical content. We further introduce new metrics for evaluating subtitling consistency. Our findings show that joint decoding leads to increased performance and consistency between the generated captions and subtitles while still allowing for sufficient flexibility to produce subtitles conforming to language-specific needs and norms.

2020

pdf bib
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation
André Martins | Helena Moniz | Sara Fumega | Bruno Martins | Fernando Batista | Luisa Coheur | Carla Parra | Isabel Trancoso | Marco Turchi | Arianna Bisazza | Joss Moorkens | Ana Guerberof | Mary Nurminen | Lena Marg | Mikel L. Forcada
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

pdf bib
CEF Data Marketplace : Powering a Long-term Supply of Language DataCEF Data Marketplace: Powering a Long-term Supply of Language Data
Amir Kamran | Dace Dzeguze | Jaap van der Meer | Milica Panic | Alessandro Cattelan | Daniele Patrioli | Luisa Bentivogli | Marco Turchi
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

We describe the CEF Data Marketplace project, which focuses on the development of a trading platform of translation data for language professionals : translators, machine translation (MT) developers, language service providers (LSPs), translation buyers and government bodies. The CEF Data Marketplace platform will be designed and built to manage and trade data for all languages and domains. This project will open a continuous and longterm supply of language data for MT and other machine learning applications.

pdf bib
Gender in Danger? Evaluating Speech Translation Technology on the MuST-SHE CorpusMuST-SHE Corpus
Luisa Bentivogli | Beatrice Savoldi | Matteo Negri | Mattia A. Di Gangi | Roldano Cattoni | Marco Turchi
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Translating from languages without productive grammatical gender like English into gender-marked languages is a well-known difficulty for machines. This difficulty is also due to the fact that the training data on which models are built typically reflect the asymmetries of natural languages, gender bias included. Exclusively fed with textual data, machine translation is intrinsically constrained by the fact that the input sentence does not always contain clues about the gender identity of the referred human entities. But what happens with speech translation, where the input is an audio signal? Can audio provide additional information to reduce gender bias? We present the first thorough investigation of gender bias in speech translation, contributing with : i) the release of a benchmark useful for future studies, and ii) the comparison of different technologies (cascade and end-to-end) on two language directions (English-Italian / French).

pdf bib
Machine-oriented NMT Adaptation for Zero-shot NLP tasks : Comparing the Usefulness of Close and Distant LanguagesNMT Adaptation for Zero-shot NLP tasks: Comparing the Usefulness of Close and Distant Languages
Amirhossein Tebbifakhr | Matteo Negri | Marco Turchi
Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects

Neural Machine Translation (NMT) models are typically trained by considering humans as end-users and maximizing human-oriented objectives. However, in some scenarios, their output is consumed by automatic NLP components rather than by humans. In these scenarios, translations’ quality is measured in terms of their fitness for purpose (i.e. maximizing performance of external NLP tools) rather than in terms of standard human fluency / adequacy criteria. Recently, reinforcement learning techniques exploiting the feedback from downstream NLP tools have been proposed for machine-oriented NMT adaptation. In this work, we tackle the problem in a multilingual setting where a single NMT model translates from multiple languages for downstream automatic processing in the target language. Knowledge sharing across close and distant languages allows to apply our machine-oriented approach in the zero-shot setting where no labeled data for the test language is seen at training time. Moreover, we incorporate multi-lingual BERT in the source side of our NMT system to benefit from the knowledge embedded in this model. Our experiments show coherent performance gains, for different language directions over both i) generic NMT models (trained for human consumption), and ii) fine-tuned multilingual BERT. This gain for zero-shot language directions (e.g. SpanishEnglish) is higher when the models are fine-tuned on a closely-related source language (Italian) than a distant one (German).

pdf bib
Findings of the WMT 2020 Shared Task on Automatic Post-EditingWMT 2020 Shared Task on Automatic Post-Editing
Rajen Chatterjee | Markus Freitag | Matteo Negri | Marco Turchi
Proceedings of the Fifth Conference on Machine Translation

We present the results of the 6th round of the WMT task on MT Automatic Post-Editing. The task consists in automatically correcting the output of a black-box machine translation system by learning from existing human corrections of different sentences. This year, the challenge consisted of fixing the errors present in English Wikipedia pages translated into German and Chinese by state-ofthe-art, not domain-adapted neural MT (NMT) systems unknown to participants. Six teams participated in the English-German task, submitting a total of 11 runs. Two teams participated in the English-Chinese task submitting 2 runs each. Due to i) the different source / domain of data compared to the past (Wikipedia vs Information Technology), ii) the different quality of the initial translations to be corrected and iii) the introduction of a new language pair (English-Chinese), this year’s results are not directly comparable with last year’s round. However, on both language directions, participants’ submissions show considerable improvements over the baseline results. On English-German, the top ranked system improves over the baseline by -11.35 TER and +16.68 BLEU points, while on EnglishChinese the improvements are respectively up to -12.13 TER and +14.57 BLEU points. Overall, coherent gains are also highlighted by the outcomes of human evaluation, which confirms the effectiveness of APE to improve MT quality, especially in the new generic domain selected for this year’s round.

2019

pdf bib
Proceedings of the Fourth Conference on Machine Translation (Volume 1: Research Papers)
Ondřej Bojar | Rajen Chatterjee | Christian Federmann | Mark Fishel | Yvette Graham | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Philipp Koehn | André Martins | Christof Monz | Matteo Negri | Aurélie Névéol | Mariana Neves | Matt Post | Marco Turchi | Karin Verspoor
Proceedings of the Fourth Conference on Machine Translation (Volume 1: Research Papers)

pdf bib
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)
Ondřej Bojar | Rajen Chatterjee | Christian Federmann | Mark Fishel | Yvette Graham | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Philipp Koehn | André Martins | Christof Monz | Matteo Negri | Aurélie Névéol | Mariana Neves | Matt Post | Marco Turchi | Karin Verspoor
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

pdf bib
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)
Ondřej Bojar | Rajen Chatterjee | Christian Federmann | Mark Fishel | Yvette Graham | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Philipp Koehn | André Martins | Christof Monz | Matteo Negri | Aurélie Névéol | Mariana Neves | Matt Post | Marco Turchi | Karin Verspoor
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)

pdf bib
Effort-Aware Neural Automatic Post-Editing
Amirhossein Tebbifakhr | Matteo Negri | Marco Turchi
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)

For this round of the WMT 2019 APE shared task, our submission focuses on addressing the over-correction problem in APE. Over-correction occurs when the APE system tends to rephrase an already correct MT output, and the resulting sentence is penalized by a reference-based evaluation against human post-edits. Our intuition is that this problem can be prevented by informing the system about the predicted quality of the MT output or, in other terms, the expected amount of needed corrections. For this purpose, following the common approach in multilingual NMT, we prepend a special token to the beginning of both the source text and the MT output indicating the required amount of post-editing. Following the best submissions to the WMT 2018 APE shared task, our backbone architecture is based on multi-source Transformer to encode both the MT output and the corresponding source text. We participated both in the English-German and English-Russian subtasks. In the first subtask, our best submission improved the original MT output quality up to +0.98 BLEU and -0.47 TER. In the second subtask, where the higher quality of the MT output increases the risk of over-correction, none of our submitted runs was able to improve the MT output.

pdf bib
Proceedings of the Second Workshop on Multilingualism at the Intersection of Knowledge Bases and Machine Translation
Mihael Arcan | Marco Turchi | Jinhua Du | Dimitar Shterionov | Daniel Torregrosa
Proceedings of the Second Workshop on Multilingualism at the Intersection of Knowledge Bases and Machine Translation

pdf bib
The IWSLT 2019 Evaluation CampaignIWSLT 2019 Evaluation Campaign
Jan Niehues | Rolando Cattoni | Sebastian Stüker | Matteo Negri | Marco Turchi | Thanh-Le Ha | Elizabeth Salesky | Ramon Sanabria | Loic Barrault | Lucia Specia | Marcello Federico
Proceedings of the 16th International Conference on Spoken Language Translation

The IWSLT 2019 evaluation campaign featured three tasks : speech translation of (i) TED talks and (ii) How2 instructional videos from English into German and Portuguese, and (iii) text translation of TED talks from English into Czech. For the first two tasks we encouraged submissions of end- to-end speech-to-text systems, and for the second task participants could also use the video as additional input. We received submissions by 12 research teams. This overview provides detailed descriptions of the data and evaluation conditions of each task and reports results of the participating systems.

pdf bib
Adapting Multilingual Neural Machine Translation to Unseen Languages
Surafel M. Lakew | Alina Karakanta | Marcello Federico | Matteo Negri | Marco Turchi
Proceedings of the 16th International Conference on Spoken Language Translation

Multilingual Neural Machine Translation (MNMT) for low- resource languages (LRL) can be enhanced by the presence of related high-resource languages (HRL), but the relatedness of HRL usually relies on predefined linguistic assumptions about language similarity. Recently, adapting MNMT to a LRL has shown to greatly improve performance. In this work, we explore the problem of adapting an MNMT model to an unseen LRL using data selection and model adapta- tion. In order to improve NMT for LRL, we employ perplexity to select HRL data that are most similar to the LRL on the basis of language distance. We extensively explore data selection in popular multilingual NMT settings, namely in (zero-shot) translation, and in adaptation from a multilingual pre-trained model, for both directions (LRLen). We further show that dynamic adaptation of the model’s vocabulary results in a more favourable segmentation for the LRL in comparison with direct adaptation. Experiments show re- ductions in training time and significant performance gains over LRL baselines, even with zero LRL data (+13.0 BLEU), up to +17.0 BLEU for pre-trained multilingual model dynamic adaptation with related data selection. Our method outperforms current approaches, such as massively multilingual models and data augmentation, on four LRL.

2018

pdf bib
Proceedings of the Third Conference on Machine Translation: Research Papers
Ondřej Bojar | Rajen Chatterjee | Christian Federmann | Mark Fishel | Yvette Graham | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Philipp Koehn | Christof Monz | Matteo Negri | Aurélie Névéol | Mariana Neves | Matt Post | Lucia Specia | Marco Turchi | Karin Verspoor
Proceedings of the Third Conference on Machine Translation: Research Papers

bib
Proceedings of the Third Conference on Machine Translation: Shared Task Papers
Ondřej Bojar | Rajen Chatterjee | Christian Federmann | Mark Fishel | Yvette Graham | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Philipp Koehn | Christof Monz | Matteo Negri | Aurélie Névéol | Mariana Neves | Matt Post | Lucia Specia | Marco Turchi | Karin Verspoor
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

pdf bib
Generating E-Commerce Product Titles and Predicting their QualityE-Commerce Product Titles and Predicting their Quality
José G. Camargo de Souza | Michael Kozielski | Prashant Mathur | Ernie Chang | Marco Guerini | Matteo Negri | Marco Turchi | Evgeny Matusov
Proceedings of the 11th International Conference on Natural Language Generation

E-commerce platforms present products using titles that summarize product information. These titles can not be created by hand, therefore an algorithmic solution is required. The task of automatically generating these titles given noisy user provided titles is one way to achieve the goal. The setting requires the generation process to be fast and the generated title to be both human-readable and concise. Furthermore, we need to understand if such generated titles are usable. As such, we propose approaches that (i) automatically generate product titles, (ii) predict their quality. Our approach scales to millions of products and both automatic and human evaluations performed on real-world data indicate our approaches are effective and applicable to existing e-commerce scenarios.

pdf bib
Proceedings of the 15th International Conference on Spoken Language Translation
Marco Turchi | Jan Niehues | Marcello Frederico
Proceedings of the 15th International Conference on Spoken Language Translation

2017

pdf bib
FBK’s Multilingual Neural Machine Translation System for IWSLT 2017FBK’s Multilingual Neural Machine Translation System for IWSLT 2017
Surafel M. Lakew | Quintino F. Lotito | Marco Turchi | Matteo Negri | Marcello Federico
Proceedings of the 14th International Conference on Spoken Language Translation

Neural Machine Translation has been shown to enable inference and cross-lingual knowledge transfer across multiple language directions using a single multilingual model. Focusing on this multilingual translation scenario, this work summarizes FBK’s participation in the IWSLT 2017 shared task. Our submissions rely on two multilingual systems trained on five languages (English, Dutch, German, Italian, and Romanian). The first one is a 20 language direction model, which handles all possible combinations of the five languages. The second multilingual system is trained only on 16 directions, leaving the others as zero-shot translation directions (i.e representing a more complex inference task on language pairs not seen at training time). More specifically, our zero-shot directions are Dutch$German and Italian$Romanian (resulting in four language combinations). Despite the small amount of parallel data used for training these systems, the resulting multilingual models are effective, even in comparison with models trained separately for every language pair (i.e. in more favorable conditions). We compare and show the results of the two multilingual models against a baseline single language pair systems. Particularly, we focus on the four zero-shot directions and show how a multilingual model trained with small data can provide reasonable results. Furthermore, we investigate how pivoting (i.e using a bridge / pivot language for inference in a source!pivot!target translations) using a multilingual model can be an alternative to enable zero-shot translation in a low resource setting.

pdf bib
Improving Zero-Shot Translation of Low-Resource Languages
Surafel M. Lakew | Quintino F. Lotito | Matteo Negri | Marco Turchi | Marcello Federico
Proceedings of the 14th International Conference on Spoken Language Translation

Recent work on multilingual neural machine translation reported competitive performance with respect to bilingual models and surprisingly good performance even on (zero-shot) translation directions not observed at training time. We investigate here a zero-shot translation in a particularly low-resource multilingual setting. We propose a simple iterative training procedure that leverages a duality of translations directly generated by the system for the zero-shot directions. The translations produced by the system (sub-optimal since they contain mixed language from the shared vocabulary), are then used together with the original parallel data to feed and iteratively re-train the multilingual network. Over time, this allows the system to learn from its own generated and increasingly better output. Our approach shows to be effective in improving the two zero-shot directions of our multilingual model. In particular, we observed gains of about 9 BLEU points over a baseline multilingual model and up to 2.08 BLEU over a pivoting mechanism using two bilingual models. Further analysis shows that there is also a slight improvement in the non-zero-shot language directions.

pdf bib
Online Automatic Post-editing for MT in a Multi-Domain Translation EnvironmentMT in a Multi-Domain Translation Environment
Rajen Chatterjee | Gebremedhen Gebremelak | Matteo Negri | Marco Turchi
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Automatic post-editing (APE) for machine translation (MT) aims to fix recurrent errors made by the MT decoder by learning from correction examples. In controlled evaluation scenarios, the representativeness of the training set with respect to the test data is a key factor to achieve good performance. Real-life scenarios, however, do not guarantee such favorable learning conditions. Ideally, to be integrated in a real professional translation workflow (e.g. to play a role in computer-assisted translation framework), APE tools should be flexible enough to cope with continuous streams of diverse data coming from different domains / genres. To cope with this problem, we propose an online APE framework that is : i) robust to data diversity (i.e. capable to learn and apply correction rules in the right contexts) and ii) able to evolve over time (by continuously extending and refining its knowledge). In a comparative evaluation, with English-German test data coming in random order from two different domains, we show the effectiveness of our approach, which outperforms a strong batch system and the state of the art in online APE.