Elizabeth Salesky


pdf bib
Tutorial Proposal : End-to-End Speech Translation
Jan Niehues | Elizabeth Salesky | Marco Turchi | Matteo Negri
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Tutorial Abstracts

Speech translation is the translation of speech in one language typically to text in another, traditionally accomplished through a combination of automatic speech recognition and machine translation. Speech translation has attracted interest for many years, but the recent successful applications of deep learning to both individual tasks have enabled new opportunities through joint modeling, in what we today call ‘end-to-end speech translation.’ In this tutorial we will introduce the techniques used in cutting-edge research on speech translation. Starting from the traditional cascaded approach, we will given an overview on data sources and model architectures to achieve state-of-the art performance with end-to-end speech translation for both high- and low-resource languages. In addition, we will discuss methods to evaluate analyze the proposed solutions, as well as the challenges faced when applying speech translation models for real-world applications.

pdf bib
Robust Open-Vocabulary Translation from Visual Text Representations
Elizabeth Salesky | David Etter | Matt Post
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Machine translation models have discrete vocabularies and commonly use subword segmentation techniques to achieve an ‘open vocabulary.’ This approach relies on consistent and correct underlying unicode sequences, and makes models susceptible to degradation from common types of noise and variation. Motivated by the robustness of human language processing, we propose the use of visual text representations, which dispense with a finite set of text embeddings in favor of continuous vocabularies created by processing visually rendered text with sliding windows. We show that models using visual text representations approach or match performance of traditional text models on small and larger datasets. More importantly, models with visual embeddings demonstrate significant robustness to varied types of noise, achieving e.g., 25.9 BLEU on a character permuted GermanEnglish task where subword models degrade to 1.9.

pdf bib
Proceedings of the Third Workshop on Computational Typology and Multilingual NLP
Ekaterina Vylomova | Elizabeth Salesky | Sabrina Mielke | Gabriella Lapesa | Ritesh Kumar | Harald Hammarström | Ivan Vulić | Anna Korhonen | Roi Reichart | Edoardo Maria Ponti | Ryan Cotterell
Proceedings of the Third Workshop on Computational Typology and Multilingual NLP

pdf bib
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)
Marcello Federico | Alex Waibel | Marta R. Costa-jussà | Jan Niehues | Sebastian Stuker | Elizabeth Salesky
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

pdf bib
Antonios Anastasopoulos | Ondřej Bojar | Jacob Bremerman | Roldano Cattoni | Maha Elbayad | Marcello Federico | Xutai Ma | Satoshi Nakamura | Matteo Negri | Jan Niehues | Juan Pino | Elizabeth Salesky | Sebastian Stüker | Katsuhito Sudoh | Marco Turchi | Alexander Waibel | Changhan Wang | Matthew Wiesner
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

The evaluation campaign of the International Conference on Spoken Language Translation (IWSLT 2021) featured this year four shared tasks : (i) Simultaneous speech translation, (ii) Offline speech translation, (iii) Multilingual speech translation, (iv) Low-resource speech translation. A total of 22 teams participated in at least one of the tasks. This paper describes each shared task, data and evaluation metrics, and reports results of the received submissions.


pdf bib
SIGTYP 2020 Shared Task : Prediction of Typological FeaturesSIGTYP 2020 Shared Task: Prediction of Typological Features
Johannes Bjerva | Elizabeth Salesky | Sabrina J. Mielke | Aditi Chaudhary | Giuseppe G. A. Celano | Edoardo Maria Ponti | Ekaterina Vylomova | Ryan Cotterell | Isabelle Augenstein
Proceedings of the Second Workshop on Computational Research in Linguistic Typology

Typological knowledge bases (KBs) such as WALS (Dryer and Haspelmath, 2013) contain information about linguistic properties of the world’s languages. They have been shown to be useful for downstream applications, including cross-lingual transfer learning and linguistic probing. A major drawback hampering broader adoption of typological KBs is that they are sparsely populated, in the sense that most languages only have annotations for some features, and skewed, in that few features have wide coverage. As typological features often correlate with one another, it is possible to predict them and thus automatically populate typological KBs, which is also the focus of this shared task. Overall, the task attracted 8 submissions from 5 teams, out of which the most successful methods make use of such feature correlations. However, our error analysis reveals that even the strongest submitted systems struggle with predicting feature values for languages where few features are known.


pdf bib
CMU-01 at the SIGMORPHON 2019 Shared Task on Crosslinguality and Context in MorphologyCMU-01 at the SIGMORPHON 2019 Shared Task on Crosslinguality and Context in Morphology
Aditi Chaudhary | Elizabeth Salesky | Gayatri Bhat | David R. Mortensen | Jaime Carbonell | Yulia Tsvetkov
Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology

This paper presents the submission by the CMU-01 team to the SIGMORPHON 2019 task 2 of Morphological Analysis and Lemmatization in Context. This task requires us to produce the lemma and morpho-syntactic description of each token in a sequence, for 107 treebanks. We approach this task with a hierarchical neural conditional random field (CRF) model which predicts each coarse-grained feature (eg. POS, Case, etc.) independently. However, most treebanks are under-resourced, thus making it challenging to train deep neural models for them. Hence, we propose a multi-lingual transfer training regime where we transfer from multiple related languages that share similar typology.

pdf bib
The IWSLT 2019 Evaluation CampaignIWSLT 2019 Evaluation Campaign
Jan Niehues | Rolando Cattoni | Sebastian Stüker | Matteo Negri | Marco Turchi | Thanh-Le Ha | Elizabeth Salesky | Ramon Sanabria | Loic Barrault | Lucia Specia | Marcello Federico
Proceedings of the 16th International Conference on Spoken Language Translation

The IWSLT 2019 evaluation campaign featured three tasks : speech translation of (i) TED talks and (ii) How2 instructional videos from English into German and Portuguese, and (iii) text translation of TED talks from English into Czech. For the first two tasks we encouraged submissions of end- to-end speech-to-text systems, and for the second task participants could also use the video as additional input. We received submissions by 12 research teams. This overview provides detailed descriptions of the data and evaluation conditions of each task and reports results of the participating systems.

pdf bib
Exploring Phoneme-Level Speech Representations for End-to-End Speech Translation
Elizabeth Salesky | Matthias Sperber | Alan W Black
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Previous work on end-to-end translation from speech has primarily used frame-level features as speech representations, which creates longer, sparser sequences than text. We show that a naive method to create compressed phoneme-like speech representations is far more effective and efficient for translation than traditional frame-level speech features. Specifically, we generate phoneme labels for speech frames and average consecutive frames with the same label to create shorter, higher-level source sequences for translation. We see improvements of up to 5 BLEU on both our high and low resource language pairs, with a reduction in training time of 60 %. Our improvements hold across multiple data sizes and two language pairs.


pdf bib
KIT Lecture Translator : Multilingual Speech Translation with One-Shot LearningKIT Lecture Translator: Multilingual Speech Translation with One-Shot Learning
Florian Dessloch | Thanh-Le Ha | Markus Müller | Jan Niehues | Thai-Son Nguyen | Ngoc-Quan Pham | Elizabeth Salesky | Matthias Sperber | Sebastian Stüker | Thomas Zenkel | Alexander Waibel
Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations

In today’s globalized world we have the ability to communicate with people across the world. However, in many situations the language barrier still presents a major issue. For example, many foreign students coming to KIT to study are initially unable to follow a lecture in German. Therefore, we offer an automatic simultaneous interpretation service for students. To fulfill this task, we have developed a low-latency translation system that is adapted to lectures and covers several language pairs. While the switch from traditional Statistical Machine Translation to Neural Machine Translation (NMT) significantly improved performance, to integrate NMT into the speech translation framework required several adjustments. We have addressed the run-time constraints and different types of input. Furthermore, we utilized one-shot learning to easily add new topic-specific terms to the system. Besides better performance, NMT also enabled us increase our covered languages through multilingual NMT. % Combining these techniques, we are able to provide an adapted speech translation system for several European languages.


pdf bib
KIT’s Multilingual Neural Machine Translation systems for IWSLT 2017KIT’s Multilingual Neural Machine Translation systems for IWSLT 2017
Ngoc-Quan Pham | Matthias Sperber | Elizabeth Salesky | Thanh-Le Ha | Jan Niehues | Alexander Waibel
Proceedings of the 14th International Conference on Spoken Language Translation

In this paper, we present KIT’s multilingual neural machine translation (NMT) systems for the IWSLT 2017 evaluation campaign machine translation (MT) and spoken language translation (SLT) tasks. For our MT task submissions, we used our multi-task system, modified from a standard attentional neural machine translation framework, instead of building 20 individual NMT systems. We investigated different architectures as well as different data corpora in training such a multilingual system. We also suggested an effective adaptation scheme for multilingual systems which brings great improvements compared to monolingual systems. For the SLT track, in addition to a monolingual neural translation system used to generate correct punctuations and true cases of the data prior to training our multilingual system, we introduced a noise model in order to make our system more robust. Results show that our novel modifications improved our systems considerably on all tasks.