Roman Grundkiewicz


2021

Proceedings of the Sixth Conference on Machine Translation
Loic Barrault | Ondrej Bojar | Fethi Bougares | Rajen Chatterjee | Marta R. Costa-jussa | Christian Federmann | Mark Fishel | Alexander Fraser | Markus Freitag | Yvette Graham | Roman Grundkiewicz | Paco Guzman | Barry Haddow | Matthias Huck | Antonio Jimeno Yepes | Philipp Koehn | Tom Kocmi | Andre Martins | Makoto Morishita | Christof Monz
Proceedings of the Sixth Conference on Machine Translation

Efficient Machine Translation with Model Pruning and Quantization
Maximiliana Behnke | Nikolay Bogoychev | Alham Fikri Aji | Kenneth Heafield | Graeme Nail | Qianqian Zhu | Svetlana Tchistiakova | Jelmer van der Linde | Pinzhen Chen | Sidharth Kashyap | Roman Grundkiewicz
Proceedings of the Sixth Conference on Machine Translation

We participated in all tracks of the WMT 2021 efficient machine translation task: single-core CPU, multi-core CPU, and GPU hardware with throughput and latency conditions. Our submissions combine several efficiency strategies: knowledge distillation, a simpler simple recurrent unit (SSRU) decoder with one or two layers, lexical shortlists, smaller numerical formats, and pruning. For the CPU track, we used quantized 8-bit models. For the GPU track, we experimented with FP16 and 8-bit integers in tensor cores. Some of our submissions optimize for size via 4-bit log quantization and omitting a lexical shortlist. We have extended pruning to more parts of the network, emphasizing component- and block-level pruning that, unlike coefficient-wise pruning, actually improves speed.

2020

Speed-optimized, Compact Student Models that Distill Knowledge from a Larger Teacher Model: the UEDIN-CUNI Submission to the WMT 2020 News Translation Task
Ulrich Germann | Roman Grundkiewicz | Martin Popel | Radina Dobreva | Nikolay Bogoychev | Kenneth Heafield
Proceedings of the Fifth Conference on Machine Translation

We describe the joint submission of the University of Edinburgh and Charles University, Prague, to the Czech / English track in the WMT 2020 Shared Task on News Translation. Our fast and compact student models distill knowledge from a larger, slower teacher. They are designed to offer a good trade-off between translation quality and inference efficiency. On the WMT 2020 Czech↔English test sets, they achieve translation speeds of over 700 whitespace-delimited source words per second on a single CPU thread, thus making neural translation feasible on consumer hardware without a GPU.

Edinburgh’s Submissions to the 2020 Machine Translation Efficiency Task
Nikolay Bogoychev | Roman Grundkiewicz | Alham Fikri Aji | Maximiliana Behnke | Kenneth Heafield | Sidharth Kashyap | Emmanouil-Ioannis Farsarakis | Mateusz Chudyk
Proceedings of the Fourth Workshop on Neural Generation and Translation

We participated in all tracks of the Workshop on Neural Generation and Translation 2020 Efficiency Shared Task: single-core CPU, multi-core CPU, and GPU. At the model level, we use teacher-student training with a variety of student sizes, tie embeddings and sometimes layers, use the Simpler Simple Recurrent Unit, and introduce head pruning. On GPUs, we used 16-bit floating-point tensor cores. On CPUs, we customized 8-bit quantization and multiple processes with affinity for the multi-core setting. To reduce model size, we experimented with 4-bit log quantization but use floats at runtime. In the shared task, most of our submissions were Pareto optimal with respect to the trade-off between time and quality.

2019

The University of Edinburgh’s Submissions to the WMT19 News Translation Task
Rachel Bawden | Nikolay Bogoychev | Ulrich Germann | Roman Grundkiewicz | Faheem Kirefu | Antonio Valerio Miceli Barone | Alexandra Birch
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

The University of Edinburgh participated in the WMT19 Shared Task on News Translation in six language directions: English↔Gujarati, English↔Chinese, German→English, and English→Czech. For all translation directions, we created or used back-translations of monolingual data in the target language as additional synthetic training data. For English↔Gujarati, we also explored semi-supervised MT with cross-lingual language model pre-training, and translation pivoting through Hindi. For translation to and from Chinese, we investigated character-based tokenisation vs. sub-word segmentation of Chinese text. For German→English, we studied the impact of vast amounts of back-translated training data on translation quality, gaining a few additional insights over Edunov et al. For English→Czech, we compared different preprocessing and tokenisation regimes.

2018

The University of Edinburgh’s Submissions to the WMT18 News Translation Task
Barry Haddow | Nikolay Bogoychev | Denis Emelin | Ulrich Germann | Roman Grundkiewicz | Kenneth Heafield | Antonio Valerio Miceli Barone | Rico Sennrich
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

The University of Edinburgh made submissions to all 14 language pairs in the news translation task, with strong performances in most pairs. We introduce a new RNN variant, mixed RNN / Transformer ensembles, data selection and weighting, and extensions to back-translation.

MS-UEdin Submission to the WMT2018 APE Shared Task: Dual-Source Transformer for Automatic Post-Editing
Marcin Junczys-Dowmunt | Roman Grundkiewicz
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

This paper describes the Microsoft and University of Edinburgh submission to the Automatic Post-editing shared task at WMT2018. Based on training data and systems from the WMT2017 shared task, we re-implement our own models from the last shared task and introduce improvements based on extensive parameter sharing. Next, we experiment with our implementation of dual-source transformer models and data selection for the IT domain. Our submission decisively wins the SMT post-editing sub-task, establishing the new state of the art, and is a very close second (or equal, 16.46 vs 16.50 TER) in the NMT sub-task. Based on the rather weak results in the NMT sub-task, we hypothesize that neural-on-neural APE might not be actually useful.

2017

An Exploration of Neural Sequence-to-Sequence Architectures for Automatic Post-Editing
Marcin Junczys-Dowmunt | Roman Grundkiewicz
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In this work, we explore multiple neural architectures adapted for the task of automatic post-editing of machine translation output. We focus on neural end-to-end models that combine both inputs mt (raw MT output) and src (source language input) in a single neural architecture, modeling {mt, src} → pe directly. Apart from that, we investigate the influence of hard-attention models which seem to be well-suited for monolingual tasks, as well as combinations of both ideas. We report results on data sets provided during the WMT-2016 shared task on automatic post-editing and can demonstrate that dual-attention models that incorporate all available data in the APE scenario in a single model improve on the best shared task system and on all other published results after the shared task. Dual-attention models that are combined with hard attention remain competitive despite applying fewer changes to the input.

Pushing the Limits of Translation Quality Estimation
André F. T. Martins | Marcin Junczys-Dowmunt | Fabio N. Kepler | Ramón Astudillo | Chris Hokamp | Roman Grundkiewicz
Transactions of the Association for Computational Linguistics, Volume 5

Translation quality estimation is a task of growing importance in NLP, due to its potential to reduce post-editing human effort in disruptive ways. However, this potential is currently limited by the relatively low accuracy of existing systems. In this paper, we achieve remarkable improvements by exploiting synergies between the related tasks of word-level quality estimation and automatic post-editing. First, we stack a new, carefully engineered, neural model into a rich feature-based word-level quality estimation system. Then, we use the output of an automatic post-editing system as an extra feature, obtaining striking results on WMT16: a word-level F1-MULT score of 57.47% (an absolute gain of +7.95% over the current state of the art), and a Pearson correlation score of 65.56% for sentence-level HTER prediction (an absolute gain of +13.36%).