Laurent Besacier


2021

Investigating the Impact of Gender Representation in ASR Training Data: a Case Study on Librispeech
Mahault Garnerin | Solange Rossato | Laurent Besacier
Proceedings of the 3rd Workshop on Gender Bias in Natural Language Processing

In this paper, we question the impact of gender representation in training data on the performance of an end-to-end ASR system. We create an experiment based on the Librispeech corpus and build three training corpora that vary only in the proportion of data produced by each gender category. We observe that while our system is overall robust to gender balance or imbalance in the training data, it is nonetheless dependent on the match between the individuals present in the training and test sets.
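
To make the experimental setup concrete, here is a minimal sketch of how gender-controlled training subsets could be assembled from Librispeech's SPEAKERS.TXT metadata. The file name and column layout follow the public corpus, but the ratios and the helper itself are illustrative assumptions, not the paper's exact protocol.

```python
import random
from collections import defaultdict

def build_gender_subset(speakers_file, female_ratio, total_hours):
    """Sample speakers so that roughly `female_ratio` of the audio
    duration comes from speakers tagged F in the metadata."""
    by_gender = defaultdict(list)
    with open(speakers_file) as f:          # Librispeech ships SPEAKERS.TXT
        for line in f:
            if line.startswith(";"):        # skip the comment header
                continue
            spk_id, gender, subset, minutes, _ = [x.strip() for x in line.split("|", 4)]
            by_gender[gender].append((spk_id, float(minutes) / 60.0))

    targets = {"F": total_hours * female_ratio, "M": total_hours * (1 - female_ratio)}
    selection = []
    for gender, target in targets.items():
        random.shuffle(by_gender[gender])
        acc = 0.0
        for spk_id, hours in by_gender[gender]:
            if acc >= target:
                break
            selection.append(spk_id)
            acc += hours
    return selection

# e.g. balanced and imbalanced corpora of equal total size
balanced = build_gender_subset("SPEAKERS.TXT", female_ratio=0.5, total_hours=100)
```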

Controlling Prosody in End-to-End TTS: A Case Study on Contrastive Focus Generation
Siddique Latif | Inyoung Kim | Ioan Calapodescu | Laurent Besacier
Proceedings of the 25th Conference on Computational Natural Language Learning

While end-to-end Text-to-Speech (TTS) has made significant progress over the past few years, these systems still lack intuitive user controls over prosody. For instance, generating speech with fine-grained prosody control (prosodic prominence, contextually appropriate emotions) is still an open challenge. In this paper, we investigate whether we can control prosody directly from the input text, in order to encode information related to contrastive focus, which emphasizes a specific word that is contrary to the presuppositions of the interlocutor. We build and share a dataset for this purpose and show that it allows us to train a TTS system in which this fine-grained prosodic feature can be correctly conveyed using control tokens. Our evaluation compares synthetic and natural utterances and shows that the prosodic patterns of contrastive focus (variations of F0, intensity, and duration) can be learnt accurately. Such a milestone is important to allow, for example, smart speakers to be programmatically controlled in terms of output prosody.
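
A minimal sketch of the control-token idea in its simplest form: a marker inserted into the input text tells a token-aware TTS encoder which word should carry contrastive focus. The token name `<focus>` and the helper are illustrative assumptions, not the paper's released tooling.

```python
def add_focus_token(words, focus_index, token="<focus>"):
    """Insert a control token before the word to be emphasized, so a
    token-aware TTS encoder can condition prosody on it."""
    out = []
    for i, w in enumerate(words):
        if i == focus_index:
            out.append(token)
        out.append(w)
    return " ".join(out)

# "No, I said the RED car" -> contrastive focus on "red"
text = add_focus_token(["i", "said", "the", "red", "car"], focus_index=3)
print(text)  # i said the <focus> red car
```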

2020

Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)
Dorothee Beermann | Laurent Besacier | Sakriani Sakti | Claudia Soria
Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)

Dual-decoder Transformer for Joint Automatic Speech Recognition and Multilingual Speech Translation
Hang Le | Juan Pino | Changhan Wang | Jiatao Gu | Didier Schwab | Laurent Besacier
Proceedings of the 28th International Conference on Computational Linguistics

We introduce the dual-decoder Transformer, a new model architecture that jointly performs automatic speech recognition (ASR) and multilingual speech translation (ST). Our models are based on the original Transformer architecture (Vaswani et al., 2017) but consist of two decoders, each responsible for one task (ASR or ST). Our major contribution lies in how these decoders interact with each other: one decoder can attend to different information sources from the other via a dual-attention mechanism. We propose two variants of these architectures corresponding to two different levels of dependency between the decoders, called the parallel and cross dual-decoder Transformers, respectively. Extensive experiments on the MuST-C dataset show that our models outperform the previously reported highest translation performance in the multilingual setting, as well as bilingual one-to-one results. Furthermore, our parallel models demonstrate no trade-off between ASR and ST compared to the vanilla multi-task architecture. Our code and pre-trained models are available at https://github.com/formiel/speech-translation.
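
A toy PyTorch sketch of the dual-attention idea in the parallel variant: each decoder layer adds a third attention over the other decoder's hidden states. Dimensions, layer count, and the absence of masking are simplifications; the actual implementation is in the linked repository.

```python
import torch
import torch.nn as nn

class DualAttentionLayer(nn.Module):
    """One decoder layer with an extra cross-attention over the hidden
    states of the *other* decoder (the dual-attention idea, simplified)."""
    def __init__(self, d_model=256, nhead=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.enc_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.dual_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                nn.Linear(4 * d_model, d_model))
        self.norms = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(4))

    def forward(self, x, enc_out, other_dec):
        x = self.norms[0](x + self.self_attn(x, x, x)[0])
        x = self.norms[1](x + self.enc_attn(x, enc_out, enc_out)[0])
        # attend to the other decoder's states (ASR <-> ST interaction)
        x = self.norms[2](x + self.dual_attn(x, other_dec, other_dec)[0])
        return self.norms[3](x + self.ff(x))

# parallel variant: both decoders read each other's previous-layer states
asr_layer, st_layer = DualAttentionLayer(), DualAttentionLayer()
enc = torch.randn(2, 50, 256)                        # speech encoder output
asr_h, st_h = torch.randn(2, 7, 256), torch.randn(2, 9, 256)
asr_next = asr_layer(asr_h, enc, st_h)
st_next = st_layer(st_h, enc, asr_h)
```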

FlauBERT: Unsupervised Language Model Pre-training for French
Hang Le | Loïc Vial | Jibril Frej | Vincent Segonne | Maximin Coavoux | Benjamin Lecouteux | Alexandre Allauzen | Benoit Crabbé | Laurent Besacier | Didier Schwab
Proceedings of the 12th Language Resources and Evaluation Conference

Language models have become a key step in achieving state-of-the-art results in many Natural Language Processing (NLP) tasks. Leveraging the huge amounts of unlabeled text now available, they provide an efficient way to pre-train continuous word representations that can be fine-tuned for a downstream task, along with their contextualization at the sentence level. This has been widely demonstrated for English using contextualized representations (Dai and Le, 2015; Peters et al., 2018; Howard and Ruder, 2018; Radford et al., 2018; Devlin et al., 2019; Yang et al., 2019b). In this paper, we introduce and share FlauBERT, a model learned on a very large and heterogeneous French corpus. Models of different sizes are trained using the new CNRS (French National Centre for Scientific Research) Jean Zay supercomputer. We apply our French language models to diverse NLP tasks (text classification, paraphrasing, natural language inference, parsing, word sense disambiguation) and show that most of the time they outperform other pre-training approaches. Different versions of FlauBERT, as well as a unified evaluation protocol for the downstream tasks called FLUE (French Language Understanding Evaluation), are shared with the research community for further reproducible experiments in French NLP.
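
A minimal usage sketch, assuming the FlauBERT checkpoints published on the Hugging Face model hub (the base cased checkpoint is shown here):

```python
import torch
from transformers import FlaubertModel, FlaubertTokenizer

# Load one of the released checkpoints (base cased shown here)
name = "flaubert/flaubert_base_cased"
tokenizer = FlaubertTokenizer.from_pretrained(name)
model = FlaubertModel.from_pretrained(name)

inputs = tokenizer("Le chat dort sur le canapé.", return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state   # (1, seq_len, 768)
print(hidden.shape)
```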

Gender Representation in Open Source Speech Resources
Mahault Garnerin | Solange Rossato | Laurent Besacier
Proceedings of the 12th Language Resources and Evaluation Conference

With the rise of artificial intelligence (AI) and the growing use of deep-learning architectures, the question of ethics, transparency and fairness of AI systems has become a central concern within the research community. We address transparency and fairness in spoken language systems with a study of gender representation in speech resources available through the Open Speech and Language Resource platform. We show that finding gender information in open source corpora is not straightforward and that gender balance depends on other corpus characteristics (elicited/non-elicited speech, low/high-resource language, targeted speech task). The paper ends with recommendations about metadata and gender information for researchers, in order to ensure better transparency of the speech systems built using such corpora.
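
As an illustration of the kind of audit the paper advocates, the sketch below computes per-gender speaker and speech-time shares from corpus metadata. The three-field schema is a hypothetical simplification; in practice, metadata formats vary widely across open corpora, which is precisely the paper's point.

```python
from collections import Counter

def gender_balance(utterances):
    """utterances: iterable of (speaker_id, gender, duration_seconds).
    Returns the share of speakers and of speech time per gender category."""
    speakers, seconds = {}, Counter()
    for spk, gender, dur in utterances:
        speakers[spk] = gender
        seconds[gender] += dur
    spk_counts = Counter(speakers.values())
    n, total = sum(spk_counts.values()), sum(seconds.values())
    return ({g: c / n for g, c in spk_counts.items()},
            {g: s / total for g, s in seconds.items()})

# toy metadata: two female speakers, but most speech time from one male
meta = [("s1", "F", 120.0), ("s2", "F", 90.0), ("s3", "M", 600.0)]
print(gender_balance(meta))  # speaker share and time share can diverge
```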

2019

Naver Labs Europe’s Systems for the Document-Level Generation and Translation Task at WNGT 2019
Fahimeh Saleh | Alexandre Berard | Ioan Calapodescu | Laurent Besacier
Proceedings of the 3rd Workshop on Neural Generation and Translation

Recently, neural models have led to significant improvements in both machine translation (MT) and natural language generation (NLG). However, generating long descriptive summaries conditioned on structured data remains an open challenge. Likewise, MT that goes beyond sentence-level context is still an open issue (e.g., document-level MT or MT with metadata). To address these challenges, we propose to leverage data from both tasks and do transfer learning between MT, NLG, and MT with source-side metadata (MT+NLG). First, we train document-based MT systems with large amounts of parallel data. Then, we adapt these models to pure NLG and MT+NLG tasks by fine-tuning with smaller amounts of domain-specific data. This end-to-end NLG approach, without data selection and planning, outperforms the previous state of the art on the Rotowire NLG task. We participated in the Document Generation and Translation task at WNGT 2019 and ranked first in all tracks.

The LIG system for the English-Czech Text Translation Task of IWSLT 2019
Loïc Vial | Benjamin Lecouteux | Didier Schwab | Hang Le | Laurent Besacier
Proceedings of the 16th International Conference on Spoken Language Translation

In this paper, we present our submission for the English-to-Czech Text Translation Task of IWSLT 2019. Our system aims to study how pre-trained language models, used as input embeddings, can improve a specialized machine translation system trained on little data. We implemented a Transformer-based encoder-decoder neural system that can use the output of a pre-trained language model as input embeddings, and we compared its performance under three configurations: 1) without any pre-trained language model (constrained), 2) using a language model trained on the monolingual parts of the allowed English-Czech data (constrained), and 3) using a language model trained on a large quantity of external monolingual data (unconstrained). We used BERT as the external pre-trained language model (configuration 3), and the BERT architecture for training our own language model (configuration 2). Regarding the training data, we trained our MT system on a small quantity of parallel text: one set consists only of the provided MuST-C corpus, and the other consists of the MuST-C corpus plus the News Commentary corpus from WMT. We observed that using the external pre-trained BERT improves the scores of our system by +0.8 to +1.5 BLEU on our development set, and by +0.97 to +1.94 BLEU on the test set. However, using our own language model trained only on the allowed parallel data seems to improve machine translation performance only when the system is trained on the smallest dataset.
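
A rough sketch of configuration 3: frozen BERT states replacing the learned source-embedding table of the MT encoder. The projection dimension and the frozen-BERT choice are illustrative assumptions, not the paper's exact setup.

```python
import torch
from transformers import BertModel, BertTokenizer

tok = BertTokenizer.from_pretrained("bert-base-cased")
bert = BertModel.from_pretrained("bert-base-cased").eval()

def contextual_embeddings(sentence):
    """Frozen BERT states used in place of learned source embeddings."""
    batch = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        states = bert(**batch).last_hidden_state      # (1, len, 768)
    return states

proj = torch.nn.Linear(768, 512)   # map to the MT encoder's model dimension
src = proj(contextual_embeddings("The cat sat on the mat."))
# `src` would feed the Transformer encoder instead of a token embedding table
```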

Word Recognition, Competition, and Activation in a Model of Visually Grounded Speech
William N. Havard | Jean-Pierre Chevrot | Laurent Besacier
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

In this paper, we study how word-like units are represented and activated in a recurrent neural model of visually grounded speech. The model used in our experiments is trained to project an image and its spoken description into a common representation space. We show that a recurrent model trained on spoken sentences implicitly segments its input into word-like units and reliably maps them to their correct visual referents. We introduce a methodology originating from linguistics, the gating paradigm, to analyse the representations learned by neural networks, and show that the correct representation of a word is only activated if the network has access to the first phoneme of the target word, suggesting that the network does not rely on a global acoustic pattern. Furthermore, we find that not all speech frames (MFCC vectors in our case) play an equal role in the final encoded representation of a given word; some frames have a crucial effect on it. Finally, we suggest that word representations could be activated through a process of lexical competition.
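
A toy sketch of the gating paradigm as described: encode increasing prefixes of a word's speech frames and track similarity with the visual referent. The mean-pooling "encoder" and the random features are stand-ins for the trained recurrent model and real MFCCs.

```python
import numpy as np

def gating_curve(encode, mfcc, image_vec, boundaries):
    """Encode increasing prefixes of a word's speech frames (the gating
    paradigm) and track similarity with the target visual referent."""
    sims = []
    for b in boundaries:                    # e.g. cumulative phoneme offsets
        v = encode(mfcc[:b])
        sims.append(float(np.dot(v, image_vec) /
                          (np.linalg.norm(v) * np.linalg.norm(image_vec))))
    return sims

# toy stand-ins: a mean-pooling "encoder" and random features
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(40, 13))            # 40 frames of 13-dim MFCCs
image_vec = rng.normal(size=13)
print(gating_curve(lambda p: p.mean(axis=0), mfcc, image_vec, [5, 10, 20, 40]))
```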

2018

Adaptor Grammars for the Linguist: Word Segmentation Experiments for Very Low-Resource Languages
Pierre Godard | Laurent Besacier | François Yvon | Martine Adda-Decker | Gilles Adda | Hélène Maynard | Annie Rialland
Proceedings of the Fifteenth Workshop on Computational Research in Phonetics, Phonology, and Morphology

Computational Language Documentation attempts to make the most recent research in speech and language technologies available to linguists working on language preservation and documentation. In this paper, we pursue two main goals along these lines. The first is to improve upon a strong baseline for the unsupervised word discovery task on two very low-resource Bantu languages, taking advantage of the expertise of linguists on these particular languages. The second consists in exploring the Adaptor Grammar framework as a decision and prediction tool for linguists studying a new language. We experiment with 162 grammar configurations for each language and show that using Adaptor Grammars for word segmentation enables us to test hypotheses about a language. Specializing a generic grammar with language-specific knowledge leads to great improvements for the word discovery task, ultimately achieving a leap of about 30% token F-score over the results of a strong baseline.
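
For readers unfamiliar with Adaptor Grammars, here is a heavily simplified sketch: a unigram segmentation grammar whose adapted Word nonterminal caches whole subtrees, plus a toy greedy segmenter that mimics the resulting rich-get-richer reuse of a cached lexicon. Rule notation and the segmenter are illustrative only; real experiments use dedicated AG samplers (e.g., Mark Johnson's py-cfg).

```python
# A minimal unigram Adaptor Grammar for word segmentation, in simplified
# notation. The adapted nonterminal Word is memoized, so recurring
# phoneme strings get reused as word candidates.
UNIGRAM_AG = """
Sentence -> Word Sentence | Word
Word     -> Phoneme Word | Phoneme     # Word is adapted (cached subtrees)
"""

def segment_oracle(utterance, lexicon):
    """Toy greedy segmenter standing in for AG inference: prefer known
    words from a cached lexicon, falling back to single phonemes."""
    out, i = [], 0
    while i < len(utterance):
        for j in range(len(utterance), i, -1):
            if utterance[i:j] in lexicon or j == i + 1:
                out.append(utterance[i:j])
                i = j
                break
    return out

print(segment_oracle("okobɛnɛ", {"oko", "bɛnɛ"}))  # ['oko', 'bɛnɛ']
```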

2017

Deep Investigation of Cross-Language Plagiarism Detection Methods
Jérémy Ferrero | Laurent Besacier | Didier Schwab | Frédéric Agnès
Proceedings of the 10th Workshop on Building and Using Comparable Corpora

This paper is a deep investigation of cross-language plagiarism detection methods on a recently introduced open dataset, which contains parallel and comparable collections of documents with multiple characteristics (different genres, languages and sizes of texts). We investigate cross-language plagiarism detection methods for 6 language pairs at 2 granularities of text units in order to draw robust conclusions about the best methods, while deeply analyzing correlations across document styles and languages.

Amharic-English Speech Translation in Tourism Domain
Michael Melese | Laurent Besacier | Million Meshesha
Proceedings of the Workshop on Speech-Centric Natural Language Processing

This paper describes Amharic-to-English speech translation, in particular Automatic Speech Recognition (ASR) with a post-editing feature and Amharic-English Statistical Machine Translation (SMT). The ASR experiment is conducted using a morpheme language model (LM) and a phoneme acoustic model (AM). Likewise, SMT is conducted using words and morphemes as units. Morpheme-based translation achieves a 6.29 BLEU score at 76.4% recognition accuracy, while word-based translation achieves a 12.83 BLEU score at 77.4% word recognition accuracy. Further, after post-editing the Amharic ASR output using a corpus-based n-gram model, the word recognition accuracy increased by 1.42%. Since the post-editing approach reduces error propagation, word-based translation accuracy improved by 0.25 BLEU (1.95%). We are now working towards further reducing propagated errors through different algorithms at each component of the speech translation cascade.
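
A sketch of the cascade's wiring (ASR, then n-gram post-editing, then SMT). All three components below are toy stand-ins purely to show the pipeline structure; the paper's systems are full ASR and SMT models.

```python
def cascade_translate(audio, asr, postedit, smt):
    """Cascaded speech translation: ASR, then n-gram based post-editing
    of the hypothesis, then SMT."""
    hyp = asr(audio)          # morpheme LM + phoneme AM in the paper
    hyp = postedit(hyp)       # corpus-based n-gram correction
    return smt(hyp)           # word- or morpheme-level SMT

# toy stand-ins to make the wiring concrete (all hypothetical)
fake_asr = lambda audio: "selam alem"
fake_postedit = lambda hyp: hyp.replace("alem", "alemu")
fake_smt = lambda hyp: {"selam alemu": "hello world"}.get(hyp, hyp)
print(cascade_translate(b"...", fake_asr, fake_postedit, fake_smt))
```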

CompiLIG at SemEval-2017 Task 1: Cross-Language Plagiarism Detection Methods for Semantic Textual Similarity
Jérémy Ferrero | Laurent Besacier | Didier Schwab | Frédéric Agnès
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

We present our submitted systems for Semantic Textual Similarity (STS) Track 4 at SemEval-2017. Given a pair of Spanish-English sentences, each system must estimate their semantic similarity with a score between 0 and 5. In our submission, we use syntax-based, dictionary-based, context-based, and MT-based methods. We also combine these methods in unsupervised and supervised ways. Our best run ranked 1st on track 4a with a correlation of 83.02% with human annotations.
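
The supervised combination can be as simple as regressing the gold similarity on the individual method scores. The sketch below uses scikit-learn with made-up toy numbers purely to show the wiring, not the paper's trained model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Each row: scores from the individual methods (syntax-, dictionary-,
# context- and MT-based); target: gold similarity in [0, 5]. Toy values.
X_train = np.array([[0.9, 0.8, 0.7, 0.85],
                    [0.2, 0.1, 0.3, 0.15],
                    [0.6, 0.5, 0.55, 0.6]])
y_train = np.array([4.8, 0.5, 3.0])

combiner = LinearRegression().fit(X_train, y_train)
print(combiner.predict([[0.7, 0.6, 0.6, 0.7]]).clip(0, 5))
```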

Using Word Embedding for Cross-Language Plagiarism Detection
Jérémy Ferrero | Laurent Besacier | Didier Schwab | Frédéric Agnès
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

This paper proposes to use distributed representations of words (word embeddings) in cross-language textual similarity detection. The main contributions of this paper are the following: (a) we introduce new cross-language similarity detection methods based on distributed representations of words; (b) we combine the proposed methods to verify their complementarity, finally obtaining an overall F1 score of 89.15% for English-French similarity detection at chunk level (88.5% at sentence level) on a very challenging corpus.
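
A minimal sketch of the baseline behind such methods: average (bilingual) word embeddings per sentence and compare across languages with cosine similarity. The toy shared-space vectors are illustrative; the paper uses trained bilingual embeddings and richer weighting schemes.

```python
import numpy as np

def sentence_vector(words, embeddings, dim=300):
    """Average the (bilingual) word embeddings of a sentence; OOV words
    are skipped. A common baseline behind cross-language similarity."""
    vecs = [embeddings[w] for w in words if w in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def cross_lang_similarity(en_words, fr_words, emb):
    a, b = sentence_vector(en_words, emb), sentence_vector(fr_words, emb)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# toy shared-space embeddings (real systems use trained bilingual ones)
rng = np.random.default_rng(1)
emb = {w: rng.normal(size=300) for w in ["cat", "chat", "dog", "chien"]}
emb["chat"] = emb["cat"] + 0.05 * rng.normal(size=300)   # near-synonyms
print(cross_lang_similarity(["cat"], ["chat"], emb))
```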