Chloé Braud


2021

pdf bib
Proceedings of the 2nd Shared Task on Discourse Relation Parsing and Treebanking (DISRPT 2021)
Amir Zeldes | Yang Janet Liu | Mikel Iruskieta | Philippe Muller | Chloé Braud | Sonia Badene
Proceedings of the 2nd Shared Task on Discourse Relation Parsing and Treebanking (DISRPT 2021)

pdf bib
The DISRPT 2021 Shared Task on Elementary Discourse Unit Segmentation, Connective Detection, and Relation ClassificationDISRPT 2021 Shared Task on Elementary Discourse Unit Segmentation, Connective Detection, and Relation Classification
Amir Zeldes | Yang Janet Liu | Mikel Iruskieta | Philippe Muller | Chloé Braud | Sonia Badene
Proceedings of the 2nd Shared Task on Discourse Relation Parsing and Treebanking (DISRPT 2021)

In 2021, we organized the second iteration of a shared task dedicated to the underlying units used in discourse parsing across formalisms : the DISRPT Shared Task (Discourse Relation Parsing and Treebanking). Adding to the 2019 tasks on Elementary Discourse Unit Segmentation and Connective Detection, this iteration of the Shared Task included for the first time a track on discourse relation classification across three formalisms : RST, SDRT, and PDTB. In this paper we review the data included in the Shared Task, which covers nearly 3 million manually annotated tokens from 16 datasets in 11 languages, survey and compare submitted systems and report on system performance on each task for both annotated and plain-tokenized versions of the data.

pdf bib
Multi-lingual Discourse Segmentation and Connective Identification : MELODI at Disrpt2021MELODI at Disrpt2021
Morteza Kamaladdini Ezzabady | Philippe Muller | Chloé Braud
Proceedings of the 2nd Shared Task on Discourse Relation Parsing and Treebanking (DISRPT 2021)

We present an approach for discourse segmentation and discourse connective identification, both at the sentence and document level, within the Disrpt 2021 shared task, a multi-lingual and multi-formalism evaluation campaign. Building on the most successful architecture from the 2019 similar shared task, we leverage datasets in the same or similar languages to augment training data and improve on the best systems from the previous campaign on 3 out of 4 subtasks, with a mean improvement on all 16 datasets of 0.85 %. Within the Disrpt 21 campaign the system ranks 3rd overall, very close to the 2nd system, but with a significant gap with respect to the best system, which uses a rich set of additional features. The system is nonetheless the best on languages that benefited from crosslingual training on sentence internal segmentation (German and Spanish).

pdf bib
Proceedings of the 2nd Workshop on Computational Approaches to Discourse
Chloé Braud | Christian Hardmeier | Junyi Jessy Li | Annie Louis | Michael Strube | Amir Zeldes
Proceedings of the 2nd Workshop on Computational Approaches to Discourse

2020

pdf bib
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 1 : Journées d'Études sur la Parole
Christophe Benzitoun | Chloé Braud | Laurine Huber | David Langlois | Slim Ouni | Sylvain Pogodalla | Stéphane Schneider
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 1 : Journées d'Études sur la Parole

pdf bib
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 2 : Traitement Automatique des Langues Naturelles
Christophe Benzitoun | Chloé Braud | Laurine Huber | David Langlois | Slim Ouni | Sylvain Pogodalla | Stéphane Schneider
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 2 : Traitement Automatique des Langues Naturelles

pdf bib
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 3 : Rencontre des Étudiants Chercheurs en Informatique pour le TAL
Christophe Benzitoun | Chloé Braud | Laurine Huber | David Langlois | Slim Ouni | Sylvain Pogodalla | Stéphane Schneider
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 3 : Rencontre des Étudiants Chercheurs en Informatique pour le TAL

pdf bib
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 4 : Démonstrations et résumés d'articles internationaux
Christophe Benzitoun | Chloé Braud | Laurine Huber | David Langlois | Slim Ouni | Sylvain Pogodalla | Stéphane Schneider
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 4 : Démonstrations et résumés d'articles internationaux

pdf bib
Proceedings of the First Workshop on Computational Approaches to Discourse
Chloé Braud | Christian Hardmeier | Junyi Jessy Li | Annie Louis | Michael Strube
Proceedings of the First Workshop on Computational Approaches to Discourse

2019

pdf bib
EusDisParser : improving an under-resourced discourse parser with cross-lingual dataEusDisParser: improving an under-resourced discourse parser with cross-lingual data
Mikel Iruskieta | Chloé Braud
Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019

Development of discourse parsers to annotate the relational discourse structure of a text is crucial for many downstream tasks. However, most of the existing work focuses on English, assuming a quite large dataset. Discourse data have been annotated for Basque, but training a system on these data is challenging since the corpus is very small. In this paper, we create the first demonstrator based on RST for Basque, and we investigate the use of data in another language to improve the performance of a Basque discourse parser. More precisely, we build a monolingual system using the small set of data available and investigate the use of multilingual word embeddings to train a system for Basque using data annotated for another language. We found that our approach to building a system limited to the small set of data available for Basque allowed us to get an improvement over previous approaches making use of many data annotated in other languages. At best, we get 34.78 in F1 for the full discourse structure. More data annotation is necessary in order to improve the results obtained with these techniques. We also describe which relations match with the gold standard, in order to understand these results.

pdf bib
Which aspects of discourse relations are hard to learn? Primitive decomposition for discourse relation classification
Charlotte Roze | Chloé Braud | Philippe Muller
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue

Discourse relation classification has proven to be a hard task, with rather low performance on several corpora that notably differ on the relation set they use. We propose to decompose the task into smaller, mostly binary tasks corresponding to various primitive concepts encoded into the discourse relation definitions. More precisely, we translate the discourse relations into a set of values for attributes based on distinctions used in the mappings between discourse frameworks proposed by Sanders et al. This arguably allows for a more robust representation of discourse relations, and enables us to address usually ignored aspects of discourse relation prediction, namely multiple labels and underspecified annotations. We show experimentally which of the conceptual primitives are harder to learn from the Penn Discourse Treebank English corpus, and propose a correspondence to predict the original labels, with preliminary empirical comparisons with a direct model.

2017

pdf bib
Cross-lingual and cross-domain discourse segmentation of entire documents
Chloé Braud | Ophélie Lacroix | Anders Søgaard
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Discourse segmentation is a crucial step in building end-to-end discourse parsers. However, discourse segmenters only exist for a few languages and domains. Typically they only detect intra-sentential segment boundaries, assuming gold standard sentence and token segmentation, and relying on high-quality syntactic parses and rich heuristics that are not generally available across languages and domains. In this paper, we propose statistical discourse segmenters for five languages and three domains that do not rely on gold pre-annotations. We also consider the problem of learning discourse segmenters when no labeled data is available for a language. Our fully supervised system obtains 89.5 % F1 for English newswire, with slight drops in performance on other domains, and we report supervised and unsupervised (cross-lingual) results for five languages in total.

pdf bib
Does syntax help discourse segmentation? Not so much
Chloé Braud | Ophélie Lacroix | Anders Søgaard
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Discourse segmentation is the first step in building discourse parsers. Most work on discourse segmentation does not scale to real-world discourse parsing across languages, for two reasons : (i) models rely on constituent trees, and (ii) experiments have relied on gold standard identification of sentence and token boundaries. We therefore investigate to what extent constituents can be replaced with universal dependencies, or left out completely, as well as how state-of-the-art segmenters fare in the absence of sentence boundaries. Our results show that dependency information is less useful than expected, but we provide a fully scalable, robust model that only relies on part-of-speech information, and show that it performs well across languages in the absence of any gold-standard annotation.