Agata Savary


2020

Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons
Stella Markantonatou | John McCrae | Jelena Mitrović | Carole Tiberius | Carlos Ramisch | Ashwini Vaidya | Petya Osenova | Agata Savary
Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts
Agata Savary | Yue Zhang
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

2019

Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019)
Agata Savary | Carla Parra Escartín | Francis Bond | Jelena Mitrović | Verginica Barbu Mititelu
Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019)

Without lexicons, multiword expression identification will never fly: A position statement
Agata Savary | Silvio Cordeiro | Carlos Ramisch
Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019)

Because most multiword expressions (MWEs), especially verbal ones, are semantically non-compositional, their automatic identification in running text is a prerequisite for semantically-oriented downstream applications. However, recent developments, driven notably by the PARSEME shared task on automatic identification of verbal MWEs, show that this task is harder than related tasks, despite recent contributions both in multilingual corpus annotation and in computational models. In this paper, we analyse possible reasons for this state of affairs. They lie in the nature of the MWE phenomenon, as well as in its distributional properties. We also offer a comparative analysis of the state-of-the-art systems, which exhibit particularly strong sensitivity to unseen data. On this basis, we claim that, in order to make strong headway in MWE identification, the community should bend its mind into coupling identification of MWEs with their discovery, via syntactic MWE lexicons. Such lexicons need not necessarily achieve a linguistically complete modelling of MWEs’ behavior, but they should provide minimal morphosyntactic information to cover some potential uses, so as to complement existing MWE-annotated corpora. We define requirements for such a minimal NLP-oriented lexicon, and we propose a roadmap for the MWE community driven by these requirements.

2018

If you’ve seen some, you’ve seen them all: Identifying variants of multiword expressions
Caroline Pasquer | Agata Savary | Carlos Ramisch | Jean-Yves Antoine
Proceedings of the 27th International Conference on Computational Linguistics

Multiword expressions, especially verbal ones (VMWEs), show idiosyncratic variability, which is challenging for NLP applications, hence the need for VMWE identification. We focus on the task of variant identification, i.e. identifying variants of previously seen VMWEs, whatever their surface form. We model the problem as a classification task. Syntactic subtrees with previously seen combinations of lemmas are first extracted, and then classified on the basis of features relevant to morpho-syntactic variation of VMWEs. Feature values are both absolute, i.e. hold for a particular VMWE candidate, and relative, i.e. based on comparing a candidate with previously seen VMWEs. This approach outperforms a baseline by 4 percentage points of F-measure on a French corpus.

Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)
Agata Savary | Carlos Ramisch | Jena D. Hwang | Nathan Schneider | Melanie Andresen | Sameer Pradhan | Miriam R. L. Petruck
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)

Edition 1.1 of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions
Carlos Ramisch | Silvio Ricardo Cordeiro | Agata Savary | Veronika Vincze | Verginica Barbu Mititelu | Archna Bhatia | Maja Buljan | Marie Candito | Polona Gantar | Voula Giouli | Tunga Güngör | Abdelati Hawwari | Uxoa Iñurrieta | Jolanta Kovalevskaitė | Simon Krek | Timm Lichte | Chaya Liebeskind | Johanna Monti | Carla Parra Escartín | Behrang QasemiZadeh | Renata Ramisch | Nathan Schneider | Ivelina Stoyanova | Ashwini Vaidya | Abigail Walsh
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)

This paper describes the PARSEME Shared Task 1.1 on automatic identification of verbal multiword expressions. We present the annotation methodology, focusing on changes from last year’s shared task. Novel aspects include enhanced annotation guidelines, additional annotated data for most languages, corpora for some new languages, and new evaluation settings. Corpora were created for 20 languages, which are also briefly discussed. We report organizational principles behind the shared task and the evaluation metrics employed for ranking. The 17 participating systems, their methods and obtained results are also presented and analysed.

VarIDE at PARSEME Shared Task 2018: Are Variants Really as Alike as Two Peas in a Pod?
Caroline Pasquer | Carlos Ramisch | Agata Savary | Jean-Yves Antoine
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)

We describe the VarIDE system (standing for Variant IDEntification) which participated in the edition 1.1 of the PARSEME shared task on automatic identification of verbal multiword expressions (VMWEs). Our system focuses on the task of VMWE variant identification by using morphosyntactic information in the training data to predict if candidates extracted from the test corpus could be idiomatic, thanks to a naive Bayes classifier. We report results for 19 languages.

Towards a Variability Measure for Multiword Expressions
Caroline Pasquer | Agata Savary | Jean-Yves Antoine | Carlos Ramisch
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

One of the most outstanding properties of multiword expressions (MWEs), especially verbal ones (VMWEs), important both in theoretical models and applications, is their idiosyncratic variability. Some MWEs are always continuous, while some others admit certain types of insertions. Components of some MWEs are rarely or never modified, while some others admit either specific or unrestricted modification. This unpredictable variability profile of MWEs hinders modeling and processing them as words-with-spaces on the one hand, and as regular syntactic structures on the other hand. Since variability of MWEs is a matter of scale rather than a binary property, we propose a 2-dimensional language-independent measure of variability dedicated to verbal MWEs based on syntactic and discontinuity-related clues. We assess its relevance with respect to a linguistic benchmark and its utility for the tasks of VMWE classification and variant identification on a French corpus.

2017

Projecting Multiword Expression Resources on a Polish Treebank
Agata Savary | Jakub Waszczuk
Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing

Multiword expressions (MWEs) are linguistic objects containing two or more words and showing idiosyncratic behavior at different levels. Treebanks with annotated MWEs enable studies of such properties, as well as training and evaluation of MWE-aware parsers. However, few treebanks contain full-fledged MWE annotations. We show how this gap can be bridged in Polish by projecting 3 MWE resources on a constituency treebank.

Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)
Stella Markantonatou | Carlos Ramisch | Agata Savary | Veronika Vincze
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)

The PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions
Agata Savary | Carlos Ramisch | Silvio Cordeiro | Federico Sangati | Veronika Vincze | Behrang QasemiZadeh | Marie Candito | Fabienne Cap | Voula Giouli | Ivelina Stoyanova | Antoine Doucet
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)

Multiword expressions (MWEs) are known as a pain in the neck for NLP due to their idiosyncratic behaviour. While some categories of MWEs have been addressed by many studies, verbal MWEs (VMWEs), such as to take a decision, to break one’s heart or to turn off, have been rarely modelled. This is notably due to their syntactic variability, which hinders treating them as words with spaces. We describe an initiative meant to bring about substantial progress in understanding, modelling and processing VMWEs. It is a joint effort, carried out within a European research network, to elaborate universal terminologies and annotation guidelines for 18 languages. Its main outcome is a multilingual 5-million-word annotated corpus which underlies a shared task on automatic identification of VMWEs. This paper presents the corpus annotation methodology and outcome, the shared task organisation and the results of the participating systems.