Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations

Dimitra Gkatzia, Djamé Seddah (Editors)

Association for Computational Linguistics

Using and comparing Rhetorical Structure Theory parsers with rst-workbench
Arne Neumann

I present rst-workbench, a software package that simplifies the installation and usage of numerous end-to-end Rhetorical Structure Theory (RST) parsers. The tool offers a web-based interface that allows users to enter text and let multiple RST parsers generate analyses concurrently. The resulting RST trees can be compared visually, manually post-edited (in the browser) and stored for later usage.

MATILDA - Multi-AnnoTator multi-language Interactive Light-weight Dialogue Annotator
Davide Cucurnia | Nikolai Rozanov | Irene Sucameli | Augusto Ciuffoletti | Maria Simi

Dialogue Systems are becoming ubiquitous in various forms and shapes: virtual assistants (Siri, Alexa, etc.), chat-bots, customer support, and chit-chat systems, to name a few. The advances in language models and their publication have democratised advanced NLP. However, data remains a crucial bottleneck. Our contribution to this essential pillar is MATILDA, to the best of our knowledge the first multi-annotator, multi-language dialogue annotation tool. MATILDA allows the creation of corpora, the management of users, the annotation of dialogues, the quick adaptation of the user interface to any language and the resolution of inter-annotator disagreement. We evaluate the tool on ease of use, annotation speed and inter-annotation resolution for both experts and novices and conclude that this tool not only supports the full pipeline for dialogue annotation, but also allows non-technical people to easily use it. We are completely open-sourcing the tool and provide a tutorial video.

Forum 4.0: An Open-Source User Comment Analysis Framework
Marlo Haering | Jakob Smedegaard Andersen | Chris Biemann | Wiebke Loosen | Benjamin Milde | Tim Pietz | Christian Stöcker | Gregor Wiedemann | Olaf Zukunft | Walid Maalej

With the increasing number of user comments in diverse domains, including comments on online journalism and e-commerce websites, the manual content analysis of these comments becomes time-consuming and challenging. However, research showed that user comments contain useful information for different domain experts, which is thus worth finding and utilizing. This paper introduces Forum 4.0, an open-source framework to semi-automatically analyze, aggregate, and visualize user comments based on labels defined by domain experts. We demonstrate the applicability of Forum 4.0 with comments analytics scenarios within the domains of online journalism and app stores. We outline the underlying container architecture, including the web-based user interface, the machine learning component, and the task manager for time-consuming tasks. We finally conduct machine learning experiments with simulated annotations and different sampling strategies on existing datasets from both domains to evaluate Forum 4.0’s performance. Forum 4.0 achieves promising classification results (ROC-AUC 0.9 with 100 annotated samples), utilizing transformer-based embeddings with a lightweight logistic regression model. We explain how Forum 4.0’s architecture is applicable for millions of user comments in real-time, yet at feasible training and classification costs.
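The ROC-AUC figure quoted in the abstract above can be read as the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one. The following is a minimal, illustrative sketch of that definition in plain Python. It is not taken from Forum 4.0; the function name `roc_auc` and the toy labels and scores are assumptions made for this example.

```python
def roc_auc(labels, scores):
    """Pairwise ROC-AUC: fraction of (positive, negative) pairs in which
    the positive example gets the higher score; ties count as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical): 1 = comment carries the expert-defined label.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
print(f"ROC-AUC: {auc:.3f}")
```

The quadratic pairwise loop is chosen for readability; a rank-based formulation would give the same value more efficiently on large comment collections.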

SLTEV: Comprehensive Evaluation of Spoken Language Translation
Ebrahim Ansari | Ondřej Bojar | Barry Haddow | Mohammad Mahmoudi

Automatic evaluation of Machine Translation (MT) quality has been investigated over several decades. Spoken Language Translation (SLT), especially when simultaneous, needs to consider additional criteria and has neither a standard evaluation procedure nor a widely used toolkit. To fill the gap, we develop SLTev, an open-source tool for assessing SLT in a comprehensive way. SLTev reports the quality, latency, and stability of an SLT candidate output based on the time-stamped transcript and reference translation into a target language. For quality, we rely on sacreBLEU, which provides MT evaluation measures such as chrF or BLEU. For latency, we propose two new scoring techniques. For stability, we extend the previously defined measures with a normalized Flicker. We also propose a new averaging of older measures. A preliminary version of SLTev was used in the IWSLT 2020 shared task. Moreover, a growing collection of test datasets directly accessible by SLTev is provided, so that system evaluations are comparable across papers.

A Dashboard for Mitigating the COVID-19 Misinfodemic
Zhengyuan Zhu | Kevin Meng | Josue Caraballo | Israa Jaradat | Xiao Shi | Zeyu Zhang | Farahnaz Akrami | Haojin Liao | Fatma Arslan | Damian Jimenez | Mohanmmed Samiul Saeef | Paras Pathak | Chengkai Li

This paper describes the current milestones achieved in our ongoing project that aims to understand the surveillance of, impact of and intervention on the COVID-19 misinfodemic on Twitter. Specifically, it introduces a public dashboard which, in addition to displaying case counts in an interactive map and a navigational panel, also provides some unique features not found elsewhere. In particular, the dashboard uses a curated catalog of COVID-19 related facts and debunks of misinformation, and it displays the most prevalent information from the catalog among Twitter users in user-selected U.S. geographic regions. The paper explains how to use BERT models to match tweets with the facts and misinformation and to detect their stance towards such information. The paper also discusses the results of preliminary experiments on analyzing the spatio-temporal spread of misinformation.

A description and demonstration of SAFAR framework
Karim Bouzoubaa | Younes Jaafar | Driss Namly | Ridouane Tachicart | Rachida Tajmout | Hakima Khamar | Hamid Jaafar | Lhoussain Aouragh | Abdellah Yousfi

Several tools and resources have been developed to deal with Arabic NLP. However, a homogeneous and flexible Arabic environment that gathers these components is rarely available. To address this, we introduce SAFAR, a monolingual framework developed in accordance with software engineering requirements and dedicated to the Arabic language, especially Modern Standard Arabic and the Moroccan dialect. After a decade of integration and development, SAFAR today comprises more than 50 tools and resources that can be used either through its API or through its web interface.

LOME: Large Ontology Multilingual Extraction
Patrick Xia | Guanghui Qin | Siddharth Vashishtha | Yunmo Chen | Tongfei Chen | Chandler May | Craig Harman | Kyle Rawlins | Aaron Steven White | Benjamin Van Durme

We present LOME, a system for performing multilingual information extraction. Given a text document as input, our core system identifies spans of textual entity and event mentions with a FrameNet (Baker et al., 1998) parser. It subsequently performs coreference resolution, fine-grained entity typing, and temporal relation prediction between events. By doing so, the system constructs an event and entity focused knowledge graph. We can further apply third-party modules for other types of annotation, like relation extraction. Our (multilingual) first-party modules either outperform or are competitive with the (monolingual) state-of-the-art. We achieve this through the use of multilingual encoders like XLM-R (Conneau et al., 2020) and leveraging multilingual training data. LOME is available as a Docker container on Docker Hub. In addition, a lightweight version of the system is accessible as a web demo.

Graph Matching and Graph Rewriting: GREW tools for corpus exploration, maintenance and conversion
Bruno Guillaume

This article presents a set of tools built around the Graph Rewriting computational framework which can be used to compute complex rule-based transformations on linguistic structures. Applications of the graph matching mechanism to corpus exploration, error mining and quantitative typology are also given.

Massive Choice, Ample Tasks (MaChAmp): A Toolkit for Multi-task Learning in NLP
Rob van der Goot | Ahmet Üstün | Alan Ramponi | Ibrahim Sharaf | Barbara Plank

Transfer learning, particularly approaches that combine multi-task learning with pre-trained contextualized embeddings and fine-tuning, has advanced the field of Natural Language Processing tremendously in recent years. In this paper we present MaChAmp, a toolkit for easy fine-tuning of contextualized embeddings in multi-task settings. The benefits of MaChAmp are its flexible configuration options and its support for a variety of natural language processing tasks in a uniform toolkit, from text classification and sequence labeling to dependency parsing, masked language modeling, and text generation.

European Language Grid: A Joint Platform for the European Language Technology Community
Georg Rehm | Stelios Piperidis | Kalina Bontcheva | Jan Hajic | Victoria Arranz | Andrejs Vasiļjevs | Gerhard Backfried | Jose Manuel Gomez-Perez | Ulrich Germann | Rémi Calizzano | Nils Feldhus | Stefanie Hegele | Florian Kintzel | Katrin Marheinecke | Julian Moreno-Schneider | Dimitris Galanis | Penny Labropoulou | Miltos Deligiannis | Katerina Gkirtzou | Athanasia Kolovou | Dimitris Gkoumas | Leon Voukoutis | Ian Roberts | Jana Hamrlova | Dusan Varis | Lukas Kacena | Khalid Choukri | Valérie Mapelli | Mickaël Rigault | Julija Melnika | Miro Janosik | Katja Prinz | Andres Garcia-Silva | Cristian Berrio | Ondrej Klejch | Steve Renals

Europe is a multilingual society, in which dozens of languages are spoken. The only option to enable and to benefit from multilingualism is through Language Technologies (LT), i.e., Natural Language Processing and Speech Technologies. We describe the European Language Grid (ELG), which is intended to evolve into the primary platform and marketplace for LT in Europe by providing one umbrella platform for the European LT landscape, including research and industry, enabling all stakeholders to upload, share and distribute their services, products and resources. At the end of our EU project, which will establish a legal entity in 2022, the ELG will provide access to approximately 1300 services for all European languages as well as thousands of data sets.

A New Surprise Measure for Extracting Interesting Relationships between Persons
Hidetaka Kamigaito | Jingun Kwon | Young-In Song | Manabu Okumura

One way to enhance user engagement in search engines is to suggest interesting facts to the user. Although relationships between persons are an important target for text mining, there are few effective approaches for extracting interesting relationships between persons. We therefore propose a method for extracting interesting relationships between persons from natural language texts by focusing on their surprisingness. Our method first extracts all personal relationships from dependency trees for the texts and then calculates surprise scores for distributed representations of the extracted relationships in an unsupervised manner. The unique point of our method is that it does not require any labeled dataset with annotation for the surprising personal relationships. The results of the human evaluation show that the proposed method could extract more interesting relationships between persons from Japanese Wikipedia articles than a popularity-based baseline method. We demonstrate our proposed method as a Chrome plugin on Google Search.

Paladin: an annotation tool based on active and proactive learning
Minh-Quoc Nghiem | Paul Baylis | Sophia Ananiadou

In this paper, we present Paladin, an open-source web-based annotation tool for creating high-quality multi-label document-level datasets. By integrating active learning and proactive learning into the annotation task, Paladin makes the task less time-consuming and reduces the human effort it requires. Although Paladin is designed for multi-label settings, the system is flexible and can be adapted to other tasks in single-label settings.
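As a rough illustration of the active learning idea mentioned in the abstract above, the sketch below ranks unlabeled documents by how uncertain a multi-label classifier is about them and selects the most uncertain ones for annotation. It is a generic uncertainty-sampling example, not Paladin's actual selection strategy, and it does not cover proactive learning (which additionally accounts for differences between annotators). The function names, document ids and probabilities are all hypothetical.

```python
def uncertainty(probabilities):
    """Mean closeness of the per-label probabilities to 0.5.
    Returns 1.0 when every label is at 0.5, and 0.0 when every label is at 0 or 1."""
    return sum(1.0 - abs(2.0 * p - 1.0) for p in probabilities) / len(probabilities)


def select_for_annotation(predictions, batch_size):
    """Return the ids of the batch_size documents with the highest uncertainty."""
    ranked = sorted(
        predictions,
        key=lambda doc_id: uncertainty(predictions[doc_id]),
        reverse=True,
    )
    return ranked[:batch_size]


# Hypothetical classifier output: one probability per label for each document.
predictions = {
    "doc-a": [0.95, 0.02, 0.90],  # confident on every label
    "doc-b": [0.55, 0.45, 0.60],  # unsure on every label
    "doc-c": [0.10, 0.80, 0.50],  # unsure on one label
}
chosen = select_for_annotation(predictions, batch_size=2)
print(chosen)
```

Averaging over labels is only one possible aggregation; taking the maximum per-label uncertainty instead would favour documents with a single highly ambiguous label.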

Story Centaur: Large Language Model Few Shot Learning as a Creative Writing Tool
Ben Swanson | Kory Mathewson | Ben Pietrzak | Sherol Chen | Monica Dinalescu

Few shot learning with large language models has the potential to give individuals without formal machine learning training access to a wide range of text-to-text models. We consider how this applies to creative writers and present Story Centaur, a user interface for prototyping few shot models and a set of recombinable web components that deploy them. Story Centaur’s goal is to expose creative writers to few shot learning with a simple but powerful interface that lets them compose their own co-creation tools that further their own unique artistic directions. We build out several examples of such tools, and in the process probe the boundaries and issues surrounding generation with large language models.

OCTIS: Comparing and Optimizing Topic models is Simple!
Silvia Terragni | Elisabetta Fersini | Bruno Giovanni Galuzzi | Pietro Tropeano | Antonio Candelieri

In this paper, we present OCTIS, a framework for training, analyzing, and comparing Topic Models, whose optimal hyper-parameters are estimated using a Bayesian Optimization approach. The proposed solution integrates several state-of-the-art topic models and evaluation metrics. These metrics can be used as objectives by the underlying optimization procedure to determine the best hyper-parameter configuration. OCTIS allows researchers and practitioners to have a fair comparison between topic models of interest, using several benchmark datasets and well-known evaluation metrics, to integrate novel algorithms, and to have an interactive visualization of the results for understanding the behavior of each model. The code is available at the following link:

Breaking Writer’s Block: Low-cost Fine-tuning of Natural Language Generation Models
Alexandre Duval | Thomas Lamson | Gaël de Léséleuc de Kérouara | Matthias Gallé

It is standard procedure these days to solve Information Extraction tasks by fine-tuning large pre-trained language models. This is not the case for generation tasks, which rely on a variety of techniques for controlled language generation. In this paper, we describe a system that fine-tunes a natural language generation model for the problem of solving writer’s block. The fine-tuning changes the conditioning to also include the right context in addition to the left context, as well as an optional list of entities, the size, the genre and a summary of the paragraph that the human author wishes to generate. Our proposed fine-tuning obtains excellent results, even with a small number of epochs and a total cost of USD 150. The system can be accessed as a web-service and all the code is released. A video showcasing the interface and the model is also available.

Domain Expert Platform for Goal-Oriented Dialog Collection
Didzis Goško | Arturs Znotins | Inguna Skadina | Normunds Gruzitis | Gunta Nešpore-Bērzkalne

Today, most dialogue systems are fully or partly built using neural network architectures. A crucial prerequisite for the creation of a goal-oriented neural network dialogue system is a dataset that represents typical dialogue scenarios and includes various semantic annotations, e.g. intents, slots and dialogue actions, that are necessary for training a particular neural network architecture. In this demonstration paper, we present an easy-to-use interface, together with its back-end, that is oriented to domain experts for the collection of goal-oriented dialogue samples. The platform not only allows users to collect or write sample dialogues in a structured way, but also provides a means for simple annotation and interpretation of the dialogues. The platform itself is language-independent; it depends only on the availability of particular language processing components for a specific language. It is currently being used to collect dialogue samples in Latvian (a highly inflected language) which represent typical communication between students and the student service.

Which is Better for Deep Learning: Python or MATLAB? Answering Comparative Questions in Natural Language
Viktoriia Chekalina | Alexander Bondarenko | Chris Biemann | Meriem Beloucif | Varvara Logacheva | Alexander Panchenko

We present a system for answering comparative questions (Is X better than Y with respect to Z?) in natural language. Answering such questions is important for assisting humans in making informed decisions. The key component of our system is a natural language interface for comparative QA that can be used in personal assistants, chatbots, and similar NLP devices. Comparative QA is a challenging NLP task, since it requires collecting supporting evidence from many different sources, and direct comparisons of rare objects may not be available even on the entire Web. We take the first step towards a solution for such a task, offering a testbed for comparative QA in natural language by probing several methods and making the three best ones available as an online demo.