Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Qun Liu, David Schlangen (Editors)


Anthology ID:
2020.emnlp-demos
Month:
October
Year:
2020
Address:
Online
Venue:
EMNLP
SIG:
Publisher:
Association for Computational Linguistics
URL:
https://aclanthology.org/2020.emnlp-demos
DOI:
Bib Export formats:
BibTeX MODS XML EndNote

pdf bib
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Qun Liu | David Schlangen

pdf bib
BERTweet : A pre-trained language model for English TweetsBERTweet: A pre-trained language model for English Tweets
Dat Quoc Nguyen | Thanh Vu | Anh Tuan Nguyen

We present BERTweet, the first public large-scale pre-trained language model for English Tweets. Our BERTweet, having the same architecture as BERT-base (Devlin et al., 2019), is trained using the RoBERTa pre-training procedure (Liu et al., 2019). Experiments show that BERTweet outperforms strong baselines RoBERTa-base and XLM-R-base (Conneau et al., 2020), producing better performance results than the previous state-of-the-art models on three Tweet NLP tasks : Part-of-speech tagging, Named-entity recognition and text classification. We release BERTweet under the MIT License to facilitate future research and applications on Tweet data. Our BERTweet is available at https://github.com/VinAIResearch/BERTweet

pdf bib
NeuralQA : A Usable Library for Question Answering (Contextual Query Expansion + BERT) on Large DatasetsNeuralQA: A Usable Library for Question Answering (Contextual Query Expansion + BERT) on Large Datasets
Victor Dibia

Existing tools for Question Answering (QA) have challenges that limit their use in practice. They can be complex to set up or integrate with existing infrastructure, do not offer configurable interactive interfaces, and do not cover the full set of subtasks that frequently comprise the QA pipeline (query expansion, retrieval, reading, and explanation / sensemaking). To help address these issues, we introduce NeuralQA-a usable library for QA on large datasets. NeuralQA integrates well with existing infrastructure (e.g., ElasticSearch instances and reader models trained with the HuggingFace Transformers API) and offers helpful defaults for QA subtasks. It introduces and implements contextual query expansion (CQE) using a masked language model (MLM) as well as relevant snippets (RelSnip)-a method for condensing large documents into smaller passages that can be speedily processed by a document reader model. Finally, it offers a flexible user interface to support workflows for research explorations (e.g., visualization of gradient-based explanations to support qualitative inspection of model behaviour) and large scale search deployment. Code and documentation for NeuralQA is available as open source on Github.RelSnip) - a method for condensing large documents into smaller passages that can be speedily processed by a document reader model. Finally, it offers a flexible user interface to support workflows for research explorations (e.g., visualization of gradient-based explanations to support qualitative inspection of model behaviour) and large scale search deployment. Code and documentation for NeuralQA is available as open source on Github.

pdf bib
DeezyMatch : A Flexible Deep Learning Approach to Fuzzy String MatchingDeezyMatch: A Flexible Deep Learning Approach to Fuzzy String Matching
Kasra Hosseini | Federico Nanni | Mariona Coll Ardanuy

We present DeezyMatch, a free, open-source software library written in Python for fuzzy string matching and candidate ranking. Its pair classifier supports various deep neural network architectures for training new classifiers and for fine-tuning a pretrained model, which paves the way for transfer learning in fuzzy string matching. This approach is especially useful where only limited training examples are available. The learned DeezyMatch models can be used to generate rich vector representations from string inputs. The candidate ranker component in DeezyMatch uses these vector representations to find, for a given query, the best matching candidates in a knowledge base. It uses an adaptive searching algorithm applicable to large knowledge bases and query sets. We describe DeezyMatch’s functionality, design and implementation, accompanied by a use case in toponym matching and candidate ranking in realistic noisy datasets.

pdf bib
InVeRo : Making Semantic Role Labeling Accessible with Intelligible Verbs and RolesInVeRo: Making Semantic Role Labeling Accessible with Intelligible Verbs and Roles
Simone Conia | Fabrizio Brignone | Davide Zanfardino | Roberto Navigli

Semantic Role Labeling (SRL) is deeply dependent on complex linguistic resources and sophisticated neural models, which makes the task difficult to approach for non-experts. To address this issue we present a new platform named Intelligible Verbs and Roles (InVeRo). This platform provides access to a new verb resource, VerbAtlas, and a state-of-the-art pretrained implementation of a neural, span-based architecture for SRL. Both the resource and the system provide human-readable verb sense and semantic role information, with an easy to use Web interface and RESTful APIs available at http://nlp.uniroma1.it/invero.

pdf bib
Youling : an AI-assisted Lyrics Creation SystemAI-assisted Lyrics Creation System
Rongsheng Zhang | Xiaoxi Mao | Le Li | Lin Jiang | Lin Chen | Zhiwei Hu | Yadong Xi | Changjie Fan | Minlie Huang

Recently, a variety of neural models have been proposed for lyrics generation. However, most previous work completes the generation process in a single pass with little human intervention. We believe that lyrics creation is a creative process with human intelligence centered. AI should play a role as an assistant in the lyrics creation process, where human interactions are crucial for high-quality creation. This paper demonstrates Youling, an AI-assisted lyrics creation system, designed to collaborate with music creators. In the lyrics generation process, Youling supports traditional one pass full-text generation mode as well as an interactive generation mode, which allows users to select the satisfactory sentences from generated candidates conditioned on preceding context. The system also provides a revision module which enables users to revise undesired sentences or words of lyrics repeatedly. Besides, Youling allows users to use multifaceted attributes to control the content and format of generated lyrics. The demo video of the system is available at https://youtu.be/DFeNpHk0pm4.Youling, an AI-assisted lyrics creation system, designed to collaborate with music creators. In the lyrics generation process, Youling supports traditional one pass full-text generation mode as well as an interactive generation mode, which allows users to select the satisfactory sentences from generated candidates conditioned on preceding context. The system also provides a revision module which enables users to revise undesired sentences or words of lyrics repeatedly. Besides, Youling allows users to use multifaceted attributes to control the content and format of generated lyrics. The demo video of the system is available at https://youtu.be/DFeNpHk0pm4.

pdf bib
A Technical Question Answering System with Transfer Learning
Wenhao Yu | Lingfei Wu | Yu Deng | Ruchi Mahindru | Qingkai Zeng | Sinem Guven | Meng Jiang

In recent years, the need for community technical question-answering sites has increased significantly. However, it is often expensive for human experts to provide timely and helpful responses on those forums. We develop TransTQA, which is a novel system that offers automatic responses by retrieving proper answers based on correctly answered similar questions in the past. TransTQA is built upon a siamese ALBERT network, which enables it to respond quickly and accurately. Furthermore, TransTQA adopts a standard deep transfer learning strategy to improve its capability of supporting multiple technical domains.

pdf bib
The Language Interpretability Tool : Extensible, Interactive Visualizations and Analysis for NLP ModelsNLP Models
Ian Tenney | James Wexler | Jasmijn Bastings | Tolga Bolukbasi | Andy Coenen | Sebastian Gehrmann | Ellen Jiang | Mahima Pushkarna | Carey Radebaugh | Emily Reif | Ann Yuan

We present the Language Interpretability Tool (LIT), an open-source platform for visualization and understanding of NLP models. We focus on core questions about model behavior : Why did my model make this prediction? When does it perform poorly? What happens under a controlled change in the input? LIT integrates local explanations, aggregate analysis, and counterfactual generation into a streamlined, browser-based interface to enable rapid exploration and error analysis. We include case studies for a diverse set of workflows, including exploring counterfactuals for sentiment analysis, measuring gender bias in coreference systems, and exploring local behavior in text generation. LIT supports a wide range of modelsincluding classification, seq2seq, and structured predictionand is highly extensible through a declarative, framework-agnostic API. LIT is under active development, with code and full documentation available at https://github.com/pair-code/lit.

pdf bib
A Data-Centric Framework for Composable NLP WorkflowsNLP Workflows
Zhengzhong Liu | Guanxiong Ding | Avinash Bukkittu | Mansi Gupta | Pengzhi Gao | Atif Ahmed | Shikun Zhang | Xin Gao | Swapnil Singhavi | Linwei Li | Wei Wei | Zecong Hu | Haoran Shi | Xiaodan Liang | Teruko Mitamura | Eric Xing | Zhiting Hu

Empirical natural language processing (NLP) systems in application domains (e.g., healthcare, finance, education) involve interoperation among multiple components, ranging from data ingestion, human annotation, to text retrieval, analysis, generation, and visualization. We establish a unified open-source framework to support fast development of such sophisticated NLP workflows in a composable manner. The framework introduces a uniform data representation to encode heterogeneous results by a wide range of NLP tasks. It offers a large repository of processors for NLP tasks, visualization, and annotation, which can be easily assembled with full interoperability under the unified representation. The highly extensible framework allows plugging in custom processors from external off-the-shelf NLP and deep learning libraries. The whole framework is delivered through two modularized yet integratable open-source projects, namely Forte (for workflow infrastructure and NLP function processors) and Stave (for user interaction, visualization, and annotation).

pdf bib
CoRefi : A Crowd Sourcing Suite for Coreference AnnotationCoRefi: A Crowd Sourcing Suite for Coreference Annotation
Ari Bornstein | Arie Cattan | Ido Dagan

Coreference annotation is an important, yet expensive and time consuming, task, which often involved expert annotators trained on complex decision guidelines. To enable cheaper and more efficient annotation, we present CoRefi, a web-based coreference annotation suite, oriented for crowdsourcing. Beyond the core coreference annotation tool, CoRefi provides guided onboarding for the task as well as a novel algorithm for a reviewing phase. CoRefi is open source and directly embeds into any website, including popular crowdsourcing platforms. CoRefi Demo : aka.ms/corefi Video Tour : aka.ms/corefivideo Github Repo : https://github.com/aribornstein/corefi