André Freitas


2021

What is SemEval evaluating? A Systematic Analysis of Evaluation Campaigns in NLP
Oskar Wysocki | Malina Florea | Dónal Landers | André Freitas
Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems

SemEval is the primary venue in the NLP community for the proposal of new challenges and for the systematic empirical evaluation of NLP systems. This paper provides a systematic quantitative analysis of SemEval, aiming to evidence the patterns of the contributions behind SemEval. By understanding the distribution of task types, metrics, architectures, participation and citations over time, we aim to answer the question of what is being evaluated by SemEval.

Switching Contexts: Transportability Measures for NLP
Guy Marshall | Mokanarangan Thayaparan | Philip Osborne | André Freitas
Proceedings of the 14th International Conference on Computational Semantics (IWCS)

This paper explores the topic of transportability, as a sub-area of generalisability. By proposing the utilisation of metrics based on well-established statistics, we are able to estimate the change in performance of NLP models in new contexts. Defining a new measure for transportability may allow for better estimation of NLP system performance in new domains, and is crucial when assessing the performance of NLP systems in new tasks and domains. Through several instances of increasing complexity, we demonstrate how lightweight domain similarity measures can be used as estimators for the transportability in NLP applications. The proposed transportability measures are evaluated in the context of Named Entity Recognition and Natural Language Inference tasks.

2020

Premise Selection in Natural Language Mathematical Texts
Deborah Ferreira | André Freitas
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

The discovery of supporting evidence for addressing complex mathematical problems is a semantically challenging task, which is still unexplored in the field of natural language processing for mathematical text. The natural language premise selection task consists in using conjectures written in both natural language and mathematical formulae to recommend premises that most likely will be useful to prove a particular statement. We propose an approach to solve this task as a link prediction problem, using Deep Convolutional Graph Neural Networks. This paper also analyses how different baselines perform in this task and shows that a graph structure can provide higher F1-score, especially when considering multi-hop premise selection.

2019

Identifying and Explaining Discriminative Attributes
Armins Stepanjans | André Freitas
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Identifying what is at the center of the meaning of a word and what discriminates it from other words is a fundamental natural language inference task. This paper describes an explicit word vector representation model (WVM) to support the identification of discriminative attributes. A core contribution of the paper is a quantitative and qualitative comparative analysis of different types of data sources and Knowledge Bases in the construction of explainable and explicit WVMs: (i) knowledge graphs built from dictionary definitions, (ii) entity-attribute-relationships graphs derived from images and (iii) commonsense knowledge graphs. Using a detailed quantitative and qualitative analysis, we demonstrate that these data sources have complementary semantic aspects, supporting the creation of explicit semantic vector spaces. The explicit vector spaces are evaluated using the task of discriminative attribute identification, showing comparable performance to the state-of-the-art systems in the task (F1-score = 0.69), while delivering full model transparency and explainability.

Identifying Supporting Facts for Multi-hop Question Answering with Document Graph Networks
Mokanarangan Thayaparan | Marco Valentino | Viktor Schlegel | André Freitas
Proceedings of the Thirteenth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-13)

Recent advances in reading comprehension have resulted in models that surpass human performance when the answer is contained in a single, continuous passage of text. However, complex Question Answering (QA) typically requires multi-hop reasoning, i.e. the integration of supporting facts from different sources, to infer the correct answer. This paper proposes Document Graph Network (DGN), a message passing architecture for the identification of supporting facts over a graph-structured representation of text. The evaluation on HotpotQA shows that DGN obtains competitive results when compared to a reading comprehension baseline operating on raw text, confirming the relevance of structured representations for supporting multi-hop reasoning.

DBee: A Database for Creating and Managing Knowledge Graphs and Embeddings
Viktor Schlegel | André Freitas
Proceedings of the Thirteenth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-13)

This paper describes DBee, a database to support the construction of data-intensive AI applications. DBee provides a unique data model which operates jointly over large-scale knowledge graphs (KGs) and embedding vector spaces (VSs). This model supports queries which exploit the semantic properties of both types of representations (KGs and VSs). Additionally, DBee aims to facilitate the construction of KGs and VSs, by providing a library of generators, which can be used to create, integrate and transform data into KGs and VSs.

MinWikiSplit: A Sentence Splitting Corpus with Minimal Propositions
Christina Niklaus | André Freitas | Siegfried Handschuh
Proceedings of the 12th International Conference on Natural Language Generation

We compiled a new sentence splitting corpus that is composed of 203K pairs of aligned complex source and simplified target sentences. Contrary to previously proposed text simplification corpora, which contain only a small number of split examples, we present a dataset where each input sentence is broken down into a set of minimal propositions, i.e. a sequence of sound, self-contained utterances with each of them presenting a minimal semantic unit that cannot be further decomposed into meaningful propositions. This corpus is useful for developing sentence splitting approaches that learn how to transform sentences with a complex linguistic structure into a fine-grained representation of short sentences that present a simple and more regular structure which is easier to process for downstream applications and thus facilitates and improves their performance.

DisSim: A Discourse-Aware Syntactic Text Simplification Framework for English and German
Christina Niklaus | Matthias Cetto | André Freitas | Siegfried Handschuh
Proceedings of the 12th International Conference on Natural Language Generation

We introduce DisSim, a discourse-aware sentence splitting framework for English and German whose goal is to transform syntactically complex sentences into an intermediate representation that presents a simple and more regular structure which is easier to process for downstream semantic applications. For this purpose, we turn input sentences into a two-layered semantic hierarchy in the form of core facts and accompanying contexts, while identifying the rhetorical relations that hold between them. In that way, we preserve the coherence structure of the input and, hence, its interpretability for downstream tasks.

2018

A Survey on Open Information Extraction
Christina Niklaus | Matthias Cetto | André Freitas | Siegfried Handschuh
Proceedings of the 27th International Conference on Computational Linguistics

We provide a detailed overview of the various approaches that were proposed to date to solve the task of Open Information Extraction. We present the major challenges that such systems face, show the evolution of the suggested approaches over time and depict the specific issues they address. In addition, we provide a critique of the commonly applied evaluation procedures for assessing the performance of Open IE systems and highlight some directions for future work.

2017

SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs and News
Keith Cortis | André Freitas | Tobias Daudert | Manuela Huerlimann | Manel Zarrouk | Siegfried Handschuh | Brian Davis
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper discusses the Fine-Grained Sentiment Analysis on Financial Microblogs and News task as part of SemEval-2017, specifically under the Detecting sentiment, humour, and truth theme. This task contains two tracks, where the first one concerns Microblog messages and the second one covers News Statements and Headlines. The main goal behind both tracks was to predict the sentiment score for each of the mentioned companies/stocks. The sentiment scores for each text instance adopted floating point values in the range of -1 (very negative/bearish) to 1 (very positive/bullish), with 0 designating neutral sentiment. This task attracted a total of 32 participants, with 25 participating in Track 1 and 29 in Track 2.

SemEval-2017 Task 11: End-User Development using Natural Language
Juliano Sales | Siegfried Handschuh | André Freitas
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This task proposes a challenge to support the interaction between users and applications, micro-services and software APIs using natural language. The task aims to support the evaluation and evolution of the discussions surrounding natural language processing approaches within the context of end-user natural language programming, under scenarios of high semantic heterogeneity/gap.