Stergios Chatzikyriakidis


2021

NLI Data Sanity Check: Assessing the Effect of Data Corruption on Model Performance
Aarne Talman | Marianna Apidianaki | Stergios Chatzikyriakidis | Jörg Tiedemann
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)

Pre-trained neural language models give high performance on natural language inference (NLI) tasks. But whether they actually understand the meaning of the processed sequences is still unclear. We propose a new diagnostics test suite that makes it possible to assess whether a dataset constitutes a good testbed for evaluating the models’ meaning understanding capabilities. We specifically apply controlled corruption transformations to widely used benchmarks (MNLI and ANLI), which involve removing entire word classes and often lead to nonsensical sentence pairs. If model accuracy on the corrupted data remains high, then the dataset is likely to contain statistical biases and artefacts that guide prediction. Conversely, a large decrease in model accuracy indicates that the original dataset provides a proper challenge to the models’ reasoning capabilities. Hence, our proposed controls can serve as a crash test for developing high quality data for NLI tasks.
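
As an illustration of the kind of corruption the abstract describes (removing an entire word class from both sentences of an NLI pair), here is a minimal Python sketch. It is not the authors' implementation: the function names, the hand-tagged example pair, and the use of UD-style tags such as "NOUN" are assumptions made for this example, and a real pipeline would obtain the tags from a part-of-speech tagger.

```python
from typing import List, Tuple

# A sentence is a list of (token, part-of-speech tag) pairs.
# The tags below are written by hand for illustration (assumption).
TaggedSentence = List[Tuple[str, str]]


def remove_word_class(sentence: TaggedSentence, word_class: str) -> str:
    """Drop every token tagged with word_class and rejoin the remaining tokens."""
    return " ".join(token for token, tag in sentence if tag != word_class)


def corrupt_pair(
    premise: TaggedSentence, hypothesis: TaggedSentence, word_class: str
) -> Tuple[str, str]:
    """Apply the same word-class removal to both sides of an NLI pair."""
    return (
        remove_word_class(premise, word_class),
        remove_word_class(hypothesis, word_class),
    )


# Hypothetical example pair, not taken from MNLI or ANLI.
premise = [
    ("A", "DET"), ("man", "NOUN"), ("is", "AUX"),
    ("playing", "VERB"), ("a", "DET"), ("guitar", "NOUN"),
]
hypothesis = [
    ("A", "DET"), ("person", "NOUN"), ("makes", "VERB"), ("music", "NOUN"),
]

print(corrupt_pair(premise, hypothesis, "NOUN"))
```

Comparing a model's accuracy on the original pairs with its accuracy on pairs corrupted this way gives the kind of check the abstract refers to.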

Proceedings of the Reasoning and Interaction Conference (ReInAct 2021)
Christine Howes | Simon Dobnik | Ellen Breitholtz | Stergios Chatzikyriakidis
Proceedings of the Reasoning and Interaction Conference (ReInAct 2021)

Applied Temporal Analysis: A Complete Run of the FraCaS Test Suite
Jean-Philippe Bernardy | Stergios Chatzikyriakidis
Proceedings of the 14th International Conference on Computational Semantics (IWCS)

In this paper, we propose an implementation of temporal semantics that translates syntax trees to logical formulas, suitable for consumption by the Coq proof assistant. The analysis supports a wide range of phenomena, including temporal references, temporal adverbs, aspectual classes and progressives. The new semantics are built on top of a previous system handling all sections of the FraCaS test suite except the temporal reference section, and we obtain an accuracy of 81 percent overall and 73 percent for the problems explicitly marked as related to temporal reference. To the best of our knowledge, this is the best performance of a logical system on the whole of the FraCaS.

Can predicate-argument relationships be extracted from UD trees?
Adam Ek | Jean-Philippe Bernardy | Stergios Chatzikyriakidis
Proceedings of The Joint 15th Linguistic Annotation Workshop (LAW) and 3rd Designing Meaning Representations (DMR) Workshop

In this paper we investigate the possibility of extracting predicate-argument relations from UD trees (and enhanced UD graphs). Concretely, we apply UD parsers on an English question answering / semantic-role labeling data set (FitzGerald et al., 2018) and check if the annotations reflect the relations in the resulting parse trees, using a small number of rules to extract this information. We find that 79.1% of the argument-predicate pairs can be found in this way, on the basis of Udify (Kondratyuk and Straka, 2019). Error analysis reveals that half of the error cases are attributable to shortcomings in the dataset. The remaining errors are mostly due to predicate-argument relations not being extractible algorithmically from the UD trees (requiring semantic reasoning to be resolved). The parser itself is only responsible for a small portion of errors. Our analysis suggests a number of improvements to the UD annotation schema: we propose to enhance the schema in four ways, in order to capture argument-predicate relations. Additionally, we propose improvements regarding data collection for question answering / semantic-role labeling data.

2020

Proceedings of the Probability and Meaning Conference (PaM 2020)
Christine Howes | Stergios Chatzikyriakidis | Adam Ek | Vidya Somashekarappa
Proceedings of the Probability and Meaning Conference (PaM 2020)

2019

Bayesian Inference Semantics: A Modelling System and A Test Suite
Jean-Philippe Bernardy | Rasmus Blanck | Stergios Chatzikyriakidis | Shalom Lappin | Aleksandre Maskharashvili
Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)

We present BIS, a Bayesian Inference Semantics, for probabilistic reasoning in natural language. The current system is based on the framework of Bernardy et al. (2018), but departs from it in important respects. BIS makes use of Bayesian learning for inferring a hypothesis from premises. This involves estimating the probability of the hypothesis, given the data supplied by the premises of an argument. It uses a syntactic parser to generate typed syntactic structures that serve as input to a model generation system. Sentences are interpreted compositionally to probabilistic programs, and the corresponding truth values are estimated using sampling methods. BIS successfully deals with various probabilistic semantic phenomena, including frequency adverbs, generalised quantifiers, generics, and vague predicates. It performs well on a number of interesting probabilistic reasoning tasks. It also sustains most classically valid inferences (instantiation, de Morgan’s laws, etc.). To test BIS we have built an experimental test suite with examples of a range of probabilistic and classical inference patterns.

Proceedings of the 13th International Conference on Computational Semantics - Long Papers
Simon Dobnik | Stergios Chatzikyriakidis | Vera Demberg
Proceedings of the 13th International Conference on Computational Semantics - Long Papers

Proceedings of the 13th International Conference on Computational Semantics - Short Papers
Simon Dobnik | Stergios Chatzikyriakidis | Vera Demberg
Proceedings of the 13th International Conference on Computational Semantics - Short Papers

Proceedings of the 13th International Conference on Computational Semantics - Student Papers
Simon Dobnik | Stergios Chatzikyriakidis | Vera Demberg | Kathrein Abu Kwaik | Vladislav Maraev
Proceedings of the 13th International Conference on Computational Semantics - Student Papers

Testing the Generalization Power of Neural Network Models across NLI Benchmarks
Aarne Talman | Stergios Chatzikyriakidis
Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

Neural network models have been very successful in natural language inference, with the best models reaching 90% accuracy in some benchmarks. However, the success of these models turns out to be largely benchmark specific. We show that models trained on a natural language inference dataset drawn from one benchmark fail to perform well in others, even if the notion of inference assumed in these benchmarks is the same or similar. We train six high performing neural network models on different datasets and show that each one of these has problems generalizing when we replace the original test set with a test set taken from another corpus designed for the same task. In light of these results, we argue that most of the current neural network models are not able to generalize well in the task of natural language inference. We find that using large pre-trained language models helps with transfer learning when the datasets are similar enough. Our results also highlight that the current NLI datasets do not cover the different nuances of inference extensively enough.

A Wide-Coverage Symbolic Natural Language Inference System
Stergios Chatzikyriakidis | Jean-Philippe Bernardy
Proceedings of the 22nd Nordic Conference on Computational Linguistics

We present a system for Natural Language Inference which uses a dynamic semantics converter from abstract syntax trees to Coq types. It combines the fine-grainedness of a dynamic semantics system with the powerfulness of a state-of-the-art proof assistant, like Coq. We evaluate the system on all sections of the FraCaS test suite, excluding section 6. This is the first system that does a complete run on the anaphora and ellipsis sections of the FraCaS. It has a better overall accuracy than any previous system.

Predicates as Boxes in Bayesian Semantics for Natural Language
Jean-Philippe Bernardy | Rasmus Blanck | Stergios Chatzikyriakidis | Shalom Lappin | Aleksandre Maskharashvili
Proceedings of the 22nd Nordic Conference on Computational Linguistics

In this paper, we present a Bayesian approach to natural language semantics. Our main focus is on the inference task in an environment where judgments require probabilistic reasoning. We treat nouns, verbs, adjectives, etc. as unary predicates, and we model them as boxes in a bounded domain. We apply Bayesian learning to satisfy constraints expressed as premises. In this way we construct a model, by specifying boxes for the predicates. The probability of the hypothesis (the conclusion) is evaluated against the model that incorporates the premises as constraints.
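
To make the "predicates as boxes" idea concrete, the following Python sketch covers only the final evaluation step: given boxes that are already fixed, it estimates the probability of a hypothesis predicate given a premise predicate by sampling individuals uniformly from a bounded domain. It is a simplified illustration under stated assumptions, not the system described in the paper: the unit-square domain, the uniform sampling, the box coordinates, and all names are hypothetical, and the Bayesian learning step that fits boxes to the premises is not shown.

```python
import random
from typing import List, Tuple

# A box is one (low, high) interval per dimension of the bounded domain.
Box = List[Tuple[float, float]]


def contains(box: Box, point: List[float]) -> bool:
    """True if the point lies inside the box in every dimension."""
    return all(low <= x <= high for (low, high), x in zip(box, point))


def conditional_probability(
    hypothesis_box: Box,
    premise_box: Box,
    dims: int = 2,
    samples: int = 20000,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of P(hypothesis | premise) over the unit hypercube."""
    rng = random.Random(seed)
    in_premise = 0
    in_both = 0
    for _ in range(samples):
        # Draw one individual uniformly from the domain [0, 1]^dims (assumption).
        point = [rng.random() for _ in range(dims)]
        if contains(premise_box, point):
            in_premise += 1
            if contains(hypothesis_box, point):
                in_both += 1
    return in_both / in_premise if in_premise else float("nan")


# Hypothetical boxes: "flies" overlaps 80% of the area of "bird".
bird = [(0.0, 0.5), (0.0, 0.5)]
flies = [(0.0, 0.4), (0.0, 0.5)]

estimate = conditional_probability(flies, bird)
print(round(estimate, 2))
```

With these boxes the exact ratio of overlapping area to premise area is 0.8, so the sampled estimate should land close to that value.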

2017

Deep Learning: Detecting Metaphoricity in Adjective-Noun Pairs
Yuri Bizzoni | Stergios Chatzikyriakidis | Mehdi Ghanimifard
Proceedings of the Workshop on Stylistic Variation

Metaphor is one of the most studied and widespread figures of speech and an essential element of individual style. In this paper we look at metaphor identification in Adjective-Noun pairs. We show that using a single neural network combined with pre-trained vector embeddings can outperform the state of the art in terms of accuracy. Specifically, the approach presented in this paper is based on two ideas: a) transfer learning via pre-trained vectors representing adjective-noun pairs, and b) a neural network as a model of composition that predicts a metaphoricity score as output. We present several different architectures for our system and evaluate their performance. Variations on dataset size and on the kinds of embeddings are also investigated. We show considerable improvement over the previous approaches both in terms of accuracy and with respect to the size of annotated training data.