Elizaveta Kuzmenko


2019

Distributional Semantics in the Real World: Building Word Vector Representations from a Truth-Theoretic Model
Elizaveta Kuzmenko | Aurélie Herbelot
Proceedings of the 13th International Conference on Computational Semantics - Short Papers

Distributional semantics models (DSMs) are known to produce excellent representations of word meaning, which correlate with a range of behavioural data. As lexical representations, they have been said to be fundamentally different from truth-theoretic models of semantics, where meaning is defined as a correspondence relation to the world. There are two main aspects to this difference: a) DSMs are built over corpus data which may or may not reflect ‘what is in the world’; b) they are built from word co-occurrences, that is, from lexical types rather than entities and sets. In this paper, we inspect the properties of a distributional model built over a set-theoretic approximation of ‘the real world’. To achieve this, we take the annotation of a large database of images marked with objects, attributes and relations, convert the data into a representation akin to first-order logic and build several distributional models using various combinations of features. We evaluate those models over both relatedness and similarity datasets, demonstrating their effectiveness in standard evaluations. This allows us to conclude that, despite prior claims, truth-theoretic models are good candidates for building graded lexical representations of meaning.

CONAN - COunter NArratives through Nichesourcing: a Multilingual Dataset of Responses to Fight Online Hate Speech
Yi-Ling Chung | Elizaveta Kuzmenko | Serra Sinem Tekiroglu | Marco Guerini
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Although there is an unprecedented effort to provide adequate responses in terms of laws and policies to hate content on social media platforms, dealing with hatred online is still a tough problem. Tackling hate speech in the standard way of content deletion or user suspension may be charged with censorship and overblocking. One alternative strategy, which has received little attention so far from the research community, is to actually oppose hate content with counter-narratives (i.e. informed textual responses). In this paper, we describe the creation of the first large-scale, multilingual, expert-based dataset of hate speech/counter-narrative pairs. This dataset has been built with the effort of more than 100 operators from three different NGOs who applied their training and expertise to the task. Together with the collected data we also provide additional annotations about expert demographics, hate and response type, and data augmentation through translation and paraphrasing. Finally, we provide initial experiments to assess the quality of our data.

2017

Clustering of Russian Adjective-Noun Constructions using Word Embeddings
Andrey Kutuzov | Elizaveta Kuzmenko | Lidia Pivovarova
Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing

This paper presents a method of automatic construction extraction from a large corpus of Russian. The term ‘construction’ here means a multi-word expression in which a variable can be replaced with another word from the same semantic class, for example, ‘a glass of [water / juice / milk]’. We deal with constructions that consist of a noun and its adjective modifier. We propose a method of grouping such constructions into semantic classes via 2-step clustering of word vectors in distributional models. We compare it with other clustering techniques and evaluate it against A Russian-English Collocational Dictionary of the Human Body, which contains manually annotated groups of constructions with nouns meaning human body parts. The best performing method is used to cluster all adjective-noun bigrams in the Russian National Corpus. Results of this procedure are publicly available and can be used for building a Russian construction dictionary as well as to accelerate theoretical studies of constructions.

Building Web-Interfaces for Vector Semantic Models with the WebVectors Toolkit
Andrey Kutuzov | Elizaveta Kuzmenko
Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics

In this demo we present WebVectors, a free and open-source toolkit helping to deploy web services which demonstrate and visualize distributional semantic models (widely known as word embeddings). WebVectors can be useful in the very common situation where one has trained a distributional semantics model for one’s particular corpus or language (tools for this are now widespread and simple to use), but then needs to demonstrate the results to the general public over the Web. We show its abilities using live web services featuring distributional models for English, Norwegian and Russian.