Alexander Panchenko


pdf bib
ParaDetox Detoxification with Parallel DataParaDetox: Detoxification with Parallel Data
Varvara Logacheva | Daryna Dementieva | Sergey Ustyantsev | Daniil Moskovskiy | David Dale | Irina Krotova | Nikita Semenov | Alexander Panchenko
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We present a novel pipeline for the collection of parallel data for the detoxification task We collect non toxic paraphrases for over 10,000 English toxic sentences We also show that this pipeline can be used to distill a large existing corpus of paraphrases to get toxic neutral sentence pairs We release two parallel corpora which can be used for the training of detoxification models To the best of our knowledge these are the first parallel datasets for this task We describe our pipeline in detail to make it fast to set up for a new language or domain thus contributing to faster and easier development of new parallel resources We train several detoxification models on the collected data and compare them with several baselines and state of the art unsupervised approaches We conduct both automatic and manual evaluations All models trained on parallel data outperform the state of the art unsupervised models by a large margin This suggests that our novel datasets can boost the performance of detoxification systems


pdf bib
Active Learning for Sequence Tagging with Deep Pre-trained Models and Bayesian Uncertainty EstimatesBayesian Uncertainty Estimates
Artem Shelmanov | Dmitri Puzyrev | Lyubov Kupriyanova | Denis Belyakov | Daniil Larionov | Nikita Khromov | Olga Kozlova | Ekaterina Artemova | Dmitry V. Dylov | Alexander Panchenko
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Annotating training data for sequence tagging of texts is usually very time-consuming. Recent advances in transfer learning for natural language processing in conjunction with active learning open the possibility to significantly reduce the necessary annotation budget. We are the first to thoroughly investigate this powerful combination for the sequence tagging task. We conduct an extensive empirical study of various Bayesian uncertainty estimation methods and Monte Carlo dropout options for deep pre-trained models in the active learning framework and find the best combinations for different types of models. Besides, we also demonstrate that to acquire instances during active learning, a full-size Transformer can be substituted with a distilled version, which yields better computational performance and reduces obstacles for applying deep active learning in practice.

pdf bib
Which is Better for Deep Learning : Python or MATLAB? Answering Comparative Questions in Natural LanguageMATLAB? Answering Comparative Questions in Natural Language
Viktoriia Chekalina | Alexander Bondarenko | Chris Biemann | Meriem Beloucif | Varvara Logacheva | Alexander Panchenko
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations

We present a system for answering comparative questions (Is X better than Y with respect to Z?) in natural language. Answering such questions is important for assisting humans in making informed decisions. The key component of our system is a natural language interface for comparative QA that can be used in personal assistants, chatbots, and similar NLP devices. Comparative QA is a challenging NLP task, since it requires collecting support evidence from many different sources, and direct comparisons of rare objects may be not available even on the entire Web. We take the first step towards a solution for such a task offering a testbed for comparative QA in natural language by probing several methods, making the three best ones available as an online demo.

pdf bib
Text Detoxification using Large Pre-trained Neural Models
David Dale | Anton Voronov | Daryna Dementieva | Varvara Logacheva | Olga Kozlova | Nikita Semenov | Alexander Panchenko
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

We present two novel unsupervised methods for eliminating toxicity in text. Our first method combines two recent ideas : (1) guidance of the generation process with small style-conditional language models and (2) use of paraphrasing models to perform style transfer. We use a well-performing paraphraser guided by style-trained language models to keep the text content and remove toxicity. Our second method uses BERT to replace toxic words with their non-offensive synonyms. We make the method more flexible by enabling BERT to replace mask tokens with a variable number of words. Finally, we present the first large-scale comparative study of style transfer models on the task of toxicity removal. We compare our models with a number of methods for style transfer. The models are evaluated in a reference-free way using a combination of unsupervised style transfer metrics. Both methods we suggest yield new SOTA results.

pdf bib
Detecting Inappropriate Messages on Sensitive Topics that Could Harm a Company’s Reputation
Nikolay Babakov | Varvara Logacheva | Olga Kozlova | Nikita Semenov | Alexander Panchenko
Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing

Not all topics are equally flammable in terms of toxicity : a calm discussion of turtles or fishing less often fuels inappropriate toxic dialogues than a discussion of politics or sexual minorities. We define a set of sensitive topics that can yield inappropriate and toxic messages and describe the methodology of collecting and labelling a dataset for appropriateness. While toxicity in user-generated data is well-studied, we aim at defining a more fine-grained notion of inappropriateness. The core of inappropriateness is that it can harm the reputation of a speaker. This is different from toxicity in two respects : (i) inappropriateness is topic-related, and (ii) inappropriate message is not toxic but still unacceptable. We collect and release two datasets for Russian : a topic-labelled dataset and an appropriateness-labelled dataset. We also release pre-trained classification models trained on this data.

pdf bib
SkoltechNLP at SemEval-2021 Task 2 : Generating Cross-Lingual Training Data for the Word-in-Context TaskSkoltechNLP at SemEval-2021 Task 2: Generating Cross-Lingual Training Data for the Word-in-Context Task
Anton Razzhigaev | Nikolay Arefyev | Alexander Panchenko
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

In this paper, we present a system for the solution of the cross-lingual and multilingual word-in-context disambiguation task. Task organizers provided monolingual data in several languages, but no cross-lingual training data were available. To address the lack of the officially provided cross-lingual training data, we decided to generate such data ourselves. We describe a simple yet effective approach based on machine translation and back translation of the lexical units to the original language used in the context of this shared task. In our experiments, we used a neural system based on the XLM-R, a pre-trained transformer-based masked language model, as a baseline. We show the effectiveness of the proposed approach as it allows to substantially improve the performance of this strong neural baseline model. In addition, in this study, we present multiple types of the XLM-R based classifier, experimenting with various ways of mixing information from the first and second occurrences of the target word in two samples.


pdf bib
Proceedings of the Graph-based Methods for Natural Language Processing (TextGraphs)
Dmitry Ustalov | Swapna Somasundaran | Alexander Panchenko | Fragkiskos D. Malliaros | Ioana Hulpuș | Peter Jansen | Abhik Jana
Proceedings of the Graph-based Methods for Natural Language Processing (TextGraphs)

pdf bib
Generating Lexical Representations of Frames using Lexical Substitution
Saba Anwar | Artem Shelmanov | Alexander Panchenko | Chris Biemann
Proceedings of the Probability and Meaning Conference (PaM 2020)

Semantic frames are formal linguistic structures describing situations / actions / events, e.g. Commercial transfer of goods. Each frame provides a set of roles corresponding to the situation participants, e.g. Buyer and Goods, and lexical units (LUs) words and phrases that can evoke this particular frame in texts, e.g. Sell. The scarcity of annotated resources hinders wider adoption of frame semantics across languages and domains. We investigate a simple yet effective method, lexical substitution with word representation models, to automatically expand a small set of frame-annotated sentences with new words for their respective roles and LUs. We evaluate the expansion quality using FrameNet. Contextualized models demonstrate overall superior performance compared to the non-contextualized ones on roles. However, the latter show comparable performance on the task of LU expansion.

pdf bib
SkoltechNLP at SemEval-2020 Task 11 : Exploring Unsupervised Text Augmentation for Propaganda DetectionSkoltechNLP at SemEval-2020 Task 11: Exploring Unsupervised Text Augmentation for Propaganda Detection
Daryna Dementieva | Igor Markov | Alexander Panchenko
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper presents a solution for the Span Identification (SI) task in the Detection of Propaganda Techniques in News Articles competition at SemEval-2020. The goal of the SI task is to identify specific fragments of each article which contain the use of at least one propaganda technique. This is a binary sequence tagging task. We tested several approaches finally selecting a fine-tuned BERT model as our baseline model. Our main contribution is an investigation of several unsupervised data augmentation techniques based on distributional semantics expanding the original small training dataset as applied to this BERT-based sequence tagger. We explore various expansion strategies and show that they can substantially shift the balance between precision and recall, while maintaining comparable levels of the F1 score.

pdf bib
Proceedings of Knowledgeable NLP: the First Workshop on Integrating Structured Knowledge and Neural Networks for NLP
Oren Sar Shalom | Alexander Panchenko | Cicero dos Santos | Varvara Logacheva | Alessandro Moschitti | Ido Dagan
Proceedings of Knowledgeable NLP: the First Workshop on Integrating Structured Knowledge and Neural Networks for NLP

pdf bib
Word Sense Disambiguation for 158 Languages using Word Embeddings Only
Varvara Logacheva | Denis Teslenko | Artem Shelmanov | Steffen Remus | Dmitry Ustalov | Andrey Kutuzov | Ekaterina Artemova | Chris Biemann | Simone Paolo Ponzetto | Alexander Panchenko
Proceedings of the 12th Language Resources and Evaluation Conference

Disambiguation of word senses in context is easy for humans, but is a major challenge for automatic approaches. Sophisticated supervised and knowledge-based models were developed to solve this task. However, (i) the inherent Zipfian distribution of supervised training instances for a given word and/or (ii) the quality of linguistic knowledge representations motivate the development of completely unsupervised and knowledge-free approaches to word sense disambiguation (WSD). They are particularly useful for under-resourced languages which do not have any resources for building either supervised and/or knowledge-based models. In this paper, we present a method that takes as input a standard pre-trained word embedding model and induces a fully-fledged word sense inventory, which can be used for disambiguation in context. We use this method to induce a collection of sense inventories for 158 languages on the basis of the original pre-trained fastText word embeddings by Grave et al., (2018), enabling WSD in these languages. Models and system are available online.


pdf bib
A Dataset for Noun Compositionality Detection for a Slavic LanguageSlavic Language
Dmitry Puzyrev | Artem Shelmanov | Alexander Panchenko | Ekaterina Artemova
Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing

This paper presents the first gold-standard resource for Russian annotated with compositionality information of noun compounds. The compound phrases are collected from the Universal Dependency treebanks according to part of speech patterns, such as ADJ+NOUN or NOUN+NOUN, using the gold-standard annotations. Each compound phrase is annotated by two experts and a moderator according to the following schema : the phrase can be either compositional, non-compositional, or ambiguous (i.e., depending on the context it can be interpreted both as compositional or non-compositional). We conduct an experimental evaluation of models and methods for predicting compositionality of noun compounds in unsupervised and supervised setups. We show that methods from previous work evaluated on the proposed Russian-language resource achieve the performance comparable with results on English corpora.

pdf bib
Categorizing Comparative Sentences
Alexander Panchenko | Alexander Bondarenko | Mirco Franzek | Matthias Hagen | Chris Biemann
Proceedings of the 6th Workshop on Argument Mining

We tackle the tasks of automatically identifying comparative sentences and categorizing the intended preference (e.g., Python has better NLP libraries than MATLAB Python, better, MATLAB). To this end, we manually annotate 7,199 sentences for 217 distinct target item pairs from several domains (27 % of the sentences contain an oriented comparison in the sense of better or worse). A gradient boosting model based on pre-trained sentence embeddings reaches an F1 score of 85 % in our experimental evaluation. The model can be used to extract comparative sentences for pro / con argumentation in comparative / argument search engines or debating technologies.

pdf bib
On the Compositionality Prediction of Noun Phrases using Poincar Embeddings
Abhik Jana | Dima Puzyrev | Alexander Panchenko | Pawan Goyal | Chris Biemann | Animesh Mukherjee
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

The compositionality degree of multiword expressions indicates to what extent the meaning of a phrase can be derived from the meaning of its constituents and their grammatical relations. Prediction of (non)-compositionality is a task that has been frequently addressed with distributional semantic models. We introduce a novel technique to blend hierarchical information with distributional information for predicting compositionality. In particular, we use hypernymy information of the multiword and its constituents encoded in the form of the recently introduced Poincar embeddings in addition to the distributional information to detect compositionality for noun phrases. Using a weighted average of the distributional similarity and a Poincar similarity function, we obtain consistent and substantial, statistically significant improvement across three gold standard datasets over state-of-the-art models based on distributional information only. Unlike traditional approaches that solely use an unsupervised setting, we have also framed the problem as a supervised task, obtaining comparable improvements. Further, we publicly release our Poincar embeddings, which are trained on the output of handcrafted lexical-syntactic patterns on a large corpus.

pdf bib
Making Fast Graph-based Algorithms with Graph Metric Embeddings
Andrey Kutuzov | Mohammad Dorgham | Oleksiy Oliynyk | Chris Biemann | Alexander Panchenko
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Graph measures, such as node distances, are inefficient to compute. We explore dense vector representations as an effective way to approximate the same information. We introduce a simple yet efficient and effective approach for learning graph embeddings. Instead of directly operating on the graph structure, our method takes structural measures of pairwise node similarities into account and learns dense node representations reflecting user-defined graph distance measures, such as e.g. the shortest path distance or distance measures that take information beyond the graph structure into account. We demonstrate a speed-up of several orders of magnitude when predicting word similarity by vector operations on our embeddings as opposed to directly computing the respective path-based measures, while outperforming various other graph embeddings on semantic similarity and word sense disambiguation tasks.

pdf bib
Improving Neural Entity Disambiguation with Graph Embeddings
Özge Sevgili | Alexander Panchenko | Chris Biemann
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

Entity Disambiguation (ED) is the task of linking an ambiguous entity mention to a corresponding entry in a knowledge base. Current methods have mostly focused on unstructured text data to learn representations of entities, however, there is structured information in the knowledge base itself that should be useful to disambiguate entities. In this work, we propose a method that uses graph embeddings for integrating structured information from the knowledge base with unstructured information from text-based representations. Our experiments confirm that graph embeddings trained on a graph of hyperlinks between Wikipedia articles improve the performances of simple feed-forward neural ED model and a state-of-the-art neural ED system.

pdf bib
TARGER : Neural Argument Mining at Your FingertipsTARGER: Neural Argument Mining at Your Fingertips
Artem Chernodub | Oleksiy Oliynyk | Philipp Heidenreich | Alexander Bondarenko | Matthias Hagen | Chris Biemann | Alexander Panchenko
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

We present TARGER, an open source neural argument mining framework for tagging arguments in free input texts and for keyword-based retrieval of arguments from an argument-tagged web-scale corpus. The currently available models are pre-trained on three recent argument mining datasets and enable the use of neural argument mining without any reproducibility effort on the user’s side. The open source code ensures portability to other domains and use cases.


pdf bib
Unsupervised Semantic Frame Induction using Triclustering
Dmitry Ustalov | Alexander Panchenko | Andrey Kutuzov | Chris Biemann | Simone Paolo Ponzetto
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We use dependency triples automatically extracted from a Web-scale corpus to perform unsupervised semantic frame induction. We cast the frame induction problem as a triclustering problem that is a generalization of clustering for triadic data. Our replicable benchmarks demonstrate that the proposed graph-based approach, Triframes, shows state-of-the art results on this task on a FrameNet-derived dataset and performing on par with competitive methods on a verb class clustering task.


pdf bib
Watset : Automatic Induction of Synsets from a Graph of SynonymsWatset: Automatic Induction of Synsets from a Graph of Synonyms
Dmitry Ustalov | Alexander Panchenko | Chris Biemann
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

This paper presents a new graph-based approach that induces synsets using synonymy dictionaries and word embeddings. First, we build a weighted graph of synonyms extracted from commonly available resources, such as Wiktionary. Second, we apply word sense induction to deal with ambiguous words. Finally, we cluster the disambiguated version of the ambiguous input graph into synsets. Our meta-clustering approach lets us use an efficient hard clustering algorithm to perform a fuzzy clustering of the graph. Despite its simplicity, our approach shows excellent results, outperforming five competitive state-of-the-art methods in terms of F-score on three gold standard datasets for English and Russian derived from large-scale manually constructed lexical resources.

pdf bib
Unsupervised, Knowledge-Free, and Interpretable Word Sense Disambiguation
Alexander Panchenko | Fide Marten | Eugen Ruppert | Stefano Faralli | Dmitry Ustalov | Simone Paolo Ponzetto | Chris Biemann
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Interpretability of a predictive model is a powerful feature that gains the trust of users in the correctness of the predictions. In word sense disambiguation (WSD), knowledge-based systems tend to be much more interpretable than knowledge-free counterparts as they rely on the wealth of manually-encoded elements representing word senses, such as hypernyms, usage examples, and images. We present a WSD system that bridges the gap between these two so far disconnected groups of methods. Namely, our system, providing access to several state-of-the-art WSD models, aims to be interpretable as a knowledge-based system while it remains completely unsupervised and knowledge-free. The presented tool features a Web interface for all-word disambiguation of texts that makes the sense predictions human readable by providing interpretable word sense inventories, sense representations, and disambiguation results. We provide a public API, enabling seamless integration.

pdf bib
Unsupervised Does Not Mean Uninterpretable : The Case for Word Sense Induction and Disambiguation
Alexander Panchenko | Eugen Ruppert | Stefano Faralli | Simone Paolo Ponzetto | Chris Biemann
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

The current trend in NLP is the use of highly opaque models, e.g. neural networks and word embeddings. While these models yield state-of-the-art results on a range of tasks, their drawback is poor interpretability. On the example of word sense induction and disambiguation (WSID), we show that it is possible to develop an interpretable model that matches the state-of-the-art models in accuracy. Namely, we present an unsupervised, knowledge-free WSID approach, which is interpretable at three levels : word sense inventory, sense feature representations, and disambiguation procedure. Experiments show that our model performs on par with state-of-the-art word sense embeddings and other unsupervised systems while offering the possibility to justify its decisions in human-readable form.

pdf bib
The ContrastMedium Algorithm : Taxonomy Induction From Noisy Knowledge Graphs With Just A Few LinksContrastMedium Algorithm: Taxonomy Induction From Noisy Knowledge Graphs With Just A Few Links
Stefano Faralli | Alexander Panchenko | Chris Biemann | Simone Paolo Ponzetto
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

In this paper, we present ContrastMedium, an algorithm that transforms noisy semantic networks into full-fledged, clean taxonomies. ContrastMedium is able to identify the embedded taxonomy structure from a noisy knowledge graph without explicit human supervision such as, for instance, a set of manually selected input root and leaf concepts. This is achieved by leveraging structural information from a companion reference taxonomy, to which the input knowledge graph is linked (either automatically or manually). When used in conjunction with methods for hypernym acquisition and knowledge base linking, our methodology provides a complete solution for end-to-end taxonomy induction. We conduct experiments using automatically acquired knowledge graphs, as well as a SemEval benchmark, and show that our method is able to achieve high performance on the task of taxonomy induction.

pdf bib
Negative Sampling Improves Hypernymy Extraction Based on Projection Learning
Dmitry Ustalov | Nikolay Arefyev | Chris Biemann | Alexander Panchenko
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

We present a new approach to extraction of hypernyms based on projection learning and word embeddings. In contrast to classification-based approaches, projection-based methods require no candidate hyponym-hypernym pairs. While it is natural to use both positive and negative training examples in supervised relation extraction, the impact of positive examples on hypernym prediction was not studied so far. In this paper, we show that explicit negative examples used for regularization of the model significantly improve performance compared to the state-of-the-art approach of Fu et al. (2014) on three datasets from different languages.