International Conference Recent Advances in Natural Language Processing (2019)


Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

Ruslan Mitkov | Galia Angelova

Identification of Good and Bad News on Twitter
Piush Aggarwal | Ahmet Aker

Social media plays a great role in news dissemination, which includes both good and bad news. However, studies show that news, in general, has a significant impact on our mental stature and that this influence is stronger for bad news. Ideally, we would have a tool that helps filter out the type of news we do not want to consume. In this paper, we provide the basis for such a tool. In our work, we focus on Twitter. We release a manually annotated dataset containing 6,853 tweets from 5 different topical categories. Each tweet is annotated with good and bad labels. We also investigate various machine learning systems and features and evaluate their performance on the newly generated dataset. Finally, we perform a comparative analysis with sentiments, showing that sentiment alone is not enough to distinguish between good and bad news.

Bilingual Low-Resource Neural Machine Translation with Round-Tripping: The Case of Persian-Spanish
Benyamin Ahmadnia | Bonnie Dorr

The quality of Neural Machine Translation (NMT), as a data-driven approach, massively depends on quantity, quality, and relevance of the training dataset. Such approaches have achieved promising results for bilingually high-resource scenarios but are inadequate for low-resource conditions. This paper describes a round-trip training approach to bilingual low-resource NMT that takes advantage of monolingual datasets to address training data scarcity, thus augmenting translation quality. We conduct detailed experiments on Persian-Spanish as a bilingually low-resource scenario. Experimental results demonstrate that this competitive approach outperforms the baselines.

Diachronic Analysis of Entities by Exploiting Wikipedia Page revisions
Pierpaolo Basile | Annalina Caputo | Seamus Lawless | Giovanni Semeraro

In the last few years, the increasing availability of large corpora spanning several time periods has opened new opportunities for the diachronic analysis of language. This type of analysis can not only bring to light linguistic phenomena related to the shift of word meanings over time, but can also be used to study the impact that societal and cultural trends have on this language change. This paper introduces a new resource for performing the diachronic analysis of named entities built upon Wikipedia page revisions. This resource enables the analysis over time of changes in the relations between entities (concepts), surface forms (words), and the contexts surrounding entities and surface forms, by analysing the whole history of Wikipedia internal links. We provide some use cases that demonstrate the impact of this resource on diachronic studies and delineate some possible future usage.

Using a Lexical Semantic Network for the Ontology Building
Nadia Bebeshina-Clairet | Sylvie Despres | Mathieu Lafourcade

Building multilingual ontologies is a hard task as ontologies are often data-rich resources. We introduce an approach which allows exploiting structured lexical semantic knowledge for ontology building. Given a multilingual lexical semantic (non-ontological) resource and an ontology model, it allows mining relevant semantic knowledge and makes the ontology building and enhancement process faster.

Evaluating the Consistency of Word Embeddings from Small Data
Jelke Bloem | Antske Fokkens | Aurélie Herbelot

In this work, we address the evaluation of distributional semantic models trained on smaller, domain-specific texts, in particular philosophical text. Specifically, we inspect the behaviour of models using a pre-trained background space in learning. We propose a measure of consistency which can be used as an evaluation metric when no in-domain gold-standard data is available. This measure simply computes the ability of a model to learn similar embeddings from different parts of some homogeneous data. We show that in spite of being a simple evaluation, consistency actually depends on various combinations of factors, including the nature of the data itself, the model used to train the semantic space, and the frequency of the learnt terms, both in the background space and in the in-domain data of interest.

Learning Sentence Embeddings for Coherence Modelling and Beyond
Tanner Bohn | Yining Hu | Jinhang Zhang | Charles Ling

We present a novel and effective technique for performing text coherence tasks while facilitating deeper insights into the data. Despite obtaining ever-increasing task performance, modern deep-learning approaches to NLP tasks often only provide users with the final network decision and no additional understanding of the data. In this work, we show that a new type of sentence embedding learned through self-supervision can be applied effectively to text coherence tasks while serving as a window through which deeper understanding of the data can be obtained. To produce these sentence embeddings, we train a recurrent neural network to take individual sentences and predict their location in a document in the form of a distribution over locations. We demonstrate that these embeddings, combined with simple visual heuristics, can be used to achieve performance competitive with state-of-the-art on multiple text coherence tasks, outperforming more complex and specialized approaches. Additionally, we demonstrate that these embeddings can provide insights useful to writers for improving writing quality and informing document structuring, and assisting readers in summarizing and locating information.

Classifying Author Intention for Writer Feedback in Related Work
Arlene Casey | Bonnie Webber | Dorota Glowacka

The ability to produce high-quality publishable material is critical to academic success but many Post-Graduate students struggle to learn to do so. While recent years have seen an increase in tools designed to provide feedback on aspects of writing, one aspect that has so far been neglected is the Related Work section of academic research papers. To address this, we have trained a supervised classifier on a corpus of 94 Related Work sections and evaluated it against a manually annotated gold standard. The classifier uses novel features pertaining to citation types and co-reference, along with patterns found from studying Related Works. We show that these novel features contribute to classifier performance with performance being favourable compared to other similar works that classify author intentions and consider feedback for academic writing.

Sparse Victory - A Large Scale Systematic Comparison of count-based and prediction-based vectorizers for text classification
Rupak Chakraborty | Ashima Elhence | Kapil Arora

In this paper we study the performance of several text vectorization algorithms on a diverse collection of 73 publicly available datasets. Traditional sparse vectorizers like Tf-Idf and Feature Hashing have been systematically compared with the latest state-of-the-art neural word embeddings like Word2Vec, GloVe, FastText and character embeddings like ELMo, Flair. We have carried out an extensive analysis of the performance of these vectorizers across different dimensions like classification metrics (i.e., precision, recall, accuracy), dataset size, and imbalanced data (in terms of the distribution of the number of class labels). Our experiments reveal that the sparse vectorizers beat the neural word and character embedding models on 61 of the 73 datasets by an average margin of 3-5% (in terms of macro F1 score) and this performance is consistent across the different dimensions of comparison.

Personality-dependent Neural Text Summarization
Pablo Costa | Ivandré Paraboni

In Natural Language Generation systems, personalization strategies (i.e., the use of information about a target author to generate text that more closely resembles human-produced language) have long been applied to improve results. The present work addresses one such strategy, namely the use of Big Five personality information about the target author, applied to the case of abstractive text summarization using neural sequence-to-sequence models. Initial results suggest that having access to personality information does lead to more accurate (or human-like) text summaries, and paves the way for more robust systems of this kind.

Detecting Toxicity in News Articles: Application to Bulgarian
Yoan Dinkov | Ivan Koychev | Preslav Nakov

Online media aim to reach ever bigger audiences and to attract ever longer attention spans. This competition creates an environment that rewards sensational, fake, and toxic news. To help limit their spread and impact, we propose and develop a news toxicity detector that can recognize various types of toxic content. While previous research primarily focused on English, here we target Bulgarian. We created a new dataset by crawling a website that for five years has been collecting Bulgarian news articles that were manually categorized into eight toxicity groups. Then we trained a multi-class classifier with nine categories: eight toxic and one non-toxic. We experimented with different representations based on ELMo, BERT, and XLM, as well as with a variety of domain-specific features. Due to the small size of our dataset, we created a separate model for each feature type, and we ultimately combined these models into a meta-classifier. The evaluation results show an accuracy of 59.0% and a macro-F1 score of 39.7%, which represent sizable improvements over the majority-class baseline (Acc=30.3%, macro-F1=5.2%).
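
The majority-class baseline figures above can be sanity-checked with a little arithmetic. The sketch below assumes that the baseline always predicts the most frequent of the nine classes and that this class covers 30.3% of the evaluation data (the reported baseline accuracy); the function name is illustrative and not taken from the paper.

```python
def majority_baseline_macro_f1(majority_share: float, num_classes: int) -> float:
    """Macro-F1 of a classifier that always predicts the majority class."""
    # Every prediction is the majority class, so precision equals its share.
    precision = majority_share
    # All majority-class items are retrieved.
    recall = 1.0
    f1_majority = 2 * precision * recall / (precision + recall)
    # The remaining classes are never predicted, so each contributes F1 = 0.
    return f1_majority / num_classes


score = majority_baseline_macro_f1(0.303, 9)
print(f"{score:.3f}")
```

Under these assumptions the value comes out at roughly 0.052, in line with the reported baseline macro-F1, which illustrates why macro-F1 penalizes a trivial classifier far more than accuracy does.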

De-Identification of Emails: Pseudonymizing Privacy-Sensitive Data in a German Email Corpus
Elisabeth Eder | Ulrike Krieg-Holz | Udo Hahn

We deal with the pseudonymization of those stretches of text in emails that might allow real individual persons to be identified. This task is decomposed into two steps. First, named entities carrying privacy-sensitive information (e.g., names of persons, locations, phone numbers or dates) are identified, and, second, these privacy-bearing entities are replaced by synthetically generated surrogates (e.g., a person originally named ‘John Doe’ is renamed as ‘Bill Powers’). We describe a system architecture for surrogate generation and evaluate our approach on CodeAlltag, a German email corpus.
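
The two-step decomposition can be illustrated with a toy sketch. It is not the authors' system: entity identification is replaced by a lookup in a caller-supplied name list plus a simple phone-number regex, and the surrogate pool, pattern and function name are all hypothetical.

```python
import re

# Hypothetical surrogate pool; a real system would draw from much larger lists.
SURROGATE_NAMES = ["Bill Powers", "Anna Keller", "Tom Richter"]
# Rough pattern for phone-like digit sequences (an assumption, not a standard).
PHONE_PATTERN = re.compile(r"\+?\d[\d /-]{6,}\d")


def pseudonymize(text, known_names):
    """Replace known person names and phone-like numbers with surrogates."""
    mapping = {}
    # Step 1 (stand-in for named entity recognition): find sensitive mentions.
    for name in known_names:
        if name in text and name not in mapping:
            mapping[name] = SURROGATE_NAMES[len(mapping) % len(SURROGATE_NAMES)]
    # Step 2: substitute surrogates; repeated mentions get the same surrogate.
    for original, surrogate in mapping.items():
        text = text.replace(original, surrogate)
    # Mask every digit of phone-like spans while keeping their layout.
    text = PHONE_PATTERN.sub(lambda m: re.sub(r"\d", "0", m.group()), text)
    return text, mapping


result, mapping = pseudonymize(
    "John Doe called. Reach John Doe at 0176 1234567.", ["John Doe"]
)
print(result)
```

Keeping the name-to-surrogate mapping consistent within a message is one plausible design choice, since it preserves coreference between repeated mentions of the same person.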

Lexical Quantile-Based Text Complexity Measure
Maksim Eremeev | Konstantin Vorontsov

This paper introduces a new approach to estimating the text document complexity. Common readability indices are based on average length of sentences and words. In contrast to these methods, we propose to count the number of rare words occurring abnormally often in the document. We use the reference corpus of texts and the quantile approach in order to determine what words are rare, and what frequencies are abnormal. We construct a general text complexity model, which can be adjusted for the specific task, and introduce two special models. The experimental design is based on a set of thematically similar pairs of Wikipedia articles, labeled using crowdsourcing. The experiments demonstrate the competitiveness of the proposed approach.
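
The core idea, counting rare words that occur abnormally often, can be sketched as follows. This is an illustration under stated assumptions rather than the model from the paper: a word counts as rare if its reference-corpus count is at or below a chosen quantile of all reference counts, and as abnormally frequent if its in-document rate exceeds a fixed multiple of its smoothed reference rate. The names `rare_q` and `ratio` are hypothetical parameters.

```python
from collections import Counter


def quantile(values, q):
    """Return the q-quantile of values using the nearest lower rank."""
    ordered = sorted(values)
    return ordered[int(q * (len(ordered) - 1))]


def complexity_score(document_tokens, reference_tokens, rare_q=0.5, ratio=2.0):
    """Share of document tokens that are rare yet abnormally frequent."""
    ref_counts = Counter(reference_tokens)
    doc_counts = Counter(document_tokens)
    rare_cutoff = quantile(ref_counts.values(), rare_q)
    ref_total = len(reference_tokens)
    doc_total = len(document_tokens)
    flagged = 0
    for word, count in doc_counts.items():
        ref_count = ref_counts.get(word, 0)
        is_rare = ref_count <= rare_cutoff
        # Add-one smoothing so unseen words still get a nonzero expected rate.
        expected_rate = (ref_count + 1) / (ref_total + 1)
        is_abnormal = count / doc_total > ratio * expected_rate
        if is_rare and is_abnormal:
            flagged += count
    return flagged / doc_total


reference = "the cat sat on the mat the dog sat on the rug".split()
print(complexity_score("the cat sat on the mat".split(), reference))
print(complexity_score("the ontology ontology ontology sat on the ontology".split(), reference))
```

With this toy reference corpus, the first document receives a score of 0 and the second, dominated by a word unseen in the reference, receives a higher score.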

Demo Application for LETO: Learning Engine Through Ontologies
Suilan Estevez-Velarde | Andrés Montoyo | Yudivian Almeida-Cruz | Yoan Gutiérrez | Alejandro Piad-Morffis | Rafael Muñoz

The massive amount of multi-formatted information available on the Web necessitates the design of software systems that leverage this information to obtain knowledge that is valid and useful. The main challenge is to discover relevant information and continuously update, enrich and integrate knowledge from various sources of structured and unstructured data. This paper presents the Learning Engine Through Ontologies (LETO) framework, an architecture for the continuous and incremental discovery of knowledge from multiple sources of unstructured and structured data. We justify the main design decision behind LETO’s architecture and evaluate the framework’s feasibility using the Internet Movie Data Base (IMDB) and Twitter as a practical application.

Sentence Simplification for Semantic Role Labelling and Information Extraction
Richard Evans | Constantin Orasan

In this paper, we report on the extrinsic evaluation of an automatic sentence simplification method with respect to two NLP tasks: semantic role labelling (SRL) and information extraction (IE). The paper begins with our observation of challenges in the intrinsic evaluation of sentence simplification systems, which motivates the use of extrinsic evaluation of these systems with respect to other NLP tasks. We describe the two NLP systems and the test data used in the extrinsic evaluation, and present arguments and evidence motivating the integration of a sentence simplification step as a means of improving the accuracy of these systems. Our evaluation reveals that their performance is improved by the simplification step: the SRL system is better able to assign semantic roles to the majority of the arguments of verbs and the IE system is better able to identify fillers for all IE template slots.

OlloBot - Towards A Text-Based Arabic Health Conversational Agent: Evaluation and Results
Ahmed Fadhil | Ahmed AbuRa’ed

We introduce OlloBot, an Arabic conversational agent that assists physicians and supports patients with the care process. It doesn’t replace physicians; instead, it provides health tracking and support and assists physicians with care delivery through a conversation medium. The current model comprises healthy diet, physical activity, mental health, in addition to food logging. Not only does OlloBot track users’ daily food intake, it also offers useful tips for healthier living. We discuss the design, development and testing of OlloBot, and highlight the findings and limitations that arose from the testing.

Summarizing Legal Rulings: Comparative Experiments
Diego Feijo | Viviane Moreira

In the context of text summarization, texts in the legal domain have peculiarities related to their length and to their specialized vocabulary. Recent neural network-based approaches can achieve high-quality scores for text summarization. However, these approaches have been used mostly for generating very short abstracts for news articles. Thus, their applicability to the legal domain remains an open issue. In this work, we experimented with ten extractive and four abstractive models in a real dataset of legal rulings. These models were compared with an extractive baseline based on heuristics to select the most relevant parts of the text. Our results show that abstractive approaches significantly outperform extractive methods in terms of ROUGE scores.
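
For readers unfamiliar with ROUGE, the simplest variant, ROUGE-1, measures unigram overlap between a candidate summary and a reference summary. The sketch below is a bare-bones illustration assuming whitespace tokenization and lowercasing; standard ROUGE implementations add options such as stemming, and the function name here is illustrative.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str):
    """Return (precision, recall, F1) of clipped unigram overlap."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per token (clipping).
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1


print(rouge_1("the court rejected the appeal", "the court dismissed the appeal"))
```

In this toy pair, four of the five tokens overlap, so precision and recall coincide; an abstractive summary that paraphrases heavily can score low on such overlap measures even when it is faithful.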

Comparing Automated Methods to Detect Explicit Content in Song Lyrics
Michael Fell | Elena Cabrio | Michele Corazza | Fabien Gandon

The Parental Advisory Label (PAL) is a warning label that is placed on audio recordings in recognition of profanity or inappropriate references, with the intention of alerting parents to material potentially unsuitable for children. Since 2015, digital providers such as iTunes, Spotify, Amazon Music and Deezer also follow PAL guidelines and tag such tracks as explicit. Nowadays, such labelling is carried out mainly manually on a voluntary basis, with the drawbacks of being time-consuming and therefore costly, error-prone and partly a subjective task. In this paper, we compare automated methods ranging from dictionary-based lookup to state-of-the-art deep neural networks to automatically detect explicit content in English lyrics. We show that more complex models perform only slightly better on this task, and relying on a qualitative analysis of the data, we discuss the inherent hardness and subjectivity of the task.

Linguistic classification: dealing jointly with irrelevance and inconsistency
Laura Franzoi | Andrea Sgarro | Anca Dinu | Liviu P. Dinu

In this paper, we present new methods for language classification which put to good use both syntax and fuzzy tools, and are capable of dealing with irrelevant linguistic features (i.e. features which should not contribute to the classification) and even inconsistent features (which do not make sense for specific languages). We introduce a metric distance, based on the generalized Steinhaus transform, which allows one to deal jointly with irrelevance and inconsistency. To evaluate our methods, we test them on a syntactic data set, due to the linguist G. Longobardi and his school. We obtain phylogenetic trees which sometimes outperform the ones obtained by Atkinson and Gray.

Two Discourse Tree-Based Approaches to Indexing Answers
Boris Galitsky | Dmitry Ilvovsky

We explore the anatomy of answers with respect to which text fragments from an answer are worth matching with a question and which should not be matched. We apply the Rhetorical Structure Theory to build a discourse tree of an answer and select elementary discourse units that are suitable for indexing. Manual rules for selection of these discourse units as well as automated classification based on web search engine mining are evaluated concerning improving search accuracy. We form two sets of question-answer pairs for FAQ and community QA search domains and use them for evaluation of the proposed indexing methodology, which delivers up to 16 percent improvement in search recall.

On a Chatbot Providing Virtual Dialogues
Boris Galitsky | Dmitry Ilvovsky | Elizaveta Goncharova

We present a chatbot that delivers content in the form of virtual dialogues automatically produced from the plain texts that are extracted and selected from the documents. This virtual dialogue content is provided in the form of answers derived from the found and selected documents split into fragments, and questions that are automatically generated for these answers based on the initial text.

Assessing socioeconomic status of Twitter users: A survey
Dhouha Ghazouani | Luigi Lancieri | Habib Ounelli | Chaker Jebari

Every day, the emotions and opinions of different people across the world are reflected in the form of short messages on microblogging platforms. Despite the enormous potential of this data source, the Twitter community is still ambiguous and has not been fully explored yet. While a huge number of studies examine the possibilities of inferring gender and age, there is hardly any research on socioeconomic status (SES) inference of Twitter users. As socioeconomic status is essential to treating diverse questions linked to human behavior in several fields (sociology, demography, public health, etc.), we conducted a comprehensive literature review of SES studies, inference methods, and metrics. Based on the results reported in this literature, we outline the most critical challenges for researchers. To the best of our knowledge, this paper is the first review that introduces the different aspects of SES inference. It should benefit practitioners who aim to process and explore Twitter SES inference.

Divide and Extract - Disentangling Clause Splitting and Proposition Extraction
Darina Gold | Torsten Zesch

Proposition extraction from sentences is an important task for information extraction systems. Evaluation of such systems usually conflates two aspects: splitting complex sentences into clauses and the extraction of propositions. It is thus difficult to independently determine the quality of the proposition extraction step. We create a manually annotated proposition dataset from sentences taken from restaurant reviews that distinguishes between clauses that need to be split and those that do not. The resulting proposition evaluation dataset allows us to independently compare the performance of proposition extraction systems on simple and complex clauses. Although performance drastically drops on more complex sentences, we show that the same systems perform best on both simple and complex clauses. Furthermore, we show that specific kinds of subordinate clauses pose difficulties to most systems.

Automatic Question Answering for Medical MCQs: Can It go Further than Information Retrieval?
Le An Ha | Victoria Yaneva

We present a novel approach to automatic question answering that does not depend on the performance of an information retrieval (IR) system and does not require that the training data come from the same source as the questions. We evaluate the system performance on a challenging set of university-level medical science multiple-choice questions. Best performance is achieved when combining a neural approach with an IR approach, both of which work independently. Unlike previous approaches, the system achieves statistically significant improvement over the random guess baseline even for questions that are labeled as challenging based on the performance of baseline solvers.

Investigating Terminology Translation in Statistical and Neural Machine Translation: A Case Study on English-to-Hindi and Hindi-to-English
Rejwanul Haque | Md Hasanuzzaman | Andy Way

Terminology translation plays a critical role in domain-specific machine translation (MT). In this paper, we conduct a comparative qualitative evaluation on terminology translation in phrase-based statistical MT (PB-SMT) and neural MT (NMT) in two translation directions: English-to-Hindi and Hindi-to-English. For this, we select a test set from a legal domain corpus and create a gold standard for evaluating terminology translation in MT. We also propose an error typology taking the terminology translation errors into consideration. We evaluate the MT systems’ performance on terminology translation, and demonstrate our findings, unraveling strengths, weaknesses, and similarities of PB-SMT and NMT in the area of term translation.

Beyond English-Only Reading Comprehension: Experiments in Zero-shot Multilingual Transfer for Bulgarian
Momchil Hardalov | Ivan Koychev | Preslav Nakov

Recently, reading comprehension models achieved near-human performance on large-scale datasets such as SQuAD, CoQA, MS MARCO, RACE, etc. This is largely due to the release of pre-trained contextualized representations such as BERT and ELMo, which can be fine-tuned for the target task. Despite those advances and the creation of more challenging datasets, most of the work is still done for English. Here, we study the effectiveness of multilingual BERT fine-tuned on large-scale English datasets for reading comprehension (e.g., for RACE), and we apply it to Bulgarian multiple-choice reading comprehension. We propose a new dataset containing 2,221 questions from matriculation exams for twelfth grade in various subjects (history, biology, geography and philosophy), and 412 additional questions from online quizzes in history. While the quiz authors gave no relevant context, we incorporate knowledge from Wikipedia, retrieving documents matching the combination of question + each answer option. Moreover, we experiment with different indexing and pre-training strategies. The evaluation results show accuracy of 42.23%, which is well above the baseline of 24.89%.

Emoji Powered Capsule Network to Detect Type and Target of Offensive Posts in Social Media
Hansi Hettiarachchi | Tharindu Ranasinghe

This paper describes a novel research approach to detecting the type and target of offensive posts in social media using a capsule network. The input to the network was character embeddings combined with emoji embeddings. The approach was evaluated on all three subtasks of SemEval 2019 Task 6, OffensEval: Identifying and Categorizing Offensive Language in Social Media. The evaluation also showed that even though capsule networks have not been used commonly in natural language processing tasks, they can outperform existing state-of-the-art solutions for offensive language detection in social media.

Using Syntax to Resolve NPE in English
Payal Khullar | Allen Antony | Manish Shrivastava

This paper describes a novel, syntax-based system for automatic detection and resolution of Noun Phrase Ellipsis (NPE) in English. The system takes in free input English text, detects the site of nominal elision, and if present, selects potential antecedent candidates. The rules are built using the syntactic information on ellipsis and its antecedent discussed in previous theoretical linguistics literature on NPE. Additionally, we prepare a curated dataset of 337 sentences from well-known, reliable sources, containing positive and negative samples of NPE. We split this dataset into two parts, and use one part to refine our rules and the other to test the performance of our final system. We get an F1-score of 76.47% for detection and 70.27% for NPE resolution on the test set. To the best of our knowledge, ours is the first system that detects and resolves NPE in English. The curated dataset used for this task, albeit small, covers a wide variety of NPE cases and will be made public for future work.

Is Similarity Visually Grounded? Computational Model of Similarity for the Estonian language
Claudia Kittask | Eduard Barbu

Researchers in Computational Linguistics build models of similarity and test them against human judgments. Although there are many empirical studies of the computational models of similarity for the English language, similarity for other languages is less explored. In this study we are chiefly interested in two aspects. First, we want to know how much of human similarity is grounded in visual perception. To answer this question, two neural computer vision models are used and their correlation with the human-derived similarity scores is computed. Second, we investigate whether language influences the similarity computation. To this end, diverse computational models trained on Estonian resources are evaluated against human judgments.

Language-Agnostic Twitter-Bot Detection
Jürgen Knauth

In this paper we address the problem of detecting Twitter bots. We analyze a dataset of 8385 Twitter accounts and their tweets consisting of both humans and different kinds of bots. We use this data to train machine learning classifiers that distinguish between real and bot accounts. We identify features that are easy to extract while still providing good results. We analyze different feature groups based on account-specific, tweet-specific and behavior-specific features and measure their performance compared to other state-of-the-art bot detection methods. For easy future portability of our work we focus on language-agnostic features. With AdaBoost, the best performing classifier, we achieve an accuracy of 0.988 and an AUC of 0.995. As the creation of good training data in machine learning is often difficult, especially in the domain of Twitter bot detection, we additionally analyze to what extent smaller amounts of training data lead to useful results by reviewing cross-validated learning curves. Our results indicate that using few but expressive features already has a good practical benefit for bot detection, especially if only a small amount of training data is available.

Question Similarity in Community Question Answering: A Systematic Exploration of Preprocessing Methods and Models
Florian Kunneman | Thiago Castro Ferreira | Emiel Krahmer | Antal van den Bosch

Community Question Answering forums are popular among Internet users, and a basic problem they encounter is trying to find out if their question has already been posed before. To address this issue, NLP researchers have developed methods to automatically detect question-similarity, which was one of the shared tasks in SemEval. The best performing systems for this task made use of Syntactic Tree Kernels or the SoftCosine metric. However, it remains unclear why these methods seem to work, whether their performance can be improved by better preprocessing methods and what kinds of errors they (and other methods) make. In this paper, we therefore systematically combine and compare these two approaches with the more traditional BM25 and translation-based models. Moreover, we analyze the impact of preprocessing steps (lowercasing, suppression of punctuation and stop words removal) and word meaning similarity based on different distributions (word translation probability, Word2Vec, fastText and ELMo) on the performance of the task. We conduct an error analysis to gain insight into the differences in performance between the system set-ups. The implementation is made publicly available from
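
As background on the traditional BM25 baseline mentioned above, the following sketch ranks candidate questions against a query using the common Okapi BM25 formula. It is not the authors' implementation: the parameter defaults `k1=1.2` and `b=0.75`, the smoothed IDF variant, and the whitespace tokenization with lowercasing are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in tokenized for term in set(doc))
    scores = []
    for doc in tokenized:
        term_freq = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Smoothed IDF that stays non-negative for very common terms.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Length normalization: longer documents are penalized via b.
            norm = term_freq[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


questions = [
    "how do i reset my password",
    "best pizza in town",
    "password reset does not work",
]
print(bm25_scores("reset password", questions))
```

Because BM25 only rewards exact term matches, two questions phrased with different words score zero against each other, which is the gap that word-similarity-based measures such as SoftCosine aim to close.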

Resolving Pronouns for a Resource-Poor Language, Malayalam Using Resource-Rich Language, Tamil.
Sobha Lalitha Devi

In this paper we describe in detail how a resource-rich language can be used for resolving pronouns in a less-resourced language. The source language, which is the resource-rich language in this study, is Tamil, and the resource-poor language is Malayalam; both belong to the same language family, Dravidian. The pronominal resolution system developed for Tamil uses CRFs. Our approach is to apply the Tamil language model to Malayalam test data, and the processing required for the Malayalam data is detailed. The similarity at the syntactic level between the languages is exploited in identifying the features for developing the Tamil language model. The word form or the lexical item is not considered as a feature for training the CRFs. Evaluation on Malayalam Wikipedia data shows that our approach works and that the results, though not as good as those for Tamil, are comparable.

Semantic Role Labeling with Pretrained Language Models for Known and Unknown Predicates
Daniil Larionov | Artem Shelmanov | Elena Chistova | Ivan Smirnov

We build the first full pipeline for semantic role labelling of Russian texts. The pipeline implements predicate identification, argument extraction, argument classification (labeling), and global scoring via integer linear programming. We train supervised neural network models for argument classification using the Russian semantically annotated corpus FrameBank. However, we note that this resource provides annotations only for a very limited set of predicates. We combat the problem of annotation scarcity by introducing two models that rely on different sets of features: one for known predicates that are present in the training set and one for unknown predicates that are not. We show that the model for unknown predicates can alleviate the lack of annotation by using pretrained embeddings. We perform experiments with various types of embeddings (word2vec, FastText, ELMo, BERT), including the ones generated by deep pretrained language models, and show that embeddings generated by deep pretrained language models are superior to classical shallow embeddings for argument classification of both known and unknown predicates.

The Impact of Semantic Linguistic Features in Relation Extraction: A Logical Relational Learning Approach
Rinaldo Lima | Bernard Espinasse | Frederico Freitas

Relation Extraction (RE) consists in detecting and classifying semantic relations between entities in a sentence. The vast majority of state-of-the-art RE systems rely on morphosyntactic features and supervised machine learning algorithms. This paper tries to answer important questions concerning both the impact of semantic-based features and the integration of external linguistic knowledge resources on RE performance. To this end, an RE system based on a logical and relational learning algorithm was used and evaluated on three reference datasets from two distinct domains. The results confirm that the classifiers induced using the proposed richer feature set outperformed the classifiers built with morphosyntactic features by 4% on average (F1-measure).

Detecting Anorexia in Spanish Tweets
Pilar López Úbeda | Flor Miriam Plaza del Arco | Manuel Carlos Díaz Galiano | L. Alfonso Urena Lopez | Maite Martin

Mental health is one of the main concerns of today’s society. Early detection of symptoms can greatly help people with mental disorders. People are using social networks more and more to express emotions, sentiments and mental states. Thus, the treatment of this information using NLP technologies can be applied to the automatic detection of mental problems such as eating disorders. However, the first step to solving the problem should be to provide a corpus in order to evaluate our systems. In this paper, we specifically focus on detecting anorexia messages on Twitter. Firstly, we have generated a new corpus of tweets extracted from different accounts including anorexia and non-anorexia messages in Spanish. The corpus is called SAD : Spanish Anorexia Detection corpus. In order to validate the effectiveness of the SAD corpus, we also propose several machine learning approaches for automatically detecting anorexia symptoms in the corpus. The good results obtained show that the application of textual classification methods is a promising option for developing this kind of system demonstrating that these tools could be used by professionals to help in the early detection of mental problems.

v-trel : Vocabulary Trainer for Tracing Word Relations-An Implicit Crowdsourcing Approach
Verena Lyding | Christos Rodosthenous | Federico Sangati | Umair ul Hassan | Lionel Nicolas | Alexander König | Jolita Horbacauskiene | Anisia Katinskaia

In this paper, we present our work on developing a vocabulary trainer that uses exercises generated from language resources such as ConceptNet and crowdsources the responses of the learners to enrich the language resource. We performed an empirical evaluation of our approach with 60 non-native speakers over two days, which shows that new entries to expand ConceptNet can efficiently be gathered through vocabulary exercises on word relations. We also report on the feedback gathered from the users and from a language teaching expert, and discuss the potential of the vocabulary trainer application from the user and language learner perspective. The feedback suggests that v-trel has educational potential, although some shortcomings were identified in its current state.

Jointly Learning Author and Annotated Character N-gram Embeddings : A Case Study in Literary Text
Suraj Maharjan | Deepthi Mave | Prasha Shrestha | Manuel Montes | Fabio A. González | Thamar Solorio

An author’s way of presenting a story through his / her writing style has a great impact on whether the story will be liked by readers or not. In this paper, we learn representations for authors of literary texts together with representations for character n-grams annotated with their functional roles. We train a neural character n-gram based language model using an external corpus of literary texts and transfer learned representations for use in downstream tasks. We show that augmenting the knowledge from external works of authors produces results competitive with other style-based methods for book likability prediction, genre classification, and authorship attribution.

Generating Challenge Datasets for Task-Oriented Conversational Agents through Self-Play
Sourabh Majumdar | Serra Sinem Tekiroglu | Marco Guerini

End-to-end neural approaches are becoming increasingly common in conversational scenarios due to their promising performance when provided with a sufficient amount of data. In this paper, we present a novel methodology to address the interpretability of neural approaches in such scenarios by creating challenge datasets using dialogue self-play over multiple tasks / intents. Dialogue self-play allows generating large amounts of synthetic data ; by taking advantage of the complete control over the generation process, we show how neural approaches can be evaluated in terms of unseen dialogue patterns. We propose several out-of-pattern test cases, each of which introduces a natural and unexpected user utterance phenomenon. As a proof of concept, we built a single and a multiple memory network, and show that the performance of these two architectures differs depending on the particular dialogue patterns.

Unsupervised Data Augmentation for Less-Resourced Languages with no Standardized Spelling
Alice Millour | Karën Fort

Building representative linguistic resources and NLP tools for non-standardized languages is challenging : when spelling is not determined by a norm, multiple written forms can be encountered for a given word, inducing a large proportion of out-of-vocabulary (OOV) words. To embrace this diversity, we propose a methodology based on crowdsourced alternative spellings, from which we extract rules that are applied to match OOV words with one of their spelling variants. This virtuous process enables the unsupervised augmentation of multi-variant lexicons without expert rule definition. We apply this multilingual methodology to Alsatian, a French regional language, and provide an intrinsic evaluation of the correctness of the variant pairs and an extrinsic evaluation on a downstream task. We show that in a low-resource scenario, 145 initial pairs can lead to the generation of 876 additional variant pairs and to a reduction in OOV words, improving part-of-speech tagging performance by 1 to 4 %.

Neural Feature Extraction for Contextual Emotion Detection
Elham Mohammadi | Hessam Amini | Leila Kosseim

This paper describes a new approach for the task of contextual emotion detection. The approach is based on a neural feature extractor, composed of a recurrent neural network with an attention mechanism, followed by a classifier, that can be neural or SVM-based. We evaluated the model with the dataset of the task 3 of SemEval 2019 (EmoContext), which includes short 3-turn conversations, tagged with 4 emotion classes. The best performing setup was achieved using ELMo word embeddings and POS tags as input, bidirectional GRU as hidden units, and an SVM as the final classifier. This configuration reached 69.93 % in terms of micro-average F1 score on the main 3 emotion classes, a score that outperformed the baseline system by 11.25 %.

A Fast and Accurate Partially Deterministic Morphological Analysis
Hajime Morita | Tomoya Iwakura

This paper proposes a partially deterministic morphological analysis method for improved processing speed. Maximum matching is a fast deterministic method for morphological analysis. However, the method tends to decrease performance due to lack of consideration of contextual information. In order to use maximum matching safely, we propose the use of Context Independent Strings (CISs), which are strings that do not have ambiguity in terms of morphological analysis. Our method first identifies CISs in a sentence using maximum matching without contextual information, then analyzes the unprocessed part of the sentence using a bi-gram-based morphological analysis model. We evaluate the method on a Japanese morphological analysis task. The experimental results show a 30 % reduction of running time while maintaining improved accuracy.
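The deterministic step mentioned in the abstract above, maximum matching, can be sketched as a greedy left-to-right longest-match segmenter. This is only a minimal illustration of plain maximum matching, not the authors' CIS-based method; the function name `maximum_matching` and the toy lexicon are assumptions made for the example.

```python
def maximum_matching(text, lexicon, max_len=None):
    """Greedy left-to-right longest-match segmentation (illustrative sketch)."""
    if max_len is None:
        # Longest lexicon entry bounds how far ahead we need to look.
        max_len = max((len(w) for w in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character
        # when no lexicon entry starts at position i.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon or j == i + 1:
                tokens.append(text[i:j])
                i = j
                break
    return tokens


# Hypothetical toy lexicon, for illustration only.
TOY_LEXICON = {"東京", "東京都", "都", "に", "住む"}

if __name__ == "__main__":
    print(maximum_matching("東京都に住む", TOY_LEXICON))
```

Because the longest entry always wins, the segmenter never consults context, which is the source of both its speed and the accuracy risk the abstract describes.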

Marius Mosbach | Irina Stenger | Tania Avgustinova | Dietrich Klakow

Languages may be differently distant from each other and their mutual intelligibility may be asymmetric. In this paper we introduce a toolbox for calculating linguistic distances and asymmetries between related languages. The toolbox allows linguistic experts to quickly and easily perform statistical analyses and compare those with experimental results. We demonstrate the efficacy of the toolbox in an incomprehension experiment on two Slavic languages : Bulgarian and Russian. Using the toolbox, we were able to validate three methods to measure linguistic distances and asymmetries : Levenshtein distance, word adaptation surprisal, and conditional entropy as predictors of success in a reading intercomprehension experiment.
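Of the three measures named in the abstract above, Levenshtein distance is the simplest to illustrate. The sketch below is a standard dynamic-programming implementation and is not taken from the toolbox itself; normalising by the length of the longer word is an assumption, and the toolbox may normalise differently.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_levenshtein(a, b):
    """Distance divided by the longer word's length (an assumed normalisation)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


if __name__ == "__main__":
    # Bulgarian and Russian words for "milk", chosen only as an example pair.
    print(levenshtein("мляко", "молоко"))
    print(normalized_levenshtein("мляко", "молоко"))
```

Levenshtein distance is symmetric by construction, so it cannot capture asymmetric intelligibility on its own; that is the role of the other two measures listed in the abstract.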

Axel-Cyrille Ngonga Ngomo | Diego Moussallem | Lorenz Bühmann

With the ever-growing generation of data for the Semantic Web comes an increasing demand for this data to be made available to non-semantic Web experts. One way of achieving this goal is to translate the languages of the Semantic Web into natural language. We present LD2NL, a framework that allows verbalizing the three key languages of the Semantic Web, i.e., RDF, OWL, and SPARQL. Our framework is based on a bottom-up approach to verbalization. We evaluated LD2NL in an open survey with 86 persons. Our results suggest that our framework can generate verbalizations that are close to natural languages and that can be easily understood by non-experts. Therewith, it enables non-domain experts to interpret Semantic Web data with more than 91 % of the accuracy of domain experts.

Nikola I. Nikolov | Richard Hahnloser

We propose a simple unsupervised method for extracting pseudo-parallel monolingual sentence pairs from comparable corpora representative of two different text styles, such as news articles and scientific papers. Our approach does not require a seed parallel corpus, but instead relies solely on hierarchical search over pre-trained embeddings of documents and sentences. We demonstrate the effectiveness of our method through automatic and extrinsic evaluation on text simplification from the normal to the Simple Wikipedia. We show that pseudo-parallel sentences extracted with our method not only supplement existing parallel data, but can even lead to competitive performance on their own.

Yutaro Omote | Akihiro Tamura | Takashi Ninomiya

This paper proposes a new Transformer neural machine translation model that incorporates syntactic distances between two source words into the relative position representations of the self-attention mechanism. In particular, the proposed model encodes pair-wise relative depths on a source dependency tree, which are differences between the depths of the two source words, in the encoder’s self-attention. The experiments show that our proposed model achieves 0.5 point gain in BLEU on the Asian Scientific Paper Excerpt Corpus Japanese-to-English translation task.
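The pair-wise relative depths described in the abstract above can be sketched as follows, assuming the dependency tree is given as a list of head indices. The sign convention (depth of word j minus depth of word i) and the optional clipping are assumptions made for illustration; the abstract does not specify them.

```python
def depths_from_heads(heads):
    """heads[i] is the index of word i's head, or -1 if word i is the root."""
    depths = []
    for i in range(len(heads)):
        depth, node = 0, i
        while heads[node] != -1:  # walk up to the root, counting edges
            node = heads[node]
            depth += 1
        depths.append(depth)
    return depths


def relative_depths(heads, clip=None):
    """Matrix rel[i][j] = depth(j) - depth(i), optionally clipped to [-clip, clip]."""
    depths = depths_from_heads(heads)
    rel = [[dj - di for dj in depths] for di in depths]
    if clip is not None:
        rel = [[max(-clip, min(clip, v)) for v in row] for row in rel]
    return rel


if __name__ == "__main__":
    # "She reads old books": "reads" is the root, "old" depends on "books".
    heads = [1, -1, 3, 1]
    for row in relative_depths(heads):
        print(row)
```

In a model of this kind, each entry of such a matrix would index a learned embedding that is added inside self-attention, in the same way that linear offsets are used in ordinary relative position representations.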

Esra Onal | Francis Tyers

This study is an attempt to contribute to the documentation and revitalization efforts for the endangered Laz language, a member of the South Caucasian language family mainly spoken on the northeastern coastline of Turkey. It constitutes the first steps towards a general computational model for word form recognition and production for Laz by building a rule-based morphological analyser using the Helsinki Finite-State Toolkit (HFST). The evaluation results show that the analyser has a 64.9 % coverage over a corpus of 111,365 tokens collected for this study. We have also performed an error analysis on 100 randomly selected tokens from the corpus which are not covered by the analyser, and these results show that the errors mostly result from Turkish words in the corpus and missing stems in our lexicon.

Sean Papay | Sebastian Padó

The detection of quotations (i.e., reported speech, thought, and writing) has established itself as an NLP analysis task. However, state-of-the-art models have been developed on the basis of specific corpora and incorporate a high degree of corpus-specific assumptions and knowledge, which leads to fragmentation. In the spirit of task-agnostic modeling, we present a corpus-agnostic neural model for quotation detection and evaluate it on three corpora that vary in language, text genre, and structural assumptions. The model (a) approaches the state-of-the-art on the corpora when using established feature sets and (b) shows reasonable performance even when using solely word forms, which makes it applicable for non-standard (i.e., historical) corpora.

Vamsi Krishna Pendyala | Simran Sinha | Satya Prakash | Shriya Reddy | Anupam Jamatia

In today’s digital world of information, a fact verification system to disprove assertions made in speech, print media or online content is the need of the hour. We propose a system which verifies a claim against a source and classifies the claim as true, false, out-of-context or inappropriate with respect to the textual source provided to the system. The label true is used if the claim is true and false if it is false ; if the claim has no relation to the source it is classified as out-of-context, and if the claim cannot be verified at all it is classified as inappropriate. This helps us to verify a claim or a fact as well as to learn about the source or knowledge base against which we are trying to verify our facts. We used a two-step approach to achieve our goal. First, we retrieved evidence related to the claims from the textual source using Term Frequency-Inverse Document Frequency (TF-IDF) vectors. Then we classified the claim-evidence pairs as true, false, inappropriate or out-of-context using a modified version of a textual entailment module. The textual entailment module uses a Bi-LSTM network to calculate the probability of each sentence supporting the claim, contradicting the claim or not providing any relevant information, in order to assess the veracity of the claim. The accuracy of the best performing system is 64.49 %.
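The first step of the two-step approach above, retrieving evidence with TF-IDF vectors, can be sketched in plain Python. This is a minimal illustration under stated assumptions: whitespace tokenisation, a smoothed IDF formula and cosine similarity as the ranking score are choices made for the example, and the names `tfidf_vectors` and `retrieve` are hypothetical, not taken from the authors' system.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercased whitespace tokenisation (an assumption for this sketch).
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF so that terms occurring in every document keep a positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: tf[t] * idf[t] for t in tf})
    return vectors, idf


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(claim, sentences, top_k=1):
    """Rank source sentences by TF-IDF cosine similarity to the claim."""
    vectors, idf = tfidf_vectors(sentences)
    tf = Counter(tokenize(claim))
    query = {t: tf[t] * idf[t] for t in tf if t in idf}  # ignore unseen terms
    ranked = sorted(range(len(sentences)),
                    key=lambda i: cosine(query, vectors[i]),
                    reverse=True)
    return [sentences[i] for i in ranked[:top_k]]


if __name__ == "__main__":
    source = [
        "the eiffel tower is in paris",
        "bananas are rich in potassium",
        "the tower was completed in 1889",
    ]
    print(retrieve("the eiffel tower is located in paris", source))
```

The retrieved sentences would then be paired with the claim and passed to the entailment classifier described in the second step.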

Alejandro Piad-Morffis | Rafael Muñoz | Yoan Gutiérrez | Yudivian Almeida-Cruz | Suilan Estevez-Velarde | Andrés Montoyo

This paper presents Semantic Neural Networks (SNNs), a knowledge-aware component based on deep learning. SNNs can be trained to encode explicit semantic knowledge from an arbitrary knowledge base, and can subsequently be combined with other deep learning architectures. At prediction time, SNNs provide a semantic encoding extracted from the input data, which can be exploited by other neural network components to build extended representation models that can face alternative problems. The SNN architecture is defined in terms of the concepts and relations present in a knowledge base. Based on this architecture, a training procedure is developed. Finally, an experimental setup is presented to illustrate the behaviour and performance of a SNN for a specific NLP problem, in this case, opinion mining for the classification of movie reviews.

Artem Popov | Victor Bulatov | Darya Polyudova | Eugenia Veselova

One of the challenges in task-oriented chatbot development is the scarcity of labeled training data. The best way of obtaining such data is to ask assessors to tag each dialogue according to its intent. Unfortunately, performing labeling without any provisional collection structure is difficult since the very notion of intent is ill-defined. In this paper, we propose a hierarchical multimodal regularized topic model to obtain a first approximation of the intent set. Our rationale for using hierarchical models is their ability to take into account several degrees of dialogue relevancy. We attempt to build a model that can distinguish between subject-based (e.g. medicine and transport topics) and action-based (e.g. filing of an application and tracking application status) similarities. In order to achieve this, we divide the set of all features into several groups according to part-of-speech analysis. Various feature groups are treated differently on different hierarchy levels.

Maja Popović | Sheila Castilho

The translation of ambiguous words still poses challenges for machine translation. In this work, we carry out a systematic quantitative analysis regarding the ability of different machine translation systems to disambiguate the source language conjunctions "but" and "and". We evaluate specialised test sets focused on the translation of these two conjunctions. The test sets contain source languages that do not distinguish different variants of the given conjunction, whereas the target languages do. In total, we evaluate the conjunction "but" on 20 translation outputs, and the conjunction "and" on 10. All machine translation systems almost perfectly recognise one variant of the target conjunction, especially for the source conjunction "but". The other target variant, however, represents a challenge for machine translation systems, with accuracy varying from 50 % to 95 % for "but" and from 20 % to 57 % for "and". The major error for all systems is replacing the correct target variant with the opposite one.

Janarthanan Rajendran | Jatin Ganhotra | Xiaoxiao Guo | Mo Yu | Satinder Singh | Lazaros Polymenakos

Many Natural Language Processing (NLP) tasks depend on using Named Entities (NEs) that are contained in texts and in external knowledge sources. While this is easy for humans, the present neural methods that rely on learned word embeddings may not perform well for these NLP tasks, especially in the presence of Out-Of-Vocabulary (OOV) or rare NEs. In this paper, we propose a solution for this problem, and present empirical evaluations on : a) a structured Question-Answering task, b) three related Goal-Oriented dialog tasks, and c) a Reading-Comprehension task, which show that the proposed method can be effective in dealing with both in-vocabulary and OOV NEs. We create extended versions of dialog bAbI tasks 1,2 and 4 and OOV versions of the CBT test set which are available at-

Tharindu Ranasinghe | Constantin Orasan | Ruslan Mitkov

Calculating the Semantic Textual Similarity (STS) is an important research area in natural language processing which plays a significant role in many applications such as question answering, document summarisation, information retrieval and information extraction. This paper evaluates Siamese recurrent architectures, a special type of neural networks, which are used here to measure STS. Several variants of the architecture are compared with existing methods

Mansour Saffar Mehrjardi | Amine Trabelsi | Osmar R. Zaiane

Self-attentional models are a new paradigm for sequence modelling tasks which differ from common sequence modelling methods, such as recurrence-based and convolution-based sequence learning, in that their architecture is based only on the attention mechanism. Self-attentional models have been used in the creation of state-of-the-art models in many NLP tasks such as neural machine translation, but their usage has not yet been explored for the task of training end-to-end task-oriented dialogue generation systems. In this study, we apply these models on the DSTC2 dataset for training task-oriented chatbots. Our findings show that self-attentional models can be exploited to create end-to-end task-oriented chatbots which not only achieve higher evaluation scores compared to recurrence-based models, but also do so more efficiently.

Giancarlo Salton | John Kelleher

Recurrent Neural Network Language Models composed of LSTM units, especially those augmented with an external memory, have achieved state-of-the-art results in Language Modeling. However, these models still struggle to process long sequences which are more likely to contain long-distance dependencies because of information fading. In this paper we demonstrate an effective mechanism for retrieving information in a memory augmented LSTM LM based on attending to information in memory in proportion to the number of timesteps the LSTM gating mechanism persisted the information.

Branislava Šandrih | Cvetana Krstev | Ranka Stankovic

In this paper we present a rule- and lexicon-based system for the recognition of Named Entities (NE) in Serbian newspaper texts that was used to prepare a gold standard annotated with personal names. It was further used to prepare training sets for four different levels of annotation, which were in turn used to train two Named Entity Recognition (NER) systems : Stanford and spaCy. All obtained models, together with the rule- and lexicon-based system, were evaluated on two sample texts : a part of the gold standard and an independent newspaper text of approximately the same size. The results show that the rule- and lexicon-based system outperforms the trained models in all four scenarios (measured by F1), while the Stanford models have the highest precision. All systems obtain their best results in recognizing full names, while the recognition of first names only is rather poor. The produced models are incorporated into a Web platform, NER&Beyond, that provides various NE-related functions.

Wesley Santos | Ivandré Paraboni

We introduce a labelled corpus of stances about moral issues for the Brazilian Portuguese language, and present reference results for both the stance recognition and polarity classification tasks. The corpus is built from Twitter and further expanded with data elicited through crowd sourcing and labelled by their own authors. Put together, the corpus and reference results are expected to be taken as a baseline for further studies in the field of stance recognition and polarity classification from text.

Johannes Schäfer | Ben Burtenshaw

In recent years an increasing number of analyses of offensive language have been published ; however, they deal mainly with the automatic detection and classification of isolated instances. In this paper we aim to understand the impact of offensive messages in online conversations diachronically, and in particular the change in offensiveness of dialogue turns. In turn, we aim to measure the progression of offence level as well as its direction, for example, whether a conversation is escalating or declining in offence. We present our method of extracting linear dialogues from tree-structured conversations in social media data and make our code publicly available. Furthermore, we discuss methods to analyse this dataset through changes in discourse offensiveness. Our paper includes two main contributions ; first, using a neural network to measure the level of offensiveness in conversations ; and second, the analysis of conversations around offensive comments using decoupling functions.

Lilia Simeonova | Kiril Simov | Petya Osenova | Preslav Nakov

We propose a morphologically informed model for named entity recognition, which is based on LSTM-CRF architecture and combines word embeddings, Bi-LSTM character embeddings, part-of-speech (POS) tags, and morphological information. While previous work has focused on learning from raw word input, using word and character embeddings only, we show that for morphologically rich languages, such as Bulgarian, access to POS information contributes more to the performance gains than the detailed morphological information. Thus, we show that named entity recognition needs only coarse-grained POS tags, but at the same time it can benefit from simultaneously using some POS information of different granularity. Our evaluation results over a standard dataset show sizeable improvements over the state-of-the-art for Bulgarian NER.

Sanja Štajner | Maja Popović

In this work, we investigate the possibility of using a fully automatic text simplification system on the English source in machine translation (MT) for improving its translation into an under-resourced language. We use the state-of-the-art automatic text simplification (ATS) system for lexically and syntactically simplifying source sentences, which are then translated with two state-of-the-art English-to-Serbian MT systems, the phrase-based MT (PBMT) and the neural MT (NMT). We explore three different scenarios for using the ATS in MT : (1) using the raw output of the ATS ; (2) automatically filtering out the sentences with low grammaticality and meaning preservation scores ; and (3) performing a minimal manual correction of the ATS output. Our results show an improvement in the fluency of the translation regardless of the chosen scenario, and differences in the success of the three scenarios depending on the MT approach used (PBMT or NMT) with regard to improving translation fluency and post-editing effort.

Kenneth Steimel | Daniel Dakota | Yue Chen | Sandra Kübler

Abusive language detection has received much attention in the last years, and recent approaches perform the task in a number of different languages. We investigate which factors have an effect on multilingual settings, focusing on the compatibility of data and annotations. In the current paper, we focus on English and German. Our findings show large differences in performance between the two languages. We find that the best performance is achieved by different classification algorithms. Sampling to address class imbalance issues is detrimental for German and beneficial for English. The only similarity that we find is that neither data set shows clear topics when we compare the results of topic modeling to the gold standard. Based on our findings, we can conclude that a multilingual optimization of classifiers is not possible even in settings where comparable data sets are used.

Taha Tobaili | Miriam Fernandez | Harith Alani | Sanaa Sharafeddine | Hazem Hajj | Goran Glavaš

Arabizi is an informal written form of dialectal Arabic transcribed in Latin alphanumeric characters. It has a proven popularity on chat platforms and social media, yet it suffers from a severe lack of natural language processing (NLP) resources. As such, texts written in Arabizi are often disregarded in sentiment analysis tasks for Arabic. In this paper we describe the creation of a sentiment lexicon for Arabizi that was enriched with word embeddings. The result is a new Arabizi lexicon consisting of 11.3 K positive and 13.3 K negative words. We evaluated this lexicon by classifying the sentiment of Arabizi tweets achieving an F1-score of 0.72. We provide a detailed error analysis to present the challenges that impact the sentiment analysis of Arabizi.

Ahmet Üstün | Gosse Bouma | Gertjan van Noord

Cross-lingual word embedding models learn a shared vector space for two or more languages so that words with similar meaning are represented by similar vectors regardless of their language. Although the existing models achieve high performance on pairs of morphologically simple languages, they perform very poorly on morphologically rich languages such as Turkish and Finnish. In this paper, we propose a morpheme-based model in order to increase the performance of cross-lingual word embeddings on morphologically rich languages. Our model includes a simple extension which enables us to exploit morphemes for cross-lingual mapping. We applied our model for the Turkish-Finnish language pair on the bilingual word translation task. Results show that our model outperforms the baseline models by 2 % in the nearest neighbour ranking.

Boris Velichkov | Ivan Koychev | Svetla Boytcheva

This paper presents an approach for predicting the results of sport events. Usually, sport forecasting approaches are based on structured data. We test the hypothesis that sports results can be predicted by using natural language processing and machine learning techniques applied to interviews with the players shortly before the sport events. The proposed method uses deep learning contextual models applied to unstructured textual documents. Several experiments were performed on interviews with players in individual sports like boxing, martial arts, and tennis. The results of the conducted experiments confirmed our initial assumption that an interview with a sportsman before a match contains information that can be used to predict its outcome. Furthermore, the results provide strong evidence in support of our research hypothesis, that is, that we can predict the outcome of a sport match by analyzing an interview given before it.

Martin Víta | Jakub Klímek

Natural language inference (NLI) is a key part of natural language understanding. The NLI task is defined as the decision problem of whether a given sentence (the hypothesis) can be inferred from a given text. Typically, we deal with a text consisting of just a single premise / single sentence, which is called a single premise entailment (SPE) task. Recently, a derived task of NLI from multiple premises (MPE) was introduced together with the first annotated corpus and several corresponding strong baselines. Nevertheless, further development in the MPE field requires access to huge amounts of annotated data. In this paper we introduce a novel method for rapidly deriving MPE corpora from existing NLI (SPE) annotated data that does not require any additional annotation work. The proposed approach is based on using an open information extraction system. We demonstrate the application of the method on the well-known SNLI corpus. Over the obtained corpus, we provide the first evaluations and establish a strong baseline.

Son Vu Xuan | Thanh Vu | Son Tran | Lili Jiang

Given many recent advanced embedding models, selecting pre-trained word representation (i.e., word embedding) models best fit for a specific downstream NLP task is non-trivial. In this paper, we propose a systematic approach to extracting, evaluating, and visualizing multiple sets of pre-trained word embeddings to determine which embeddings should be used in a downstream task. First, for extraction, we provide a method to extract a subset of the embeddings to be used in the downstream NLP tasks. Second, for evaluation, we analyse the quality of pre-trained embeddings using an input word analogy list. Finally, we visualize the embedding space to explore the embedded words interactively. We demonstrate the effectiveness of the proposed approach on our pre-trained word embedding models in Vietnamese to select which models are suitable for a named entity recognition (NER) task. Specifically, we create a large Vietnamese word analogy list to evaluate and select the pre-trained embedding models for the task. We then utilize the selected embeddings for the NER task and achieve the new state-of-the-art results on the task benchmark dataset. We also apply the approach to another downstream task of privacy-guaranteed embedding selection, and show that it helps users quickly select the most suitable embeddings. In addition, we create an open-source system using the proposed systematic approach to facilitate similar studies on other NLP tasks. The source code and data are available at https : // / vietnlp / etnlp.

Fei Wang | Robert J. Ross | John D. Kelleher

The current state of the art for First Story Detection (FSD) are nearest neighbour-based models with traditional term vector representations ; however, one challenge faced by FSD models is that the document representation is usually defined by the vocabulary and term frequency from a background corpus. Consequently, the ideal background corpus should arguably be both large-scale to ensure adequate term coverage, and similar to the target domain in terms of the language distribution. However, given these two factors can not always be mutually satisfied, in this paper we examine whether the distributional similarity of common terms is more important than the scale of common terms for FSD. As a basis for our analysis we propose a set of metrics to quantitatively measure the scale of common terms and the distributional similarity between corpora. Using these metrics we rank different background corpora relative to a target corpus. We also apply models based on different background corpora to the FSD task. Our results show that term distributional similarity is more predictive of good FSD performance than the scale of common terms ; and, thus we demonstrate that a smaller recent domain-related corpus will be more suitable than a very large-scale general corpus for FSD.

Aleksander Wawer | Julita Sobiczewska

The goal of this paper is to use all available Polish language data sets to seek the best possible performance in supervised sentiment analysis of short texts. We use text collections with labelled sentiment such as tweets, movie reviews and a sentiment treebank, in three comparison modes. In the first, we examine the performance of models trained and tested on the same text collection using standard cross-validation (in-domain). In the second, we train models on all available data except the given test collection, which we use for testing (one vs rest cross-domain). In the third, we train a model on one data set and apply it to another one (one vs one cross-domain). We compare a wide range of methods including machine learning on bag-of-words representations, bidirectional recurrent neural networks, and the most recent pre-trained architectures ELMo and BERT. We formulate conclusions as to the cross-domain and in-domain performance of each method. Unsurprisingly, BERT turned out to be a strong performer, especially in the cross-domain setting. What is surprising, however, is the solid performance of the relatively simple multinomial Naive Bayes classifier, which performed as well as BERT on several data sets.

Albert Weichselbraun | Adrian M.P. Brasoveanu | Philipp Kuntschik | Lyndon J.B. Nixon

Gold standard corpora and competitive evaluations play a key role in benchmarking named entity linking (NEL) performance and driving the development of more sophisticated NEL systems. The quality of the corpora and evaluation metrics used is crucial in this process. We, therefore, assess the quality of three popular evaluation corpora, identifying four major issues which affect these gold standards: (i) the use of different annotation styles, (ii) incorrect and missing annotations, (iii) Knowledge Base evolution, and (iv) differences in annotating co-occurrences. This paper addresses these issues by formalizing NEL annotations and corpus versioning, which allows standardizing corpus creation, supports corpus evolution, and paves the way for the use of lenses to automatically transform between different corpus configurations. In addition, the use of clearly defined scoring rules and evaluation metrics ensures better comparability of evaluation results.

Olcay Taner Yıldız | Begüm Avar | Gökhan Ercan

In this paper, we present a two-level morphological analyzer for Turkish. The morphological analyzer consists of five main components: finite state transducer, rule engine for suffixation, lexicon, trie data structure, and LRU cache. We use Java to implement the finite state machine logic and rule engine, and XML to describe the finite state transducer rules of the Turkish language, which makes the morphological analyzer both easily extendible and easily applicable to other languages. With a comprehensive lexicon of 54,000 bare-forms including 19,000 proper nouns, our morphological analyzer is one of the most reliable analyzers produced so far. The analyzer is compared with Turkish morphological analyzers in the literature. By using an LRU cache and a trie data structure, the system can analyze 100,000 words per second, which enables users to analyze huge corpora in a few hours.
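
How a trie and an LRU cache can work together is shown in the sketch below. It is written in Python rather than the Java of the paper, and it is not the authors' implementation: the class names, the tiny lexicon, and the cache size are assumptions. The trie finds every lexicon entry that is a prefix of a surface form, and the cache avoids repeating that lookup for words already seen. A real analyzer would additionally check each remaining suffix against the finite state transducer rules, which this sketch omits.

```python
from functools import lru_cache


class Trie:
    """Prefix tree over lexicon entries (bare forms)."""

    def __init__(self):
        self.children = {}
        self.is_entry = False

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_entry = True

    def prefixes(self, word):
        """Return every lexicon entry that is a prefix of `word`."""
        found = []
        node = self
        for i, ch in enumerate(word):
            node = node.children.get(ch)
            if node is None:
                break
            if node.is_entry:
                found.append(word[: i + 1])
        return found


class CandidateSplitter:
    """Propose (root, suffix) splits, caching results for repeated words."""

    def __init__(self, lexicon, cache_size=1024):
        self.trie = Trie()
        for entry in lexicon:
            self.trie.insert(entry)
        # Wrap the lookup in a least-recently-used cache of bounded size.
        self.split = lru_cache(maxsize=cache_size)(self._split)

    def _split(self, word):
        # Suffix validity is NOT checked here; that is the rule engine's job.
        return tuple((root, word[len(root):]) for root in self.trie.prefixes(word))
```

Since word frequencies in running text are highly skewed, a bounded cache of this kind lets frequent word forms skip the lookup entirely, while the trie keeps the cost of a cache miss proportional to the word length.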

Elaine Zosa | Mark Granroth-Wilding

Dynamic topic models (DTMs) capture the evolution of topics and trends in time series data. Current DTMs are applicable only to monolingual datasets. In this paper we present the multilingual dynamic topic model (ML-DTM), a novel topic model that combines DTM with an existing multilingual topic modeling method to capture cross-lingual topics that evolve across time. We present results of this model on a parallel German-English corpus of news articles and a comparable corpus of Finnish and Swedish news articles. We demonstrate the capability of ML-DTM to track significant events related to a topic and show that it finds distinct topics and performs as well as existing multilingual topic models in aligning cross-lingual topics.

Vilhjálmur Þorsteinsson | Hulda Óladóttir | Hrafn Loftsson

We present an open-source, wide-coverage context-free grammar (CFG) for Icelandic, and an accompanying parsing system. The grammar has over 5,600 nonterminals, 4,600 terminals and 19,000 productions in fully expanded form, with feature agreement constraints for case, gender, number and person. The parsing system consists of an enhanced Earley-based parser and a mechanism to select best-scoring parse trees from shared packed parse forests. Our parsing system is able to parse about 90% of all sentences in articles published on the main Icelandic news websites. Preliminary evaluation with evalb shows an F-measure of 70.72% on parsed sentences. Our system demonstrates that parsing a morphologically rich language using a wide-coverage CFG can be practical.


Proceedings of the Student Research Workshop Associated with RANLP 2019

Venelin Kovatchev | Irina Temnikova | Branislava Šandrih | Ivelina Nikolova

Classification Approaches to Identify Informative Tweets
Piush Aggarwal

Social media platforms have become prime forums for reporting news, with users sharing what they saw, heard or read on social media. News from social media is potentially useful for various stakeholders including aid organizations, news agencies, and individuals. However, social media also contains a vast amount of non-news content. For users to be able to benefit from news reported on social media, it is necessary to reliably identify news content and differentiate it from non-news. In this paper, we tackle the challenge of classifying a social post as news or not. To this end, we provide a new manually annotated dataset containing 2,992 tweets from 5 different topical categories. Unlike earlier datasets, it includes posts by personal users who do not promote a business or a product and are not affiliated with any organization. We also investigate various baseline systems and evaluate their performance on the newly generated dataset. Our results show that the best classifiers are the SVM and BERT models.

Multilingual Language Models for Named Entity Recognition in German and English
Antonia Baumann

We assess the language specificity of recent language models by exploring the potential of a multilingual language model. In particular, we evaluate Google’s multilingual BERT (mBERT) model on Named Entity Recognition (NER) in German and English. We expand the work on language model fine-tuning by Howard and Ruder (2018), applying it to the BERT architecture. We successfully reproduce the NER results published by Devlin et al. (2019). Our results show that the multilingual language model generalises well for NER in the chosen languages, matching the native model in English and comparing well with recent approaches for German. However, it does not benefit from the added fine-tuning methods.

Cross-Lingual Coreference: The Case of Bulgarian and English
Zara Kancheva

The paper presents several common approaches towards cross- and multi-lingual coreference resolution in search of the most effective practices to be applied within the work on Bulgarian-English manual coreference annotation of a short story. The work aims at outlining the typology of the differences in the annotated parallel texts. The results of the research prove to be comparable with the tendencies observed in similar works on other Slavic languages and show surprising differences between the types of markables and their frequency in Bulgarian and English.

Evaluation of Stacked Embeddings for Bulgarian on the Downstream Tasks POS and NERC
Iva Marinova

This paper reports on experiments with different stacks of word embeddings and evaluation of their usefulness for Bulgarian downstream tasks such as Named Entity Recognition and Classification (NERC) and Part-of-speech (POS) Tagging. Word embeddings are at the core of the development of NLP, with several key language models being created over the last two years, such as FastText (CITATION), ELMo (CITATION), BERT (CITATION) and Flair (CITATION). Stacking or combining different word embeddings is another technique used in this paper and not yet reported for Bulgarian NERC. A well-established architecture, BI-LSTM-CRF, is used for the sequence tagging task, and different pre-trained language models are combined in the embedding layer to decide which combination of them scores better.

Overview on NLP Techniques for Content-based Recommender Systems for Books
Melania Berbatova

Recommender systems are an essential part of today’s largest websites. Without them, it would be hard for users to find the right products and content. One of the most popular methods for recommendations is content-based filtering. It relies on analysing product metadata, a great part of which is textual data. Despite their frequent use, there is still no standard procedure for developing and evaluating content-based recommenders. In this paper, we first examine current approaches for designing, training and evaluating recommender systems based on textual data for book recommendations on the GoodReads website. We then critique existing methods and suggest how natural language techniques can be employed to improve content-based recommenders.

Multilingual Complex Word Identification: Convolutional Neural Networks with Morphological and Linguistic Features
Kim Cheng Sheang

The paper is about our experiments with a Complex Word Identification system using a deep learning approach with word embeddings and engineered features.


Proceedings of the Human-Informed Translation and Interpreting Technology Workshop (HiT-IT 2019)

Comparison between Automatic and Human Subtitling: A Case Study with Game of Thrones
Sabrina Baldo de Brébisson

In this submission, I would like to share my experiences with the software DeepL and the comparative analysis I have made with the human subtitling offered by the DVD version of the corpus I have chosen as the topic of my study: the eight seasons of Game of Thrones. The idea is to study whether the version proposed by an automatic translation program could be used as a first draft for the professional subtitler. It is expected that the latter would then, in a second step, work on the form of the subtitles, that is to say mainly on their length.

Parallel Corpus of Croatian-Italian Administrative Texts
Marija Brkic Bakaric | Ivana Lalli Pacelat

Parallel corpora constitute a unique resource for providing assistance to human translators. The selection and preparation of the parallel corpora also condition the quality of the resulting MT engine. Since Croatian is a national language and Italian is officially recognized as a minority language in seven cities and twelve municipalities of Istria County, a large amount of parallel text is produced on a daily basis. However, there have been no attempts at using these texts for compiling a parallel corpus. A domain-specific sentence-aligned parallel Croatian-Italian corpus of administrative texts would be of high value in creating different language tools and resources. The aim of this paper is, therefore, to explore the value of parallel documents which are publicly available mostly in PDF format and to investigate the use of automatically-built dictionaries in corpus compilation. The effects that the document format (and, consequently, sentence splitting) and the dictionary input have on the sentence alignment process are manually evaluated.

What Influences the Features of Post-editese? A Preliminary Study
Sheila Castilho | Natália Resende | Ruslan Mitkov

While a number of studies have shown evidence of translationese phenomena, that is, statistical differences between original texts and translated texts (Gellerstam, 1986), studies searching for translationese features in post-edited texts (what has been called post-editese (Daems et al., 2017)) have presented mixed results. This paper reports a preliminary study aimed at identifying the presence of post-editese features in machine-translated post-edited texts and at understanding how they differ from translationese features. We test the influence of factors such as post-editing (PE) levels (full vs. light), translation proficiency (professionals vs. students) and text domain (news vs. literary). Results show evidence of post-editese features, especially in light PE texts and in certain domains.

Human Evaluation of Neural Machine Translation: The Case of Deep Learning
Marie Escribe

Recent advances in artificial neural networks now have a great impact on translation technology. A considerable achievement was reached in this field with the publication of L’Apprentissage Profond. This book, originally written in English (Deep Learning), was entirely machine-translated into French and post-edited by several experts. In this context, it appears essential to have a clear vision of the performance of MT tools. Providing an evaluation of NMT is precisely the aim of the present research paper. To accomplish this objective, a framework for error categorisation was built and a comparative analysis of the raw translation output and the post-edited version was performed with the purpose of identifying recurring patterns of errors. The findings showed that even though some grammatical errors were spotted, the output was generally correct from a linguistic point of view. The most recurring errors are linked to the specialised terminology employed in this book. Further errors include parts of text that were not translated as well as edits based on stylistic preferences. The major part of the output was not acceptable as such and required several edits per segment, but some sentences were of publishable quality and were therefore left untouched in the final version.

The Chinese/English Political Interpreting Corpus (CEPIC): A New Electronic Resource for Translators and Interpreters
Jun Pan

The Chinese / English Political Interpreting Corpus (CEPIC) is a new electronic and open access resource developed for translators and interpreters, especially those working with political text types. Over 6 million word tokens in size, the online corpus consists of transcripts of Chinese (Cantonese & Putonghua) / English political speeches and their translated and interpreted texts. It includes rich meta-data and is POS-tagged and annotated with prosodic and paralinguistic features that are of concern to spoken language and interpreting. The online platform of the CEPIC features main functions including Keyword Search, Word Collocation and Expanded Keyword in Context, which are illustrated in the paper. The CEPIC can shed light on online translation and interpreting corpora development in the future.

Towards a Proactive MWE Terminological Platform for Cross-Lingual Mediation in the Age of Big Data
Benjamin K. Tsou | Kapo Chow | Junru Nie | Yuan Yuan

The emergence of China as a global economic power in the 21st Century has brought about surging needs for cross-lingual and cross-cultural mediation, typically performed by translators. Advances in Artificial Intelligence and Language Engineering have been bolstered by Machine Learning and suitable Big Data cultivation. They have helped to meet some of the translator’s needs, though the technical specialists have not kept pace with the practical and expanding requirements in language mediation. One major technical and linguistic hurdle involves words outside the vocabulary of the translator or the lexical database he or she consults, especially Multi-Word Expressions (Compound Words) in technical subjects. A further problem is the multiplicity of renditions of a term in the target language. This paper discusses a proactive approach following the successful extraction and application of sizable bilingual Multi-Word Expressions (Compound Words) for language mediation in technical subjects, which do not fall within the expertise of typical translators, who have inadequate appreciation of the range of new technical tools available to help them. Our approach draws on the personal reflections of translators and teachers of translation and is based on prior R&D efforts relating to 300,000 comparable Chinese-English patents. The subsequent protocol we have developed aims to be proactive in meeting four identified practical challenges in technical translation (e.g. patents). It has broader economic implications in the Age of Big Data (Tsou et al., 2015) and Trade War, as the workload, if not the challenges, increasingly cannot be met by currently available front-line translators. We shall demonstrate how new tools can be harnessed to spearhead the application of language technology not only in language mediation but also in the teaching and learning of translation. It shows how a better appreciation of their needs may enhance the contributions of the technical specialists, and thus enhance the resultant synergetic benefits.

The Four Stages of Machine Translation Acceptance in a Freelancer’s Life
Maria Sgourou

Technology is a big challenge and raises many questions and issues when it comes to its application in the translation process, but translation’s biggest problem is not technology; it is rather how technology is perceived by translators. MT developers and researchers should take this perception into account and move towards a more democratized approach that includes the base of the translation industry and perhaps its most valuable asset, the translators.


Proceedings of the Workshop MultiLing 2019: Summarization Across Languages, Genres and Sources

George Giannakopoulos

RANLP 2019 Multilingual Headline Generation Task Overview
Marina Litvak | John M. Conroy | Peter A. Rankel

The objective of the 2019 RANLP Multilingual Headline Generation (HG) Task is to explore some of the challenges highlighted by current state-of-the-art approaches to creating informative headlines for news articles: non-descriptive headlines, out-of-domain training data, generating headlines from long documents which are not well represented by the head heuristic, and dealing with the multilingual domain. This task makes available a large set of training data for headline generation and provides evaluation methods for the task. Our data sets are drawn from Wikinews as well as Wikipedia. Participants were required to generate headlines for at least 3 languages, which were evaluated via automatic methods. A key aspect of the task is multilinguality. The task measures the performance of multilingual headline generation systems using the Wikipedia and Wikinews articles in multiple languages. The objective is to assess the performance of automatic headline generation techniques on text documents covering a diverse range of languages and topics outside the news domain.

MultiLing 2019: Financial Narrative Summarisation
Mahmoud El-Haj

The Financial Narrative Summarisation task at MultiLing 2019 aims to demonstrate the value and challenges of applying automatic text summarisation to financial text written in English, usually referred to as financial narrative disclosures. The task dataset has been extracted from UK annual reports published in PDF file format. The participants were asked to provide structured summaries, based on real-world, publicly available financial annual reports of UK firms, by extracting information from different key sections. Participants were asked to generate summaries that reflect the analysis and assessment of the financial trend of the business over the past year, as provided by annual reports. The evaluation of the summaries was performed using the AutoSummENG and ROUGE automatic metrics. This paper focuses mainly on the data creation process.

The Summary Evaluation Task in the MultiLing-RANLP 2019 Workshop
George Giannakopoulos | Nikiforos Pittaras

This report covers the summarization evaluation task, proposed to the summarization community via the MultiLing 2019 Workshop of the RANLP 2019 conference. The task aims to encourage the development of automatic summarization evaluation methods closely aligned with manual, human-authored summary grades and judgements. A multilingual setting is adopted, building upon a corpus of Wikinews articles across 6 languages (English, Arabic, Romanian, Greek, Spanish and Czech). The evaluation utilizes human (golden) and machine-generated (peer) summaries, which have been assigned human evaluation scores from previous MultiLing tasks. Using these resources, the original corpus is augmented with synthetic data, combining summary texts under three different strategies (reorder, merge and replace), each engineered to introduce noise in the summary in a controlled and quantifiable way. We estimate that the utilization of such data can extract and highlight useful attributes of summary quality estimation, aiding the creation of data-driven automatic methods with an increased correlation to human summary evaluations across domains and languages. This paper provides a brief description of the summary evaluation task, the data generation protocol and the resources made available by the MultiLing community, towards improving automatic summarization evaluation.

Multi-lingual Wikipedia Summarization and Title Generation On Low Resource Corpus
Wei Liu | Lei Li | Zuying Huang | Yinan Liu

The MultiLing 2019 Headline Generation Task on the Wikipedia corpus raised a critical and practical problem: a multilingual task on a low-resource corpus. In this paper we propose the QDAS extractive summarization model enhanced by sentence2vec and try to apply transfer learning based on a large multilingual pre-trained language model to the Wikipedia Headline Generation task. We treat it as a sequence labeling task and develop two schemes to handle it. Experimental results show that a large pre-trained model can effectively utilize learned knowledge to extract certain phrases using low-resource supervised data.

Social Web Observatory: An entity-driven, holistic information summarization platform across sources
Leonidas Tsekouras | Georgios Petasis | Aris Kosmopoulos

The Social Web Observatory (SWO) is an entity-driven, sentiment-aware, event summarization web platform, combining various methods and tools to overview trends across social media and news sources in Greek. SWO crawls, clusters and summarizes information following an entity-centric view of text streams, allowing users to monitor the public sentiment towards a specific person, organization or other entity. In this paper, we overview the platform, outline the analysis pipeline and describe a user study aimed at quantifying the usefulness of the system and especially the meaningfulness and coherence of discovered events.

EASY-M: Evaluation System for Multilingual Summarizers

Automatic text summarization aims at producing a shorter version of a document (or a document set). Evaluation of summarization quality is a challenging task. Because human evaluations are expensive and evaluators often disagree among themselves, many researchers prefer to evaluate their systems automatically, with the help of software tools. Such a tool usually requires a point of reference in the form of one or more human-written summaries for each text in the corpus. Then, a system-generated summary is compared to one or more human-written summaries, according to selected metrics. However, a single metric cannot reflect all quality-related aspects of a summary. In this paper we present the EvAluation SYstem for Multilingual Summarization (EASY-M), which enables the evaluation of system-generated summaries in 17 different languages with several quality measures, based on comparison with their human-generated counterparts. The system also provides comparative results with two built-in baselines. The source code and both online and offline versions of EASY-M are freely available for the NLP community.