Other Workshops and Events (2019)


Contents



up

pdf (full)
bib (full)
Proceedings of the 13th International Conference on Computational Semantics - Long Papers

pdf bib
Proceedings of the 13th International Conference on Computational Semantics - Long Papers
Simon Dobnik | Stergios Chatzikyriakidis | Vera Demberg

pdf bib
Projecting Temporal Properties, Events and Actions
Tim Fernando

Temporal notions based on a finite set A of properties are represented in strings, on which projections are defined that vary the granularity A. The structure of properties in A is elaborated to describe statives, events and actions, subject to a distinction in meaning (advocated by Levin and Rappaport Hovav) between what the lexicon prescribes and what a context of use supplies. The projections proposed are deployed as labels for records and record types amenable to finite-state methods.A of properties are represented in strings, on which projections are defined that vary the granularity A. The structure of properties in A is elaborated to describe statives, events and actions, subject to a distinction in meaning (advocated by Levin and Rappaport Hovav) between what the lexicon prescribes and what a context of use supplies. The projections proposed are deployed as labels for records and record types amenable to finite-state methods.

pdf bib
A Type-coherent, Expressive Representation as an Initial Step to Language Understanding
Gene Louis Kim | Lenhart Schubert

A growing interest in tasks involving language understanding by the NLP community has led to the need for effective semantic parsing and inference. Modern NLP systems use semantic representations that do not quite fulfill the nuanced needs for language understanding : adequately modeling language semantics, enabling general inferences, and being accurately recoverable. This document describes underspecified logical forms (ULF) for Episodic Logic (EL), which is an initial form for a semantic representation that balances these needs. ULFs fully resolve the semantic type structure while leaving issues such as quantifier scope, word sense, and anaphora unresolved ; they provide a starting point for further resolution into EL, and enable certain structural inferences without further resolution. This document also presents preliminary results of creating a hand-annotated corpus of ULFs for the purpose of training a precise ULF parser, showing a three-person pairwise interannotator agreement of 0.88 on confident annotations. We hypothesize that a divide-and-conquer approach to semantic parsing starting with derivation of ULFs will lead to semantic analyses that do justice to subtle aspects of linguistic meaning, and will enable construction of more accurate semantic parsers.

pdf bib
An Improved Approach for Semantic Graph Composition with CCGCCG
Austin Blodgett | Nathan Schneider

This paper builds on previous work using Combinatory Categorial Grammar (CCG) to derive a transparent syntax-semantics interface for Abstract Meaning Representation (AMR) parsing. We define new semantics for the CCG combinators that is better suited to deriving AMR graphs. In particular, we define relation-wise alternatives for the application and composition combinators : these require that the two constituents being combined overlap in one AMR relation. We also provide a new semantics for type raising, which is necessary for certain constructions. Using these mechanisms, we suggest an analysis of eventive nouns, which present a challenge for deriving AMR graphs. Our theoretical analysis will facilitate future work on robust and transparent AMR parsing using CCG.

pdf bib
Towards a Compositional Analysis of German Light Verb Constructions (LVCs) Combining Lexicalized Tree Adjoining Grammar (LTAG) with Frame SemanticsGerman Light Verb Constructions (LVCs) Combining Lexicalized Tree Adjoining Grammar (LTAG) with Frame Semantics
Jens Fleischhauer | Thomas Gamerschlag | Laura Kallmeyer | Simon Petitjean

Complex predicates formed of a semantically ‘light’ verbal head and a noun or verb which contributes the major part of the meaning are frequently referred to as ‘light verb constructions’ (LVCs). In the paper, we present a case study of LVCs with the German posture verb stehen ‘stand’. In our account, we model the syntactic as well as semantic composition of such LVCs by combining Lexicalized Tree Adjoining Grammar (LTAG) with frames. Starting from the analysis of the literal uses of posture verbs, we show how the meaning components of the literal uses are systematically exploited in the interpretation of stehen-LVCs. The paper constitutes an important step towards a compositional and computational analysis of LVCs. We show that LTAG allows us to separate constructional from lexical meaning components and that frames enable elegant generalizations over event types and related constraints.

pdf bib
Words are Vectors, Dependencies are Matrices : Learning Word Embeddings from Dependency Graphs
Paula Czarnowska | Guy Emerson | Ann Copestake

Distributional Semantic Models (DSMs) construct vector representations of word meanings based on their contexts. Typically, the contexts of a word are defined as its closest neighbours, but they can also be retrieved from its syntactic dependency relations. In this work, we propose a new dependency-based DSM. The novelty of our model lies in associating an independent meaning representation, a matrix, with each dependency-label. This allows it to capture specifics of the relations between words and contexts, leading to good performance on both intrinsic and extrinsic evaluation tasks. In addition to that, our model has an inherent ability to represent dependency chains as products of matrices which provides a straightforward way of handling further contexts of a word.

pdf bib
Temporal and Aspectual Entailment
Thomas Kober | Sander Bijl de Vroe | Mark Steedman

Inferences regarding Jane’s arrival in London from predications such as Jane is going to London or Jane has gone to London depend on tense and aspect of the predications. Tense determines the temporal location of the predication in the past, present or future of the time of utterance. The aspectual auxiliaries on the other hand specify the internal constituency of the event, i.e. whether the event of going to London is completed and whether its consequences hold at that time or not. While tense and aspect are among the most important factors for determining natural language inference, there has been very little work to show whether modern embedding models capture these semantic concepts. In this paper we propose a novel entailment dataset and analyse the ability of contextualised word representations to perform inference on predications across aspectual types and tenses. We show that they encode a substantial amount of information relating to tense and aspect, but fail to consistently model inferences that require reasoning with these semantic properties.

pdf bib
Aligning Open IE Relations and KB Relations using a Siamese Network Based on Word EmbeddingIE Relations and KB Relations using a Siamese Network Based on Word Embedding
Rifki Afina Putri | Giwon Hong | Sung-Hyon Myaeng

Open Information Extraction (Open IE) aims at generating entity-relation-entity triples from a large amount of text, aiming at capturing key semantics of the text. Given a triple, the relation expresses the type of semantic relation between the entities. Although relations from an Open IE system are more extensible than those used in a traditional Information Extraction system and a Knowledge Base (KB) such as Knowledge Graphs, the former lacks in semantics ; an Open IE relation is simply a sequence of words, whereas a KB relation has a predefined meaning. As a way to provide a meaning to an Open IE relation, we attempt to align it with one of the predefined set of relations used in a KB. Our approach is to use a Siamese network that compares two sequences of word embeddings representing an Open IE relation and a predefined KB relation. In order to make the approach practical, we automatically generate a training dataset using a distant supervision approach instead of relying on a hand-labeled dataset. Our experiment shows that the proposed method can capture the relational semantics better than the recent approaches.

pdf bib
The Effect of Context on Metaphor Paraphrase Aptness Judgments
Yuri Bizzoni | Shalom Lappin

We conduct two experiments to study the effect of context on metaphor paraphrase aptness judgments. The first is an AMT crowd source task in which speakers rank metaphor-paraphrase candidate sentence pairs in short document contexts for paraphrase aptness. In the second we train a composite DNN to predict these human judgments, first in binary classifier mode, and then as gradient ratings. We found that for both mean human judgments and our DNN’s predictions, adding document context compresses the aptness scores towards the center of the scale, raising low out-of-context ratings and decreasing high out-of-context scores. We offer a provisional explanation for this compression effect.

pdf bib
Predicting Word Concreteness and Imagery
Jean Charbonnier | Christian Wartena

Concreteness of words has been studied extensively in psycholinguistic literature. A number of datasets have been created with average values for perceived concreteness of words. We show that we can train a regression model on these data, using word embeddings and morphological features, that can predict these concreteness values with high accuracy. We evaluate the model on 7 publicly available datasets. Only for a few small subsets of these datasets prediction of concreteness values are found in the literature. Our results clearly outperform the reported results for these datasets.

pdf bib
Learning to Explicitate Connectives with Seq2Seq Network for Implicit Discourse Relation ClassificationSeq2Seq Network for Implicit Discourse Relation Classification
Wei Shi | Vera Demberg

Implicit discourse relation classification is one of the most difficult steps in discourse parsing. The difficulty stems from the fact that the coherence relation must be inferred based on the content of the discourse relational arguments. Therefore, an effective encoding of the relational arguments is of crucial importance. We here propose a new model for implicit discourse relation classification, which consists of a classifier, and a sequence-to-sequence model which is trained to generate a representation of the discourse relational arguments by trying to predict the relational arguments including a suitable implicit connective. Training is possible because such implicit connectives have been annotated as part of the PDTB corpus. Along with a memory network, our model could generate more refined representations for the task. And on the now standard 11-way classification, our method outperforms the previous state of the art systems on the PDTB benchmark on multiple settings including cross validation.

pdf bib
Using Multi-Sense Vector Embeddings for Reverse Dictionaries
Michael A. Hedderich | Andrew Yates | Dietrich Klakow | Gerard de Melo

Popular word embedding methods such as word2vec and GloVe assign a single vector representation to each word, even if a word has multiple distinct meanings. Multi-sense embeddings instead provide different vectors for each sense of a word. However, they typically can not serve as a drop-in replacement for conventional single-sense embeddings, because the correct sense vector needs to be selected for each word. In this work, we study the effect of multi-sense embeddings on the task of reverse dictionaries. We propose a technique to easily integrate them into an existing neural network architecture using an attention mechanism. Our experiments demonstrate that large improvements can be obtained when employing multi-sense embeddings both in the input sequence as well as for the target representation. An analysis of the sense distributions and of the learned attention is provided as well.

pdf bib
Frame Identification as Categorization : Exemplars vs Prototypes in Embeddingland
Jennifer Sikos | Sebastian Padó

Categorization is a central capability of human cognition, and a number of theories have been developed to account for properties of categorization. Even though many tasks in semantics also involve categorization of some kind, theories of categorization do not play a major role in contemporary research in computational linguistics. This paper follows the idea that embedding-based models of semantics lend themselves well to being formulated in terms of classical categorization theories. The benefit is a space of model families that enables (a) the formulation of hypotheses about the impact of major design decisions, and (b) a transparent assessment of these decisions. We instantiate this idea on the task of frame-semantic frame identification. We define four models that cross two design variables : (a) the choice of prototype vs. exemplar categorization, corresponding to different degrees of generalization applied to the input ; and (b) the presence vs. absence of a fine-tuning step, corresponding to generic vs. task-adaptive categorization. We find that for frame identification, generalization and task-adaptive categorization both yield substantial benefits. Our prototype-based, fine-tuned model, which combines the best choices for these variables, establishes a new state of the art in frame identification.

up

pdf (full)
bib (full)
Proceedings of the 13th International Conference on Computational Semantics - Short Papers

pdf bib
Proceedings of the 13th International Conference on Computational Semantics - Short Papers
Simon Dobnik | Stergios Chatzikyriakidis | Vera Demberg

pdf bib
Distributional Semantics in the Real World : Building Word Vector Representations from a Truth-Theoretic Model
Elizaveta Kuzmenko | Aurélie Herbelot

Distributional semantics models (DSMs) are known to produce excellent representations of word meaning, which correlate with a range of behavioural data. As lexical representations, they have been said to be fundamentally different from truth-theoretic models of semantics, where meaning is defined as a correspondence relation to the world. There are two main aspects to this difference : a) DSMs are built over corpus data which may or may not reflect ‘what is in the world’ ; b) they are built from word co-occurrences, that is, from lexical types rather than entities and sets. In this paper, we inspect the properties of a distributional model built over a set-theoretic approximation of ‘the real world’. To achieve this, we take the annotation a large database of images marked with objects, attributes and relations, convert the data into a representation akin to first-order logic and build several distributional models using various combinations of features. We evaluate those models over both relatedness and similarity datasets, demonstrating their effectiveness in standard evaluations. This allows us to conclude that, despite prior claims, truth-theoretic models are good candidates for building graded lexical representations of meaning.

pdf bib
Making Sense of Conflicting (Defeasible) Rules in the Controlled Natural Language ACE : Design of a System with Support for Existential Quantification Using SkolemizationACE: Design of a System with Support for Existential Quantification Using Skolemization
Martin Diller | Adam Wyner | Hannes Strass

We present the design of a system for making sense of conflicting rules expressed in a fragment of the prominent controlled natural language ACE, yet extended with means of expressing defeasible rules in the form of normality assumptions. The approach we describe is ultimately based on answer-set-programming (ASP) ; simulating existential quantification by using skolemization in a manner resembling a translation for ASP recently formalized in the context of -ASP. We discuss the advantages of this approach to building on the existing ACE interface to rule-systems, ACERules.

pdf bib
Distributional Interaction of Concreteness and Abstractness in VerbNoun Subcategorisation
Diego Frassinelli | Sabine Schulte im Walde

In recent years, both cognitive and computational research has provided empirical analyses of contextual co-occurrence of concrete and abstract words, partially resulting in inconsistent pictures. In this work we provide a more fine-grained description of the distributional nature in the corpus-based interaction of verbs and nouns within subcategorisation, by investigating the concreteness of verbs and nouns that are in a specific syntactic relationship with each other, i.e., subject, direct object, and prepositional object. Overall, our experiments show consistent patterns in the distributional representation of subcategorising and subcategorised concrete and abstract words. At the same time, the studies reveal empirical evidence why contextual abstractness represents a valuable indicator for automatic non-literal language identification.

up

pdf (full)
bib (full)
Proceedings of the 13th International Conference on Computational Semantics - Student Papers

pdf bib
Proceedings of the 13th International Conference on Computational Semantics - Student Papers
Simon Dobnik | Stergios Chatzikyriakidis | Vera Demberg | Kathrein Abu Kwaik | Vladislav Maraev

pdf bib
A Dynamic Semantics for Causal Counterfactuals
Kenneth Lai | James Pustejovsky

Under the standard approach to counterfactuals, to determine the meaning of a counterfactual sentence, we consider the closest possible world(s) where the antecedent is true, and evaluate the consequent. Building on the standard approach, some researchers have found that the set of worlds to be considered is dependent on context ; it evolves with the discourse. Others have focused on how to define the distance between possible worlds, using ideas from causal modeling. This paper integrates the two ideas. We present a semantics for counterfactuals that uses a distance measure based on causal laws, that can also change over time. We show how our semantics can be implemented in the Haskell programming language.

pdf bib
Visual TTR-Modelling Visual Question Answering in Type Theory with RecordsTTR - Modelling Visual Question Answering in Type Theory with Records
Ronja Utescher

In this paper, I will describe a system that was developed for the task of Visual Question Answering. The system uses the rich type universe of Type Theory with Records (TTR) to model both the utterances about the image, the image itself and classifications made related to the two. At its most basic, the decision of whether any given predicate can be assigned to an object in the image is delegated to a CNN. Consequently, images can be judged as evidence for propositions. The end result is a model whose application of perceptual classifiers to a given image is guided by the accompanying utterance.

pdf bib
The Lexical Gap : An Improved Measure of Automated Image Description Quality
Austin Kershaw | Miroslaw Bober

The challenge of automatically describing images and videos has stimulated much research in Computer Vision and Natural Language Processing. In order to test the semantic abilities of new algorithms, we need reliable and objective ways of measuring progress. We show that standard evaluation measures do not take into account the semantic richness of a description, and give the impression that sparse machine descriptions outperform rich human descriptions. We introduce and test a new measure of semantic ability based on relative lexical diversity. We show how our measure can work alongside existing measures to achieve state of the art correlation with human judgement of quality. We also introduce a new dataset : Rich-Sparse Descriptions, which provides 2 K human and machine descriptions to stimulate interest into the semantic evaluation of machine descriptions.

pdf bib
Modeling language constructs with fuzzy sets : some approaches, examples and interpretations
Pavlo Kapustin | Michael Kapustin

We present and discuss a couple of approaches, including different types of projections, and some examples, discussing the use of fuzzy sets for modeling meaning of certain types of language constructs. We are mostly focusing on words other than adjectives and linguistic hedges as these categories are the most studied from before. We discuss logical and linguistic interpretations of membership functions. We argue that using fuzzy sets for modeling meaning of words and other natural language constructs, along with situations described with natural language is interesting both from purely linguistic perspective, and also as a knowledge representation for problems of computational linguistics and natural language processing.

pdf bib
Topological Data Analysis for Discourse Semantics?
Ketki Savle | Wlodek Zadrozny | Minwoo Lee

In this paper we present new results on applying topological data analysis to discourse structures. We show that topological information, extracted from the relationships between sentences can be used in inference, namely it can be applied to the very difficult legal entailment given in the COLIEE 2018 data set. Previous results of Doshi and Zadrozny (2018) and Gholizadeh et al. (2018) show that topological features are useful for classification. The applications of computational topology to entailment are novel in our view provide a new set of tools for discourse semantics : computational topology can perhaps provide a bridge between the brittleness of logic and the regression of neural networks. We discuss the advantages and disadvantages of using topological information, and some open problems such as explainability of the classifier decisions.

pdf bib
Semantic Frame Embeddings for Detecting Relations between Software Requirements
Waad Alhoshan | Riza Batista-Navarro | Liping Zhao

The early phases of requirements engineering (RE) deal with a vast amount of software requirements (i.e., requirements that define characteristics of software systems), which are typically expressed in natural language. Analysing such unstructured requirements, usually obtained from users’ inputs, is considered a challenging task due to the inherent ambiguity and inconsistency of natural language. To support such a task, methods based on natural language processing (NLP) can be employed. One of the more recent advances in NLP is the use of word embeddings for capturing contextual information, which can then be applied in word analogy tasks. In this paper, we describe a new resource, i.e., embedding-based representations of semantic frames in FrameNet, which was developed to support the detection of relations between software requirements. Our embeddings, which encapsulate contextual information at the semantic frame level, were trained on a large corpus of requirements (i.e., a collection of more than three million mobile application reviews). The similarity between these frame embeddings is then used as a basis for detecting semantic relatedness between software requirements. Compared with existing resources underpinned by word-level embeddings alone, and frame embeddings built upon pre-trained vectors, our proposed frame embeddings obtained better performance against judgements of an RE expert. These encouraging results demonstrate the strong potential of the resource in supporting RE analysis tasks (e.g., traceability), which we plan to investigate as part of our future work.

pdf bib
R-grams : Unsupervised Learning of Semantic Units in Natural LanguageR-grams: Unsupervised Learning of Semantic Units in Natural Language
Amaru Cuba Gyllensten | Ariel Ekgren | Magnus Sahlgren

This paper investigates data-driven segmentation using Re-Pair or Byte Pair Encoding-techniques. In contrast to previous work which has primarily been focused on subword units for machine translation, we are interested in the general properties of such segments above the word level. We call these segments r-grams, and discuss their properties and the effect they have on the token frequency distribution. The proposed approach is evaluated by demonstrating its viability in embedding techniques, both in monolingual and multilingual test settings. We also provide a number of qualitative examples of the proposed methodology, demonstrating its viability as a language-invariant segmentation procedure.




up

pdf (full)
bib (full)
Proceedings of the Sixth Workshop on Natural Language and Computer Science

pdf bib
Proceedings of the Sixth Workshop on Natural Language and Computer Science
Robin Cooper | Valeria de Paiva | Lawrence S. Moss

pdf bib
Distribution is not enough : going Firther
Andy Lücking | Robin Cooper | Staffan Larsson | Jonathan Ginzburg

Much work in contemporary computational semantics follows the distributional hypothesis (DH), which is understood as an approach to semantics according to which the meaning of a word is a function of its distribution over contexts which is represented as vectors (word embeddings) within a multi-dimensional semantic space. In practice, use is identified with occurrence in text corpora, though there are some efforts to use corpora containing multi-modal information. In this paper we argue that the distributional hypothesis is intrinsically misguided as a self-supporting basis for semantics, as Firth was entirely aware. We mention philosophical arguments concerning the lack of normativity within DH data. Furthermore, we point out the shortcomings of DH as a model of learning, by discussing a variety of linguistic classes that can not be learnt on a distributional basis, including indexicals, proper names, and wh-phrases. Instead of pursuing DH, we sketch an account of the problematic learning cases by integrating a rich, Firthian notion of dialogue context with interactive learning in signalling games backed by in probabilistic Type Theory with Records. We conclude that the success of the DH in computational semantics rests on a post hoc effect : DS presupposes a referential semantics on the basis of which utterances can be produced, comprehended and analysed in the first place.

pdf bib
Towards Natural Language Story Understanding with Rich Logical Schemas
Lane Lawley | Gene Louis Kim | Lenhart Schubert

Generating commonsense’’ knowledge for intelligent understanding and reasoning is a difficult, long-standing problem, whose scale challenges the capacity of any approach driven primarily by human input. Furthermore, approaches based on mining statistically repetitive patterns fail to produce the rich representations humans acquire, and fall far short of human efficiency in inducing knowledge from text. The idea of our approach to this problem is to provide a learning system with a head start consisting of a semantic parser, some basic ontological knowledge, and most importantly, a small set of very general schemas about the kinds of patterns of events (often purposive, causal, or socially conventional) that even a one- or two-year-old could reasonably be presumed to possess. We match these initial schemas to simple children’s stories, obtaining concrete instances, and combining and abstracting these into new candidate schemas. Both the initial and generated schemas are specified using a rich, expressive logical form. While modern approaches to schema reasoning often only use slot-and-filler structures, this logical form allows us to specify complex relations and constraints over the slots. Though formal, the representations are language-like, and as such readily relatable to NL text. The agents, objects, and other roles in the schemas are represented by typed variables, and the event variables can be related through partial temporal ordering and causal relations. To match natural language stories with existing schemas, we first parse the stories into an underspecified variant of the logical form used by the schemas, which is suitable for most concrete stories.

pdf bib
Questions in Dependent Type Semantics
Kazuki Watanabe | Koji Mineshima | Daisuke Bekki

Dependent Type Semantics (DTS ; Bekki and Mineshima, 2017) is a proof-theoretic compositional dynamic semantics based on Dependent Type Theory. The semantic representations for declarative sentences in DTS are types, based on the propositions-as-types paradigm. While type-theoretic semantics for natural language based on dependent type theory has been developed by many authors, how to assign semantic representations to interrogative sentences has been a non-trivial problem. In this study, we show how to provide the semantics of interrogative sentences in DTS. The basic idea is to assign the same type to both declarative sentences and interrogative sentences, partly building on the recent proposal in Inquisitive Semantics. We use Combinatory Categorial Grammar (CCG) as a syntactic component of DTS and implement our compositional semantics for interrogative sentences using ccg2lambda, a semantic parsing platform based on CCG. Based on the idea that the relationship between questions and answers can be formulated as the task of Recognizing Textual Entailment (RTE), we implement our inference system using proof assistant Coq and show that our system can deal with a wide range of question-answer relationships discussed in the formal semantics literature, including those with polar questions, alternative questions, and wh-questions.

up

pdf (full)
bib (full)
Proceedings of the IWCS Shared Task on Semantic Parsing

pdf bib
Proceedings of the IWCS Shared Task on Semantic Parsing
Lasha Abzianidze | Rik van Noord | Hessel Haagsma | Johan Bos

pdf bib
Neural Boxer at the IWCS Shared Task on DRS ParsingIWCS Shared Task on DRS Parsing
Rik van Noord

This paper describes our participation in the shared task of Discourse Representation Structure parsing. It follows the work of Van Noord et al. (2018), who employed a neural sequence-to-sequence model to produce DRSs, also exploiting linguistic information with multiple encoders. We provide a detailed look in the performance of this model and show that (i) the benefit of the linguistic features is evident across a number of experiments which vary the amount of training data and (ii) the model can be improved by applying a number of postprocessing methods to fix ill-formed output. Our model ended up in second place in the competition, with an F-score of 84.5.

up

pdf (full)
bib (full)
Proceedings of the Tenth Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

pdf bib
Proceedings of the Tenth Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis
Alexandra Balahur | Roman Klinger | Veronique Hoste | Carlo Strapparava | Orphee De Clercq

pdf bib
A Soft Label Strategy for Target-Level Sentiment Classification
Da Yin | Xiao Liu | Xiuyu Wu | Baobao Chang

In this paper, we propose a soft label approach to target-level sentiment classification task, in which a history-based soft labeling model is proposed to measure the possibility of a context word as an opinion word. We also apply a convolution layer to extract local active features, and introduce positional weights to take relative distance information into consideration. In addition, we obtain more informative target representation by training with context tokens together to make deeper interaction between target and context tokens. We conduct experiments on SemEval 2014 datasets and the experimental results show that our approach significantly outperforms previous models and gives state-of-the-art results on these datasets.

pdf bib
Online abuse detection : the value of preprocessing and neural attention models
Dhruv Kumar | Robin Cohen | Lukasz Golab

We propose an attention-based neural network approach to detect abusive speech in online social networks. Our approach enables more effective modeling of context and the semantic relationships between words. We also empirically evaluate the value of text pre-processing techniques in addressing the challenge of out-of-vocabulary words in toxic content. Finally, we conduct extensive experiments on the Wikipedia Talk page datasets, showing improved predictive power over the previous state-of-the-art.

pdf bib
Using Structured Representation and Data : A Hybrid Model for Negation and Sentiment in Customer Service Conversations
Amita Misra | Mansurul Bhuiyan | Jalal Mahmud | Saurabh Tripathy

Twitter customer service interactions have recently emerged as an effective platform to respond and engage with customers. In this work, we explore the role of negation in customer service interactions, particularly applied to sentiment analysis. We define rules to identify true negation cues and scope more suited to conversational data than existing general review data. Using semantic knowledge and syntactic structure from constituency parse trees, we propose an algorithm for scope detection that performs comparable to state of the art BiLSTM. We further investigate the results of negation scope detection for the sentiment prediction task on customer service conversation data using both a traditional SVM and a Neural Network. We propose an antonym dictionary based method for negation applied to a combination CNN-LSTM for sentiment analysis. Experimental results show that the antonym-based method outperforms the previous lexicon-based and Neural Network methods.

pdf bib
When Numbers Matter ! : Detecting Sarcasm in Numerical Portions of Text
Abhijeet Dubey | Lakshya Kumar | Arpan Somani | Aditya Joshi | Pushpak Bhattacharyya

Research in sarcasm detection spans almost a decade. However a particular form of sarcasm remains unexplored : sarcasm expressed through numbers, which we estimate, forms about 11 % of the sarcastic tweets in our dataset. The sentence ‘Love waking up at 3 am’ is sarcastic because of the number. In this paper, we focus on detecting sarcasm in tweets arising out of numbers. Initially, to get an insight into the problem, we implement a rule-based and a statistical machine learning-based (ML) classifier. The rule-based classifier conveys the crux of the numerical sarcasm problem, namely, incongruity arising out of numbers. The statistical ML classifier uncovers the indicators i.e., features of such sarcasm. The actual system in place, however, are two deep learning (DL) models, CNN and attention network that obtains an F-score of 0.93 and 0.91 on our dataset of tweets containing numbers. To the best of our knowledge, this is the first line of research investigating the phenomenon of sarcasm arising out of numbers, culminating in a detector thereof.

pdf bib
Cross-lingual Subjectivity Detection for Resource Lean Languages
Ida Amini | Samane Karimi | Azadeh Shakery

Wide and universal changes in the web content due to the growth of web 2 applications increase the importance of user-generated content on the web. Therefore, the related research areas such as sentiment analysis, opinion mining and subjectivity detection receives much attention from the research community. Due to the diverse languages that web-users use to express their opinions and sentiments, research areas like subjectivity detection should present methods which are practicable on all languages. An important prerequisite to effectively achieve this aim is considering the limitations in resource-lean languages. In this paper, cross-lingual subjectivity detection on resource lean languages is investigated using two different approaches : a language-model based and a learning-to-rank approach. Experimental results show the impact of different factors on the performance of subjectivity detection methods using English resources to detect the subjectivity score of Persian documents. The experiments demonstrate that the proposed learning-to-rank method outperforms the baseline method in ranking documents based on their subjectivity degree.

up

pdf (full)
bib (full)
Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects

pdf bib
Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects
Marcos Zampieri | Preslav Nakov | Shervin Malmasi | Nikola Ljubešić | Jörg Tiedemann | Ahmed Ali

pdf bib
Modeling Global Syntactic Variation in English Using Dialect ClassificationEnglish Using Dialect Classification
Jonathan Dunn

This paper evaluates global-scale dialect identification for 14 national varieties of English on both web-crawled data and Twitter data. The paper makes three main contributions : (i) introducing data-driven language mapping as a method for selecting the inventory of national varieties to include in the task ; (ii) producing a large and dynamic set of syntactic features using grammar induction rather than focusing on a few hand-selected features such as function words ; and (iii) comparing models across both web corpora and social media corpora in order to measure the robustness of syntactic variation across registers.

pdf bib
Language Discrimination and Transfer Learning for Similar Languages : Experiments with Feature Combinations and Adaptation
Nianheng Wu | Eric DeMattos | Kwok Him So | Pin-zhen Chen | Çağrı Çöltekin

This paper describes the work done by team tearsofjoy participating in the VarDial 2019 Evaluation Campaign. We developed two systems based on Support Vector Machines : SVM with a flat combination of features and SVM ensembles. We participated in all language / dialect identification tasks, as well as the Moldavian vs. Romanian cross-dialect topic identification (MRC) task. Our team achieved first place in German Dialect identification (GDI) and MRC subtasks 2 and 3, second place in the simplified variant of Discriminating between Mainland and Taiwan variation of Mandarin Chinese (DMT) as well as Cuneiform Language Identification (CLI), and third and fifth place in DMT traditional and MRC subtask 1 respectively. In most cases, the SVM with a flat combination of features performed better than SVM ensembles. Besides describing the systems and the results obtained by them, we provide a tentative comparison between the feature combination methods, and present additional experiments with a method of adaptation to the test set, which may indicate potential pitfalls with some of the data sets.

pdf bib
Toward a deep dialectological representation of Indo-AryanIndo-Aryan
Chundra Cathcart

This paper presents a new approach to disentangling inter-dialectal and intra-dialectal relationships within one such group, the Indo-Aryan subgroup of Indo-European. We draw upon admixture models and deep generative models to tease apart historic language contact and language-specific behavior in the overall patterns of sound change displayed by Indo-Aryan languages. We show that a deep model of Indo-Aryan dialectology sheds some light on questions regarding inter-relationships among the Indo-Aryan languages, and performs better than a shallow model in terms of certain qualities of the posterior distribution (e.g., entropy of posterior distributions), and outline future pathways for model development.

pdf bib
Naive Bayes and BiLSTM Ensemble for Discriminating between Mainland and Taiwan Variation of Mandarin ChineseBayes and BiLSTM Ensemble for Discriminating between Mainland and Taiwan Variation of Mandarin Chinese
Li Yang | Yang Xiang

Automatic dialect identification is a more challengingctask than language identification, as it requires the ability to discriminate between varieties of one language. In this paper, we propose an ensemble based system, which combines traditional machine learning models trained on bag of n-gram fetures, with deep learning models trained on word embeddings, to solve the Discriminating between Mainland and Taiwan Variation of Mandarin Chinese (DMT) shared task at VarDial 2019. Our experiments show that a character bigram-trigram combination based Naive Bayes is a very strong model for identifying varieties of Mandarin Chinense. Through further ensemble of Navie Bayes and BiLSTM, our system (team : itsalexyang) achived an macro-averaged F1 score of 0.8530 and 0.8687 in two tracks.

pdf bib
BAM : A combination of deep and shallow models for German Dialect Identification.BAM: A combination of deep and shallow models for German Dialect Identification.
Andrei M. Butnaru

* This is a submission for the Third VarDial Evaluation Campaign * In this paper, we present a machine learning approach for the German Dialect Identification (GDI) Closed Shared Task of the DSL 2019 Challenge. The proposed approach combines deep and shallow models, by applying a voting scheme on the outputs resulted from a Character-level Convolutional Neural Networks (Char-CNN), a Long Short-Term Memory (LSTM) network, and a model based on String Kernels. The first model used is the Char-CNN model that merges multiple convolutions computed with kernels of different sizes. The second model is the LSTM network which applies a global max pooling over the returned sequences over time. Both models pass the activation maps to two fully-connected layers. The final model is based on String Kernels, computed on character p-grams extracted from speech transcripts. The model combines two blended kernel functions, one is the presence bits kernel, and the other is the intersection kernel. The empirical results obtained in the shared task prove that the approach can achieve good results. The system proposed in this paper obtained the fourth place with a macro-F1 score of 62.55 %

pdf bib
Initial Experiments In Cross-Lingual Morphological Analysis Using Morpheme Segmentation
Vladislav Mikhailov | Lorenzo Tosi | Anastasia Khorosheva | Oleg Serikov

The paper describes initial experiments in data-driven cross-lingual morphological analysis of open-category words using a combination of unsupervised morpheme segmentation, annotation projection and an LSTM encoder-decoder model with attention. Our algorithm provides lemmatisation and morphological analysis generation for previously unseen low-resource language surface forms with only annotated data on the related languages given. Despite the inherently lossy annotation projection, we achieved the best lemmatisation F1-score in the VarDial 2019 Shared Task on Cross-Lingual Morphological Analysis for both Karachay-Balkar (Turkic languages, agglutinative morphology) and Sardinian (Romance languages, fusional morphology).

pdf bib
Neural and Linear Pipeline Approaches to Cross-lingual Morphological Analysis
Çağrı Çöltekin | Jeremy Barnes

This paper describes Tbingen-Oslo team’s participation in the cross-lingual morphological analysis task in the VarDial 2019 evaluation campaign. We participated in the shared task with a standard neural network model. Our model achieved analysis F1-scores of 31.48 and 23.67 on test languages Karachay-Balkar (Turkic) and Sardinian (Romance) respectively. The scores are comparable to the scores obtained by the other participants in both language families, and the analysis score on the Romance data set was also the best result obtained in the shared task. Besides describing the system used in our shared task participation, we describe another, simpler, model based on linear classifiers, and present further analyses using both models. Our analyses, besides revealing some of the difficult cases, also confirm that the usefulness of a source language in this task is highly correlated with the similarity of source and target languages.

pdf bib
SC-UPB at the VarDial 2019 Evaluation Campaign : Moldavian vs. Romanian Cross-Dialect Topic IdentificationSC-UPB at the VarDial 2019 Evaluation Campaign: Moldavian vs. Romanian Cross-Dialect Topic Identification
Cristian Onose | Dumitru-Clementin Cercel | Stefan Trausan-Matu

This paper describes our models for the Moldavian vs. Romanian Cross-Topic Identification (MRC) evaluation campaign, part of the VarDial 2019 workshop. We focus on the three subtasks for MRC : binary classification between the Moldavian (MD) and the Romanian (RO) dialects and two cross-dialect multi-class classification between six news topics, MD to RO and RO to MD. We propose several deep learning models based on long short-term memory cells, Bidirectional Gated Recurrent Unit (BiGRU) and Hierarchical Attention Networks (HAN). We also employ three word embedding models to represent the text as a low dimensional vector. Our official submission includes two runs of the BiGRU and HAN models for each of the three subtasks. The best submitted model obtained the following macro-averaged F1 scores : 0.708 for subtask 1, 0.481 for subtask 2 and 0.480 for the last one. Due to a read error caused by the quoting behaviour over the test file, our final submissions contained a smaller number of items than expected. More than 50 % of the submission files were corrupted. Thus, we also present the results obtained with the corrected labels for which the HAN model achieves the following results : 0.930 for subtask 1, 0.590 for subtask 2 and 0.687 for the third one.

pdf bib
Investigating Machine Learning Methods for Language and Dialect Identification of Cuneiform Texts
Ehsan Doostmohammadi | Minoo Nassajian

Identification of the languages written using cuneiform symbols is a difficult task due to the lack of resources and the problem of tokenization. The Cuneiform Language Identification task in VarDial 2019 addresses the problem of identifying seven languages and dialects written in cuneiform ; Sumerian and six dialects of Akkadian language : Old Babylonian, Middle Babylonian Peripheral, Standard Babylonian, Neo-Babylonian, Late Babylonian, and Neo-Assyrian. This paper describes the approaches taken by SharifCL team to this problem in VarDial 2019. The best result belongs to an ensemble of Support Vector Machines and a naive Bayes classifier, both working on character-level features, with macro-averaged F1-score of 72.10 %.

pdf bib
DTeam @ VarDial 2019 : Ensemble based on skip-gram and triplet loss neural networks for Moldavian vs. Romanian cross-dialect topic identificationDTeam @ VarDial 2019: Ensemble based on skip-gram and triplet loss neural networks for Moldavian vs. Romanian cross-dialect topic identification
Diana Tudoreanu

This paper presents the solution proposed by DTeam in the VarDial 2019 Evaluation Campaign for the Moldavian vs. Romanian cross-topic identification task. The solution proposed is a Support Vector Machines (SVM) ensemble composed of a two character-level neural networks. The first network is a skip-gram classification model formed of an embedding layer, three convolutional layers and two fully-connected layers. The second network has a similar architecture, but is trained using the triplet loss function.

pdf bib
Comparing Pipelined and Integrated Approaches to Dialectal Arabic Neural Machine TranslationArabic Neural Machine Translation
Pamela Shapiro | Kevin Duh

When translating diglossic languages such as Arabic, situations may arise where we would like to translate a text but do not know which dialect it is. A traditional approach to this problem is to design dialect identification systems and dialect-specific machine translation systems. However, under the recent paradigm of neural machine translation, shared multi-dialectal systems have become a natural alternative. Here we explore under which conditions it is beneficial to perform dialect identification for Arabic neural machine translation versus using a general system for all dialects.

up

pdf (full)
bib (full)
Proceedings of the Third Workshop on Structured Prediction for NLP

pdf bib
Proceedings of the Third Workshop on Structured Prediction for NLP
Andre Martins | Andreas Vlachos | Zornitsa Kozareva | Sujith Ravi | Gerasimos Lampouras | Vlad Niculae | Julia Kreutzer

pdf bib
Lightly-supervised Representation Learning with Global Interpretability
Andrew Zupon | Maria Alexeeva | Marco Valenzuela-Escárcega | Ajay Nagesh | Mihai Surdeanu

We propose a lightly-supervised approach for information extraction, in particular named entity classification, which combines the benefits of traditional bootstrapping, i.e., use of limited annotations and interpretability of extraction patterns, with the robust learning approaches proposed in representation learning. Our algorithm iteratively learns custom embeddings for both the multi-word entities to be extracted and the patterns that match them from a few example entities per category. We demonstrate that this representation-based approach outperforms three other state-of-the-art bootstrapping approaches on two datasets : CoNLL-2003 and OntoNotes. Additionally, using these embeddings, our approach outputs a globally-interpretable model consisting of a decision list, by ranking patterns based on their proximity to the average entity embedding in a given class. We show that this interpretable model performs close to our complete bootstrapping model, proving that representation learning can be used to produce interpretable models with small loss in performance. This decision list can be edited by human experts to mitigate some of that loss and in some cases outperform the original model.

pdf bib
Semi-Supervised Teacher-Student Architecture for Relation Extraction
Fan Luo | Ajay Nagesh | Rebecca Sharp | Mihai Surdeanu

Generating a large amount of training data for information extraction (IE) is either costly (if annotations are created manually), or runs the risk of introducing noisy instances (if distant supervision is used). On the other hand, semi-supervised learning (SSL) is a cost-efficient solution to combat lack of training data. In this paper, we adapt Mean Teacher (Tarvainen and Valpola, 2017), a denoising SSL framework to extract semantic relations between pairs of entities. We explore the sweet spot of amount of supervision required for good performance on this binary relation extraction task. Additionally, different syntax representations are incorporated into our models to enhance the learned representation of sentences. We evaluate our approach on the Google-IISc Distant Supervision (GDS) dataset, which removes test data noise present in all previous distance supervision datasets, which makes it a reliable evaluation benchmark (Jat et al., 2017). Our results show that the SSL Mean Teacher approach nears the performance of fully-supervised approaches even with only 10 % of the labeled corpus. Further, the syntax-aware model outperforms other syntax-free approaches across all levels of supervision.

up

pdf (full)
bib (full)
Proceedings of the Combined Workshop on Spatial Language Understanding (SpLU) and Grounded Communication for Robotics (RoboNLP)

pdf bib
Proceedings of the Combined Workshop on Spatial Language Understanding (SpLU) and Grounded Communication for Robotics (RoboNLP)
Archna Bhatia | Yonatan Bisk | Parisa Kordjamshidi | Jesse Thomason

pdf bib
From Virtual to Real : A Framework for Verbal Interaction with Robots
Eugene Joseph

A Natural Language Understanding (NLU) pipeline integrated with a 3D physics-based scene is a flexible way to develop and test language-based human-robot interaction, by virtualizing people, robot hardware and the target 3D environment. Here, interaction means both controlling robots using language and conversing with them about the user’s physical environment and her daily life. Such a virtual development framework was initially developed for the Bot Colony videogame launched on Steam in June 2014, and has been undergoing improvements since. The framework is focused of developing intuitive verbal interaction with various types of robots. Key robot functions (robot vision and object recognition, path planning and obstacle avoidance, task planning and constraints, grabbing and inverse kinematics), the human participants in the interaction, and the impact of gravity and other forces on the environment are all simulated using commercial 3D tools. The framework can be used as a robotics testbed : the results of our simulations can be compared with the output of algorithms in real robots, to validate such algorithms. A novelty of our framework is support for social interaction with robots-enabling robots to converse about people and objects in the user’s environment, as well as learning about human needs and everyday life topics from their owner.

pdf bib
Multi-modal Discriminative Model for Vision-and-Language Navigation
Haoshuo Huang | Vihan Jain | Harsh Mehta | Jason Baldridge | Eugene Ie

Vision-and-Language Navigation (VLN) is a natural language grounding task where agents have to interpret natural language instructions in the context of visual scenes in a dynamic environment to achieve prescribed navigation goals. Successful agents must have the ability to parse natural language of varying linguistic styles, ground them in potentially unfamiliar scenes, plan and react with ambiguous environmental feedback. Generalization ability is limited by the amount of human annotated data. In particular, paired vision-language sequence data is expensive to collect. We develop a discriminator that evaluates how well an instruction explains a given path in VLN task using multi-modal alignment. Our study reveals that only a small fraction of the high-quality augmented data from Fried et al., as scored by our discriminator, is useful for training VLN agents with similar performance. We also show that a VLN agent warm-started with pre-trained components from the discriminator outperforms the benchmark success rates of 35.5 by 10 % relative measure.

up

pdf (full)
bib (full)
Proceedings of the Eighth Workshop on Speech and Language Processing for Assistive Technologies

pdf bib
Proceedings of the Eighth Workshop on Speech and Language Processing for Assistive Technologies
University of Sheffield Heidi Christensen | Florida Institute for Human Kristy Hollingshead | Machine Cognition | Boston College Emily Prud’hommeaux | University of Toronto Frank Rudzicz | Michigan Technological University Keith Vertanen

pdf bib
Permanent Magnetic Articulograph (PMA) vs Electromagnetic Articulograph (EMA) in Articulation-to-Speech Synthesis for Silent Speech InterfacePMA) vs Electromagnetic Articulograph (EMA) in Articulation-to-Speech Synthesis for Silent Speech Interface
Beiming Cao | Nordine Sebkhi | Ted Mau | Omer T. Inan | Jun Wang

Silent speech interfaces (SSIs) are devices that enable speech communication when audible speech is unavailable. Articulation-to-speech (ATS) synthesis is a software design in SSI that directly converts articulatory movement information into audible speech signals. Permanent magnetic articulograph (PMA) is a wireless articulator motion tracking technology that is similar to commercial, wired Electromagnetic Articulograph (EMA). PMA has shown great potential for practical SSI applications, because it is wireless. The ATS performance of PMA, however, is unknown when compared with current EMA. In this study, we compared the performance of ATS using a PMA we recently developed and a commercially available EMA (NDI Wave system). Datasets with same stimuli and size that were collected from tongue tip were used in the comparison. The experimental results indicated the performance of PMA was close to, although not as equally good as that of EMA. Furthermore, in PMA, converting the raw magnetic signals to positional signals did not significantly affect the performance of ATS, which support the future direction in PMA-based ATS can be focused on the use of positional signals to maximize the benefit of spatial analysis.

pdf bib
Investigating Speech Recognition for Improving Predictive AACAAC
Jiban Adhikary | Robbie Watling | Crystal Fletcher | Alex Stanage | Keith Vertanen

Making good letter or word predictions can help accelerate the communication of users of high-tech AAC devices. This is particularly important for real-time person-to-person conversations. We investigate whether per forming speech recognition on the speaking-side of a conversation can improve language model based predictions. We compare the accuracy of three plausible microphone deployment options and the accuracy of two commercial speech recognition engines (Google and IBM Watson). We found that despite recognition word error rates of 7-16 %, our ensemble of N-gram and recurrent neural network language models made predictions nearly as good as when they used the reference transcripts.

up

pdf (full)
bib (full)
Proceedings of the Second Workshop on Shortcomings in Vision and Language

pdf bib
Proceedings of the Second Workshop on Shortcomings in Vision and Language
Raffaella Bernardi | Raquel Fernandez | Spandana Gella | Kushal Kafle | Christopher Kanan | Stefan Lee | Moin Nabi

pdf bib
Referring to Objects in Videos Using Spatio-Temporal Identifying Descriptions
Peratham Wiriyathammabhum | Abhinav Shrivastava | Vlad Morariu | Larry Davis

This paper presents a new task, the grounding of spatio-temporal identifying descriptions in videos. Previous work suggests potential bias in existing datasets and emphasizes the need for a new data creation schema to better model linguistic structure. We introduce a new data collection scheme based on grammatical constraints for surface realization to enable us to investigate the problem of grounding spatio-temporal identifying descriptions in videos. We then propose a two-stream modular attention network that learns and grounds spatio-temporal identifying descriptions based on appearance and motion. We show that motion modules help to ground motion-related words and also help to learn in appearance modules because modular neural networks resolve task interference between modules. Finally, we propose a future challenge and a need for a robust system arising from replacing ground truth visual annotations with automatic video object detector and temporal event localization.

pdf bib
A Survey on Biomedical Image Captioning
John Pavlopoulos | Vasiliki Kougia | Ion Androutsopoulos

Image captioning applied to biomedical images can assist and accelerate the diagnosis process followed by clinicians. This article is the first survey of biomedical image captioning, discussing datasets, evaluation measures, and state of the art methods. Additionally, we suggest two baselines, a weak and a stronger one ; the latter outperforms all current state of the art systems on one of the datasets.

pdf bib
Learning Multilingual Word Embeddings Using Image-Text Data
Karan Singhal | Karthik Raman | Balder ten Cate

There has been significant interest recently in learning multilingual word embeddings in which semantically similar words across languages have similar embeddings. State-of-the-art approaches have relied on expensive labeled data, which is unavailable for low-resource languages, or have involved post-hoc unification of monolingual embeddings. In the present paper, we investigate the efficacy of multilingual embeddings learned from weakly-supervised image-text data. In particular, we propose methods for learning multilingual embeddings using image-text data, by enforcing similarity between the representations of the image and that of the text. Our experiments reveal that even without using any expensive labeled data, a bag-of-words-based embedding model trained on image-text data achieves performance comparable to the state-of-the-art on crosslingual semantic similarity tasks.

up

pdf (full)
bib (full)
Proceedings of the 2nd Clinical Natural Language Processing Workshop

pdf bib
Proceedings of the 2nd Clinical Natural Language Processing Workshop
Anna Rumshisky | Kirk Roberts | Steven Bethard | Tristan Naumann

pdf bib
An Analysis of Attention over Clinical Notes for Predictive Tasks
Sarthak Jain | Ramin Mohammadi | Byron C. Wallace

The shift to electronic medical records (EMRs) has engendered research into machine learning and natural language technologies to analyze patient records, and to predict from these clinical outcomes of interest. Two observations motivate our aims here. First, unstructured notes contained within EMR often contain key information, and hence should be exploited by models. Second, while strong predictive performance is important, interpretability of models is perhaps equally so for applications in this domain. Together, these points suggest that neural models for EMR may benefit from incorporation of attention over notes, which one may hope will both yield performance gains and afford transparency in predictions. In this work we perform experiments to explore this question using two EMR corpora and four different predictive tasks, that : (i) inclusion of attention mechanisms is critical for neural encoder modules that operate over notes fields in order to yield competitive performance, but, (ii) unfortunately, while these boost predictive performance, it is decidedly less clear whether they provide meaningful support for predictions.

pdf bib
Hierarchical Nested Named Entity Recognition
Zita Marinho | Afonso Mendes | Sebastião Miranda | David Nogueira

In the medical domain and other scientific areas, it is often important to recognize different levels of hierarchy in mentions, such as those related to specific symptoms or diseases associated with different anatomical regions. Unlike previous approaches, we build a transition-based parser that explicitly models an arbitrary number of hierarchical and nested mentions, and propose a loss that encourages correct predictions of higher-level mentions. We further introduce a set of modifier classes which introduces certain concepts that change the meaning of an entity, such as absence, or uncertainty about a given disease. Our proposed model achieves state-of-the-art results in medical entity recognition datasets, using both nested and hierarchical mentions.

pdf bib
Towards Automatic Generation of Shareable Synthetic Clinical Notes Using Neural Language Models
Oren Melamud | Chaitanya Shivade

Large-scale clinical data is invaluable to driving many computational scientific advances today. However, understandable concerns regarding patient privacy hinder the open dissemination of such data and give rise to suboptimal siloed research. De-identification methods attempt to address these concerns but were shown to be susceptible to adversarial attacks. In this work, we focus on the vast amounts of unstructured natural language data stored in clinical notes and propose to automatically generate synthetic clinical notes that are more amenable to sharing using generative models trained on real de-identified records. To evaluate the merit of such notes, we measure both their privacy preservation properties as well as utility in training clinical NLP models. Experiments using neural language models yield notes whose utility is close to that of the real ones in some clinical NLP tasks, yet leave ample room for future improvements.

pdf bib
A Novel System for Extractive Clinical Note Summarization using EHR DataEHR Data
Jennifer Liang | Ching-Huei Tsou | Ananya Poddar

While much data within a patient’s electronic health record (EHR) is coded, crucial information concerning the patient’s care and management remain buried in unstructured clinical notes, making it difficult and time-consuming for physicians to review during their usual clinical workflow. In this paper, we present our clinical note processing pipeline, which extends beyond basic medical natural language processing (NLP) with concept recognition and relation detection to also include components specific to EHR data, such as structured data associated with the encounter, sentence-level clinical aspects, and structures of the clinical notes. We report on the use of this pipeline in a disease-specific extractive text summarization task on clinical notes, focusing primarily on progress notes by physicians and nurse practitioners. We show how the addition of EHR-specific components to the pipeline resulted in an improvement in our overall system performance and discuss the potential impact of EHR-specific components on other higher-level clinical NLP tasks.

pdf bib
Medical Entity Linking using Triplet Network
Ishani Mondal | Sukannya Purkayastha | Sudeshna Sarkar | Pawan Goyal | Jitesh Pillai | Amitava Bhattacharyya | Mahanandeeshwar Gattu

Entity linking (or Normalization) is an essential task in text mining that maps the entity mentions in the medical text to standard entities in a given Knowledge Base (KB). This task is of great importance in the medical domain. It can also be used for merging different medical and clinical ontologies. In this paper, we center around the problem of disease linking or normalization. This task is executed in two phases : candidate generation and candidate scoring. In this paper, we present an approach to rank the candidate Knowledge Base entries based on their similarity with disease mention. We make use of the Triplet Network for candidate ranking. While the existing methods have used carefully generated sieves and external resources for candidate generation, we introduce a robust and portable candidate generation scheme that does not make use of the hand-crafted rules. Experimental results on the standard benchmark NCBI disease dataset demonstrate that our system outperforms the prior methods by a significant margin.

pdf bib
Extracting Factual Min / Max Age Information from Clinical Trial StudiesMin/Max Age Information from Clinical Trial Studies
Yufang Hou | Debasis Ganguly | Léa Deleris | Francesca Bonin

Population age information is an essential characteristic of clinical trials. In this paper, we focus on extracting minimum and maximum (min / max) age values for the study samples from clinical research articles. Specifically, we investigate the use of a neural network model for question answering to address this information extraction task. The min / max age QA model is trained on the massive structured clinical study records from ClinicalTrials.gov. For each article, based on multiple min and max age values extracted from the QA model, we predict both actual min / max age values for the study samples and filter out non-factual age expressions. Our system improves the results over (i) a passage retrieval based IE system and (ii) a CRF-based system by a large margin when evaluated on an annotated dataset consisting of 50 research papers on smoking cessation.

pdf bib
Distinguishing Clinical Sentiment : The Importance of Domain Adaptation in Psychiatric Patient Health Records
Eben Holderness | Philip Cawkwell | Kirsten Bolton | James Pustejovsky | Mei-Hua Hall

Recently natural language processing (NLP) tools have been developed to identify and extract salient risk indicators in electronic health records (EHRs). Sentiment analysis, although widely used in non-medical areas for improving decision making, has been studied minimally in the clinical setting. In this study, we undertook, to our knowledge, the first domain adaptation of sentiment analysis to psychiatric EHRs by defining psychiatric clinical sentiment, performing an annotation project, and evaluating multiple sentence-level sentiment machine learning (ML) models. Results indicate that off-the-shelf sentiment analysis tools fail in identifying clinically positive or negative polarity, and that the definition of clinical sentiment that we provide is learnable with relatively small amounts of training data. This project is an initial step towards further refining sentiment analysis methods for clinical use. Our long-term objective is to incorporate the results of this project as part of a machine learning model that predicts inpatient readmission risk. We hope that this work will initiate a discussion concerning domain adaptation of sentiment analysis to the clinical setting.

pdf bib
Attention Neural Model for Temporal Relation Extraction
Sijia Liu | Liwei Wang | Vipin Chaudhary | Hongfang Liu

Neural network models have shown promise in the temporal relation extraction task. In this paper, we present the attention based neural network model to extract the containment relations within sentences from clinical narratives. The attention mechanism used on top of GRU model outperforms the existing state-of-the-art neural network models on THYME corpus in intra-sentence temporal relation extraction.

up

pdf (full)
bib (full)
Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP

pdf bib
Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP
Anna Rogers | Aleksandr Drozd | Anna Rumshisky | Yoav Goldberg

pdf bib
Characterizing the Impact of Geometric Properties of Word Embeddings on Task Performance
Brendan Whitaker | Denis Newman-Griffis | Aparajita Haldar | Hakan Ferhatosmanoglu | Eric Fosler-Lussier

Analysis of word embedding properties to inform their use in downstream NLP tasks has largely been studied by assessing nearest neighbors. However, geometric properties of the continuous feature space contribute directly to the use of embedding features in downstream models, and are largely unexplored. We consider four properties of word embedding geometry, namely : position relative to the origin, distribution of features in the vector space, global pairwise distances, and local pairwise distances. We define a sequence of transformations to generate new embeddings that expose subsets of these properties to downstream models and evaluate change in task performance to understand the contribution of each property to NLP models. We transform publicly available pretrained embeddings from three popular toolkits (word2vec, GloVe, and FastText) and evaluate on a variety of intrinsic tasks, which model linguistic information in the vector space, and extrinsic tasks, which use vectors as input to machine learning models. We find that intrinsic evaluations are highly sensitive to absolute position, while extrinsic tasks rely primarily on local similarity. Our findings suggest that future embedding models and post-processing techniques should focus primarily on similarity to nearby points in vector space.

pdf bib
The Influence of Down-Sampling Strategies on SVD Word Embedding StabilitySVD Word Embedding Stability
Johannes Hellrich | Bernd Kampe | Udo Hahn

The stability of word embedding algorithms, i.e., the consistency of the word representations they reveal when trained repeatedly on the same data set, has recently raised concerns. We here compare word embedding algorithms on three corpora of different sizes, and evaluate both their stability and accuracy. We find strong evidence that down-sampling strategies (used as part of their training procedures) are particularly influential for the stability of SVD-PPMI-type embeddings. This finding seems to explain diverging reports on their stability and lead us to a simple modification which provides superior stability as well as accuracy on par with skip-gram embedding

pdf bib
How Well Do Embedding Models Capture Non-compositionality? A View from Multiword Expressions
Navnita Nandakumar | Timothy Baldwin | Bahar Salehi

In this paper, we apply various embedding methods on multiword expressions to study how well they capture the nuances of non-compositional data. Our results from a pool of word-, character-, and document-level embbedings suggest that Word2vec performs the best, followed by FastText and Infersent. Moreover, we find that recently-proposed contextualised embedding models such as Bert and ELMo are not adept at handling non-compositionality in multiword expressions.

pdf bib
Measuring Semantic Abstraction of Multilingual NMT with Paraphrase Recognition and Generation TasksNMT with Paraphrase Recognition and Generation Tasks
Jörg Tiedemann | Yves Scherrer

In this paper, we investigate whether multilingual neural translation models learn stronger semantic abstractions of sentences than bilingual ones. We test this hypotheses by measuring the perplexity of such models when applied to paraphrases of the source language. The intuition is that an encoder produces better representations if a decoder is capable of recognizing synonymous sentences in the same language even though the model is never trained for that task. In our setup, we add 16 different auxiliary languages to a bidirectional bilingual baseline model (English-French) and test it with in-domain and out-of-domain paraphrases in English. The results show that the perplexity is significantly reduced in each of the cases, indicating that meaning can be grounded in translation. This is further supported by a study on paraphrase generation that we also include at the end of the paper.

pdf bib
CODAH : An Adversarially-Authored Question Answering Dataset for Common SenseCODAH: An Adversarially-Authored Question Answering Dataset for Common Sense
Michael Chen | Mike D’Arcy | Alisa Liu | Jared Fernandez | Doug Downey

Commonsense reasoning is a critical AI capability, but it is difficult to construct challenging datasets that test common sense. Recent neural question answering systems, based on large pre-trained models of language, have already achieved near-human-level performance on commonsense knowledge benchmarks. These systems do not possess human-level common sense, but are able to exploit limitations of the datasets to achieve human-level scores. We introduce the CODAH dataset, an adversarially-constructed evaluation dataset for testing common sense. CODAH forms a challenging extension to the recently-proposed SWAG dataset, which tests commonsense knowledge using sentence-completion questions that describe situations observed in video. To produce a more difficult dataset, we introduce a novel procedure for question acquisition in which workers author questions designed to target weaknesses of state-of-the-art neural question answering systems. Workers are rewarded for submissions that models fail to answer correctly both before and after fine-tuning (in cross-validation). We create 2.8k questions via this procedure and evaluate the performance of multiple state-of-the-art question answering systems on our dataset. We observe a significant gap between human performance, which is 95.3 %, and the performance of the best baseline accuracy of 65.3 % by the OpenAI GPT model.

pdf bib
Syntactic Interchangeability in Word Embedding Models
Daniel Hershcovich | Assaf Toledo | Alon Halfon | Noam Slonim

Nearest neighbors in word embedding models are commonly observed to be semantically similar, but the relations between them can vary greatly. We investigate the extent to which word embedding models preserve syntactic interchangeability, as reflected by distances between word vectors, and the effect of hyper-parameterscontext window size in particular. We use part of speech (POS) as a proxy for syntactic interchangeability, as generally speaking, words with the same POS are syntactically valid in the same contexts. We also investigate the relationship between interchangeability and similarity as judged by commonly-used word similarity benchmarks, and correlate the result with the performance of word embedding models on these benchmarks. Our results will inform future research and applications in the selection of word embedding model, suggesting a principle for an appropriate selection of the context window size parameter depending on the use-case.

pdf bib
Probing Biomedical Embeddings from Language Models
Qiao Jin | Bhuwan Dhingra | William Cohen | Xinghua Lu

Contextualized word embeddings derived from pre-trained language models (LMs) show significant improvements on downstream NLP tasks. Pre-training on domain-specific corpora, such as biomedical articles, further improves their performance. In this paper, we conduct probing experiments to determine what additional information is carried intrinsically by the in-domain trained contextualized embeddings. For this we use the pre-trained LMs as fixed feature extractors and restrict the downstream task models to not have additional sequence modeling layers. We compare BERT (Devlin et al. 2018), ELMo (Peters et al., 2018), BioBERT (Lee et al., 2019) and BioELMo, a biomedical version of ELMo trained on 10 M PubMed abstracts. Surprisingly, while fine-tuned BioBERT is better than BioELMo in biomedical NER and NLI tasks, as a fixed feature extractor BioELMo outperforms BioBERT in our probing tasks. We use visualization and nearest neighbor analysis to show that better encoding of entity-type and relational information leads to this superiority.

pdf bib
Dyr Bul Shchyl. Proxying Sound Symbolism With Word Embeddings
Ivan P. Yamshchikov | Viascheslav Shibaev | Alexey Tikhonov

This paper explores modern word embeddings in the context of sound symbolism. Using basic properties of the representations space one can construct semantic axes. A method is proposed to measure if the presence of individual sounds in a given word shifts its semantics of that word along a specific axis. It is shown that, in accordance with several experimental and statistical results, word embeddings capture symbolism for certain sounds.

up

pdf (full)
bib (full)
Proceedings of the Third Workshop on Natural Language Processing and Computational Social Science

pdf bib
Proceedings of the Third Workshop on Natural Language Processing and Computational Social Science
Svitlana Volkova | David Jurgens | Dirk Hovy | David Bamman | Oren Tsur

pdf bib
Geolocating Political Events in Text
Andrew Halterman

This work introduces a general method for automatically finding the locations where political events in text occurred. Using a novel set of 8,000 labeled sentences, I create a method to link automatically extracted events and locations in text. The model achieves human level performance on the annotation task and outperforms previous event geolocation systems. It can be applied to most event extraction systems across geographic contexts. I formalize the eventlocation linking task, describe the neural network model, describe the potential uses of such a system in political science, and demonstrate a workflow to answer an open question on the role of conventional military offensives in causing civilian casualties in the Syrian civil war.

pdf bib
Neural Network Prediction of Censorable Language
Kei Yin Ng | Anna Feldman | Jing Peng | Chris Leberknight

Internet censorship imposes restrictions on what information can be publicized or viewed on the Internet. According to Freedom House’s annual Freedom on the Net report, more than half the world’s Internet users now live in a place where the Internet is censored or restricted. China has built the world’s most extensive and sophisticated online censorship system. In this paper, we describe a new corpus of censored and uncensored social media tweets from a Chinese microblogging website, Sina Weibo, collected by tracking posts that mention ‘sensitive’ topics or authored by ‘sensitive’ users. We use this corpus to build a neural network classifier to predict censorship. Our model performs with a 88.50 % accuracy using only linguistic features. We discuss these features in detail and hypothesize that they could potentially be used for censorship circumvention.

pdf bib
Using time series and natural language processing to identify viral moments in the 2016 U.S. Presidential DebateU.S. Presidential Debate
Josephine Lukito | Prathusha K Sarma | Jordan Foley | Aman Abhishek

This paper proposes a method for identifying and studying viral moments or highlights during a political debate. Using a combined strategy of time series analysis and domain adapted word embeddings, this study provides an in-depth analysis of several key moments during the 2016 U.S. Presidential election. First, a time series outlier analysis is used to identify key moments during the debate. These moments had to result in a long-term shift in attention towards either Hillary Clinton or Donald Trump (i.e., a transient change outlier or an intervention, resulting in a permanent change in the time series). To assess whether these moments also resulted in a discursive shift, two corpora are produced for each potential viral moment (a pre-viral corpus and post-viral corpus). A domain adaptation layer learns weights to combine a generic and domain-specific (DS) word embedding into a domain adapted (DA) embedding. Words are then classified using a generic encoder+ classifier framework that relies on these word embeddings as inputs. Results suggest that both Clinton and Trump were able to induce discourse-shifting viral moments, though the former is much better at producing a topically-specific discursive shift.

pdf bib
Stance Classification, Outcome Prediction, and Impact Assessment : NLP Tasks for Studying Group Decision-MakingNLP Tasks for Studying Group Decision-Making
Elijah Mayfield | Alan Black

In group decision-making, the nuanced process of conflict and resolution that leads to consensus formation is closely tied to the quality of decisions made. Behavioral scientists rarely have rich access to process variables, though, as unstructured discussion transcripts are difficult to analyze. Here, we define ways for NLP researchers to contribute to the study of groups and teams. We introduce three tasks alongside a large new corpus of over 400,000 group debates on Wikipedia. We describe the tasks and their importance, then provide baselines showing that BERT contextualized word embeddings consistently outperform other language representations.

pdf bib
Modeling Behavioral Aspects of Social Media Discourse for Moral Classification
Kristen Johnson | Dan Goldwasser

Political discourse on social media microblogs, specifically Twitter, has become an undeniable part of mainstream U.S. politics. Given the length constraint of tweets, politicians must carefully word their statements to ensure their message is understood by their intended audience. This constraint often eliminates the context of the tweet, making automatic analysis of social media political discourse a difficult task. To overcome this challenge, we propose simultaneous modeling of high-level abstractions of political language, such as political slogans and framing strategies, with abstractions of how politicians behave on Twitter. These behavioral abstractions can be further leveraged as forms of supervision in order to increase prediction accuracy, while reducing the burden of annotation. In this work, we use Probabilistic Soft Logic (PSL) to build relational models to capture the similarities in language and behavior that obfuscate political messages on Twitter. When combined, these descriptors reveal the moral foundations underlying the discourse of U.S. politicians online, across differing governing administrations, showing how party talking points remain cohesive or change over time.across differing governing administrations, showing how party talking points remain cohesive or change over time.

up

pdf (full)
bib (full)
Proceedings of the Natural Legal Language Processing Workshop 2019

pdf bib
Proceedings of the Natural Legal Language Processing Workshop 2019
Nikolaos Aletras | Elliott Ash | Leslie Barrett | Daniel Chen | Adam Meyers | Daniel Preotiuc-Pietro | David Rosenberg | Amanda Stent

pdf bib
Scalable Methods for Annotating Legal-Decision Corpora
Lisa Ferro | John Aberdeen | Karl Branting | Craig Pfeifer | Alexander Yeh | Amartya Chakraborty

Recent research has demonstrated that judicial and administrative decisions can be predicted by machine-learning models trained on prior decisions. However, to have any practical application, these predictions must be explainable, which in turn requires modeling a rich set of features. Such approaches face a roadblock if the knowledge engineering required to create these features is not scalable. We present an approach to developing a feature-rich corpus of administrative rulings about domain name disputes, an approach which leverages a small amount of manual annotation and prototypical patterns present in the case documents to automatically extend feature labels to the entire corpus. To demonstrate the feasibility of this approach, we report results from systems trained on this dataset.

pdf bib
The Extent of Repetition in Contract Language
Dan Simonson | Daniel Broderick | Jonathan Herr

Contract language is repetitive (Anderson and Manns, 2017), but so is all language (Zipf, 1949). In this paper, we measure the extent to which contract language in English is repetitive compared with the language of other English language corpora. Contracts have much smaller vocabulary sizes compared with similarly sized non-contract corpora across multiple contract types, contain 1/5th as many hapax legomena, pattern differently on a log-log plot, use fewer pronouns, and contain sentences that are about 20 % more similar to one another than in other corpora. These suggest that the study of contracts in natural language processing controls for some linguistic phenomena and allows for more in depth study of others.

pdf bib
Sentence Boundary Detection in Legal Text
George Sanchez

In this paper, we examined several algorithms to detect sentence boundaries in legal text. Legal text presents challenges for sentence tokenizers because of the variety of punctuations and syntax of legal text. Out-of-the-box algorithms perform poorly on legal text affecting further analysis of the text. A novel and domain-specific approach is needed to detect sentence boundaries to further analyze legal text. We present the results of our investigation in this paper.

pdf bib
Litigation Analytics : Case Outcomes Extracted from US Federal Court DocketsUS Federal Court Dockets
Thomas Vacek | Ronald Teo | Dezhao Song | Timothy Nugent | Conner Cowling | Frank Schilder

Dockets contain a wealth of information for planning a litigation strategy, but the information is locked up in semi-structured text. Manually deriving the outcomes for each party (e.g., settlement, verdict) would be very labor intensive. Having such information available for every past court case, however, would be very useful for developing a strategy because it potentially reveals tendencies and trends of judges and courts and the opposing counsel. We used Natural Language Processing (NLP) techniques and deep learning methods allowing us to scale the automatic analysis of millions of US federal court dockets. The automatically extracted information is fed into a Litigation Analytics tool that is used by lawyers to plan how they approach concrete litigations.

pdf bib
Legal Area Classification : A Comparative Study of Text Classifiers on Singapore Supreme Court JudgmentsSingapore Supreme Court Judgments
Jerrold Soh | How Khang Lim | Ian Ernst Chai

This paper conducts a comparative study on the performance of various machine learning approaches for classifying judgments into legal areas. Using a novel dataset of 6,227 Singapore Supreme Court judgments, we investigate how state-of-the-art NLP methods compare against traditional statistical models when applied to a legal corpus that comprised few but lengthy documents. All approaches tested, including topic model, word embedding, and language model-based classifiers, performed well with as little as a few hundred judgments. However, more work needs to be done to optimize state-of-the-art methods for the legal domain.

up

pdf (full)
bib (full)
Proceedings of the Workshop on Methods for Optimizing and Evaluating Neural Language Generation

pdf bib
Proceedings of the Workshop on Methods for Optimizing and Evaluating Neural Language Generation
Antoine Bosselut | Asli Celikyilmaz | Marjan Ghazvininejad | Srinivasan Iyer | Urvashi Khandelwal | Hannah Rashkin | Thomas Wolf

pdf bib
An Adversarial Learning Framework For A Persona-Based Multi-Turn Dialogue Model
Oluwatobi Olabiyi | Anish Khazane | Alan Salimov | Erik Mueller

In this paper, we extend the persona-based sequence-to-sequence (Seq2Seq) neural network conversation model to a multi-turn dialogue scenario by modifying the state-of-the-art hredGAN architecture to simultaneously capture utterance attributes such as speaker identity, dialogue topic, speaker sentiments and so on. The proposed system, phredGAN has a persona-based HRED generator (PHRED) and a conditional discriminator. We also explore two approaches to accomplish the conditional discriminator : (1) phredGANa, a system that passes the attribute representation as an additional input into a traditional adversarial discriminator, and (2) phredGANd, a dual discriminator system which in addition to the adversarial discriminator, collaboratively predicts the attribute(s) that generated the input utterance. To demonstrate the superior performance of phredGAN over the persona Seq2Seq model, we experiment with two conversational datasets, the Ubuntu Dialogue Corpus (UDC) and TV series transcripts from the Big Bang Theory and Friends. Performance comparison is made with respect to a variety of quantitative measures as well as crowd-sourced human evaluation. We also explore the trade-offs from using either variant of phredGAN on datasets with many but weak attribute modalities (such as with Big Bang Theory and Friends) and ones with few but strong attribute modalities (customer-agent interactions in Ubuntu dataset).

pdf bib
How to Compare Summarizers without Target Length? Pitfalls, Solutions and Re-Examination of the Neural Summarization Literature
Simeng Sun | Ori Shapira | Ido Dagan | Ani Nenkova

We show that plain ROUGE F1 scores are not ideal for comparing current neural systems which on average produce different lengths. This is due to a non-linear pattern between ROUGE F1 and summary length. To alleviate the effect of length during evaluation, we have proposed a new method which normalizes the ROUGE F1 scores of a system by that of a random system with same average output length. A pilot human evaluation has shown that humans prefer short summaries in terms of the verbosity of a summary but overall consider longer summaries to be of higher quality. While human evaluations are more expensive in time and resources, it is clear that normalization, such as the one we proposed for automatic evaluation, will make human evaluations more meaningful.

pdf bib
BERT has a Mouth, and It Must Speak : BERT as a Markov Random Field Language ModelBERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model
Alex Wang | Kyunghyun Cho

We show that BERT (Devlin et al., 2018) is a Markov random field language model. This formulation gives way to a natural procedure to sample sentences from BERT. We generate from BERT and find that it can produce high quality, fluent generations. Compared to the generations of a traditional left-to-right language model, BERT generates sentences that are more diverse but of slightly worse quality.

pdf bib
Bilingual-GAN : A Step Towards Parallel Text GenerationGAN: A Step Towards Parallel Text Generation
Ahmad Rashid | Alan Do-Omri | Md. Akmal Haidar | Qun Liu | Mehdi Rezagholizadeh

Latent space based GAN methods and attention based sequence to sequence models have achieved impressive results in text generation and unsupervised machine translation respectively. Leveraging the two domains, we propose an adversarial latent space based model capable of generating parallel sentences in two languages concurrently and translating bidirectionally. The bilingual generation goal is achieved by sampling from the latent space that is shared between both languages. First two denoising autoencoders are trained, with shared encoders and back-translation to enforce a shared latent state between the two languages. The decoder is shared for the two translation directions. Next, a GAN is trained to generate synthetic ‘code’ mimicking the languages’ shared latent space. This code is then fed into the decoder to generate text in either language. We perform our experiments on Europarl and Multi30k datasets, on the English-French language pair, and document our performance using both supervised and unsupervised machine translation.

pdf bib
Better Automatic Evaluation of Open-Domain Dialogue Systems with Contextualized Embeddings
Sarik Ghazarian | Johnny Wei | Aram Galstyan | Nanyun Peng

Despite advances in open-domain dialogue systems, automatic evaluation of such systems is still a challenging problem. Traditional reference-based metrics such as BLEU are ineffective because there could be many valid responses for a given context that share no common words with reference responses. A recent work proposed Referenced metric and Unreferenced metric Blended Evaluation Routine (RUBER) to combine a learning-based metric, which predicts relatedness between a generated response and a given query, with reference-based metric ; it showed high correlation with human judgments. In this paper, we explore using contextualized word embeddings to compute more accurate relatedness scores, thus better evaluation metrics. Experiments show that our evaluation metrics outperform RUBER, which is trained on static embeddings.

up

pdf (full)
bib (full)
Proceedings of the First Workshop on Narrative Understanding

pdf bib
Proceedings of the First Workshop on Narrative Understanding
David Bamman | Snigdha Chaturvedi | Elizabeth Clark | Madalina Fiterau | Mohit Iyyer

pdf bib
Extraction of Message Sequence Charts from Narrative History Text
Girish Palshikar | Sachin Pawar | Sangameshwar Patil | Swapnil Hingmire | Nitin Ramrakhiyani | Harsimran Bedi | Pushpak Bhattacharyya | Vasudeva Varma

In this paper, we advocate the use of Message Sequence Chart (MSC) as a knowledge representation to capture and visualize multi-actor interactions and their temporal ordering. We propose algorithms to automatically extract an MSC from a history narrative. For a given narrative, we first identify verbs which indicate interactions and then use dependency parsing and Semantic Role Labelling based approaches to identify senders (initiating actors) and receivers (other actors involved) for these interaction verbs. As a final step in MSC extraction, we employ a state-of-the art algorithm to temporally re-order these interactions. Our evaluation on multiple publicly available narratives shows improvements over four baselines.

up

pdf (full)
bib (full)
Proceedings of the 3rd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

pdf bib
Proceedings of the 3rd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
Beatrice Alex | Stefania Degaetano-Ortlieb | Anna Kazantseva | Nils Reiter | Stan Szpakowicz

pdf bib
Modeling Word Emotion in Historical Language : Quantity Beats Supposed Stability in Seed Word Selection
Johannes Hellrich | Sven Buechel | Udo Hahn

To understand historical texts, we must be aware that languageincluding the emotional connotation attached to wordschanges over time. In this paper, we aim at estimating the emotion which is associated with a given word in former language stages of English and German. Emotion is represented following the popular Valence-Arousal-Dominance (VAD) annotation scheme. While being more expressive than polarity alone, existing word emotion induction methods are typically not suited for addressing it. To overcome this limitation, we present adaptations of two popular algorithms to VAD. To measure their effectiveness in diachronic settings, we present the first gold standard for historical word emotions, which was created by scholars with proficiency in the respective language stages and covers both English and German. In contrast to claims in previous work, our findings indicate that hand-selecting small sets of seed words with supposedly stable emotional meaning is actually harm- rather than helpful.

pdf bib
Are Fictional Voices Distinguishable? Classifying Character Voices in Modern Drama
Krishnapriya Vishnubhotla | Adam Hammond | Graeme Hirst

According to the literary theory of Mikhail Bakhtin, a dialogic novel is one in which characters speak in their own distinct voices, rather than serving as mouthpieces for their authors. We use text classification to determine which authors best achieve dialogism, looking at a corpus of plays from the late nineteenth and early twentieth centuries. We find that the SAGE model of text generation, which highlights deviations from a background lexical distribution, is an effective method of weighting the words of characters’ utterances. Our results show that it is indeed possible to distinguish characters by their speech in the plays of canonical writers such as George Bernard Shaw, whereas characters are clustered more closely in the works of lesser-known playwrights.

pdf bib
Automatic Alignment and Annotation Projection for Literary Texts
Uli Steinbach | Ines Rehbein

This paper presents a modular NLP pipeline for the creation of a parallel literature corpus, followed by annotation transfer from the source to the target language. The test case we use to evaluate our pipeline is the automatic transfer of quote and speaker mention annotations from English to German. We evaluate the different components of the pipeline and discuss challenges specific to literary texts. Our experiments show that after applying a reasonable amount of semi-automatic postprocessing we can obtain high-quality aligned and annotated resources for a new language.

pdf bib
Inferring missing metadata from environmental policy texts
Steven Bethard | Egoitz Laparra | Sophia Wang | Yiyun Zhao | Ragheb Al-Ghezi | Aaron Lien | Laura López-Hoffman

The National Environmental Policy Act (NEPA) provides a trove of data on how environmental policy decisions have been made in the United States over the last 50 years. Unfortunately, there is no central database for this information and it is too voluminous to assess manually. We describe our efforts to enable systematic research over US environmental policy by extracting and organizing metadata from the text of NEPA documents. Our contributions include collecting more than 40,000 NEPA-related documents, and evaluating rule-based baselines that establish the difficulty of three important tasks : identifying lead agencies, aligning document versions, and detecting reused text.

pdf bib
A framework for streamlined statistical prediction using topic models
Vanessa Glenny | Jonathan Tuke | Nigel Bean | Lewis Mitchell

In the Humanities and Social Sciences, there is increasing interest in approaches to information extraction, prediction, intelligent linkage, and dimension reduction applicable to large text corpora. With approaches in these fields being grounded in traditional statistical techniques, the need arises for frameworks whereby advanced NLP techniques such as topic modelling may be incorporated within classical methodologies. This paper provides a classical, supervised, statistical learning framework for prediction from text, using topic models as a data reduction method and the topics themselves as predictors, alongside typical statistical tools for predictive modelling. We apply this framework in a Social Sciences context (applied animal behaviour) as well as a Humanities context (narrative analysis) as examples of this framework. The results show that topic regression models perform comparably to their much less efficient equivalents that use individual words as predictors.

pdf bib
Graph convolutional networks for exploring authorship hypotheses
Tom Lippincott

This work considers a task from traditional literary criticism : annotating a structured, composite document with information about its sources. We take the Documentary Hypothesis, a prominent theory regarding the composition of the first five books of the Hebrew bible, extract stylistic features designed to avoid bias or overfitting, and train several classification models. Our main result is that the recently-introduced graph convolutional network architecture outperforms structurally-uninformed models. We also find that including information about the granularity of text spans is a crucial ingredient when employing hidden layers, in contrast to simple logistic regression. We perform error analysis at several levels, noting how some characteristic limitations of the models and simple features lead to misclassifications, and conclude with an overview of future work.

pdf bib
Semantics and Homothetic Clustering of Hafez Poetry
Arya Rahgozar | Diana Inkpen

We have created two sets of labels for Hafez (1315-1390) poems, using unsupervised learning. Our labels are the only semantic clustering alternative to the previously existing, hand-labeled, gold-standard classification of Hafez poems, to be used for literary research. We have cross-referenced, measured and analyzed the agreements of our clustering labels with Houman’s chronological classes. Our features are based on topic modeling and word embeddings. We also introduced a similarity of similarities’ features, we called homothetic clustering approach that proved effective, in case of Hafez’s small corpus of ghazals2. Although all our experiments showed different clusters when compared with Houman’s classes, we think they were valid in their own right to have provided further insights, and have proved useful as a contrasting alternative to Houman’s classes. Our homothetic clusterer and its feature design and engineering framework can be used for further semantic analysis of Hafez’s poetry and other similar literary research.

pdf bib
Computational Linguistics Applications for Multimedia Services
Kyeongmin Rim | Kelley Lynch | James Pustejovsky

We present Computational Linguistics Applications for Multimedia Services (CLAMS), a platform that provides access to computational content analysis tools for archival multimedia material that appear in different media, such as text, audio, image, and video. The primary goal of CLAMS is : (1) to develop an interchange format between multimodal metadata generation tools to ensure interoperability between tools ; (2) to provide users with a portable, user-friendly workflow engine to chain selected tools to extract meaningful analyses ; and (3) to create a public software development kit (SDK) for developers that eases deployment of analysis tools within the CLAMS platform. CLAMS is designed to help archives and libraries enrich the metadata associated with their mass-digitized multimedia collections, that would otherwise be largely unsearchable.

pdf bib
On the Feasibility of Automated Detection of Allusive Text Reuse
Enrique Manjavacas | Brian Long | Mike Kestemont

The detection of allusive text reuse is particularly challenging due to the sparse evidence on which allusive references rely commonly based on none or very few shared words. Arguably, lexical semantics can be resorted to since uncovering semantic relations between words has the potential to increase the support underlying the allusion and alleviate the lexical sparsity. A further obstacle is the lack of evaluation benchmark corpora, largely due to the highly interpretative character of the annotation process. In the present paper, we aim to elucidate the feasibility of automated allusion detection. We approach the matter from an Information Retrieval perspective in which referencing texts act as queries and referenced texts as relevant documents to be retrieved, and estimate the difficulty of benchmark corpus compilation by a novel inter-annotator agreement study on query segmentation. Furthermore, we investigate to what extent the integration of lexical semantic information derived from distributional models and ontologies can aid retrieving cases of allusive reuse. The results show that (i) despite low agreement scores, using manual queries considerably improves retrieval performance with respect to a windowing approach, and that (ii) retrieval performance can be moderately boosted with distributional semantics.

pdf bib
Sign Clustering and Topic Extraction in Proto-ElamiteProto-Elamite
Logan Born | Kate Kelley | Nishant Kambhatla | Carolyn Chen | Anoop Sarkar

We describe a first attempt at using techniques from computational linguistics to analyze the undeciphered proto-Elamite script. Using hierarchical clustering, n-gram frequencies, and LDA topic models, we both replicate results obtained by manual decipherment and reveal previously-unobserved relationships between signs. This demonstrates the utility of these techniques as an aid to manual decipherment.

up

pdf (full)
bib (full)
Proceedings of the Workshop on Extracting Structured Knowledge from Scientific Publications

pdf bib
Proceedings of the Workshop on Extracting Structured Knowledge from Scientific Publications
Vivi Nastase | Benjamin Roth | Laura Dietz | Andrew McCallum

pdf bib
Distantly Supervised Biomedical Knowledge Acquisition via Knowledge Graph Based Attention
Qin Dai | Naoya Inoue | Paul Reisert | Ryo Takahashi | Kentaro Inui

The increased demand for structured scientific knowledge has attracted considerable attention in extracting scientific relation from the ever growing scientific publications. Distant supervision is widely applied approach to automatically generate large amounts of labelled data with low manual annotation cost. However, distant supervision inevitably accompanies the wrong labelling problem, which will negatively affect the performance of Relation Extraction (RE). To address this issue, (Han et al., 2018) proposes a novel framework for jointly training both RE model and Knowledge Graph Completion (KGC) model to extract structured knowledge from non-scientific dataset. In this work, we firstly investigate the feasibility of this framework on scientific dataset, specifically on biomedical dataset. Secondly, to achieve better performance on the biomedical dataset, we extend the framework with other competitive KGC models. Moreover, we proposed a new end-to-end KGC model to extend the framework. Experimental results not only show the feasibility of the framework on the biomedical dataset, but also indicate the effectiveness of our extensions, because our extended model achieves significant and consistent improvements on distant supervised RE as compared with baselines.

pdf bib
Understanding the Polarity of Events in the Biomedical Literature : Deep Learning vs. Linguistically-informed Methods
Enrique Noriega-Atala | Zhengzhong Liang | John Bachman | Clayton Morrison | Mihai Surdeanu

An important task in the machine reading of biochemical events expressed in biomedical texts is correctly reading the polarity, i.e., attributing whether the biochemical event is a promotion or an inhibition. Here we present a novel dataset for studying polarity attribution accuracy. We use this dataset to train and evaluate several deep learning models for polarity identification, and compare these to a linguistically-informed model. The best performing deep learning architecture achieves 0.968 average F1 performance in a five-fold cross-validation study, a considerable improvement over the linguistically informed model average F1 of 0.862.

pdf bib
Dataset Mention Extraction and Classification
Animesh Prasad | Chenglei Si | Min-Yen Kan

Datasets are integral artifacts of empirical scientific research. However, due to natural language variation, their recognition can be difficult and even when identified, can often be inconsistently referred across and within publications. We report our approach to the Coleridge Initiative’s Rich Context Competition, which tasks participants with identifying dataset surface forms (dataset mention extraction) and associating the extracted mention to its referred dataset (dataset classification). In this work, we propose various neural baselines and evaluate these model on one-plus and zero-shot classification scenarios. We further explore various joint learning approaches-exploring the synergy between the tasks-and report the issues with such techniques.

pdf bib
Annotating with Pros and Cons of Technologies in Computer Science Papers
Hono Shirai | Naoya Inoue | Jun Suzuki | Kentaro Inui

This paper explores a task for extracting a technological expression and its pros / cons from computer science papers. We report ongoing efforts on an annotated corpus of pros / cons and an analysis of the nature of the automatic extraction task. Specifically, we show how to adapt the targeted sentiment analysis task for pros / cons extraction in computer science papers and conduct an annotation study. In order to identify the challenges of the automatic extraction task, we construct a strong baseline model and conduct an error analysis. The experiments show that pros / cons can be consistently annotated by several annotators, and that the task is challenging due to domain-specific knowledge. The annotated dataset is made publicly available for research purposes.

pdf bib
An Analysis of Deep Contextual Word Embeddings and Neural Architectures for Toponym Mention Detection in Scientific Publications
Matthew Magnusson | Laura Dietz

Toponym detection in scientific papers is an open task and a key first step in place entity enrichment of documents. We examine three common neural architectures in NLP : 1) convolutional neural network, 2) multi-layer perceptron (both applied in a sliding window context) and 3) bidirectional LSTM and apply contextual and non-contextual word embedding layers to these models. We find that deep contextual word embeddings improve the performance of the bi-LSTM with CRF neural architecture achieving the best performance when multiple layers of deep contextual embeddings are concatenated. Our best performing model achieves an average F1 of 0.910 when evaluated on overlap macro exceeding previous state-of-the-art models in the toponym detection task.

up

pdf (full)
bib (full)
Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019

pdf bib
Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019
Amir Zeldes | Debopam Das | Erick Maziero Galani | Juliano Desiderato Antonio | Mikel Iruskieta

pdf bib
Nuclearity in RST and signals of coherence relationsRST and signals of coherence relations
Debopam Das

We investigate the relationship between the notion of nuclearity as proposed in Rhetorical Structure Theory (RST) and the signalling of coherence relations. RST relations are categorized as either mononuclear (comprising a nucleus and a satellite span) or multinuclear (comprising two or more nuclei spans). We examine how mononuclear relations (e.g., Antithesis, Condition) and multinuclear relations (e.g., Contrast, List) are indicated by relational signals, more particularly by discourse markers (e.g., because, however, if, therefore). We conduct a corpus study, examining the distribution of either type of relations in the RST Discourse Treebank (Carlson et al., 2002) and the distribution of discourse markers for those relations in the RST Signalling Corpus (Das et al., 2015). Our results show that discourse markers are used more often to signal multinuclear relations than mononuclear relations. The findings also suggest a complex relationship between the relation types and syntactic categories of discourse markers (subordinating and coordinating conjunctions).

pdf bib
Annotating Shallow Discourse Relations in Twitter ConversationsTwitter Conversations
Tatjana Scheffler | Berfin Aktaş | Debopam Das | Manfred Stede

We introduce our pilot study applying PDTB-style annotation to Twitter conversations. Lexically grounded coherence annotation for Twitter threads will enable detailed investigations of the discourse structure of conversations on social media. Here, we present our corpus of 185 threads and annotation, including an inter-annotator agreement study. We discuss our observations as to how Twitter discourses differ from written news text wrt. discourse connectives and relations. We confirm our hypothesis that discourse relations in written social media conversations are expressed differently than in (news) text. We find that in Twitter, connective arguments frequently are not full syntactic clauses, and that a few general connectives expressing EXPANSION and CONTINGENCY make up the majority of the explicit relations in our data.

pdf bib
A Discourse Signal Annotation System for RST TreesRST Trees
Luke Gessler | Yang Liu | Amir Zeldes

This paper presents a new system for open-ended discourse relation signal annotation in the framework of Rhetorical Structure Theory (RST), implemented on top of an online tool for RST annotation. We discuss existing projects annotating textual signals of discourse relations, which have so far not allowed simultaneously structuring and annotating words signaling hierarchical discourse trees, and demonstrate the design and applications of our interface by extending existing RST annotations in the freely available GUM corpus.

pdf bib
EusDisParser : improving an under-resourced discourse parser with cross-lingual dataEusDisParser: improving an under-resourced discourse parser with cross-lingual data
Mikel Iruskieta | Chloé Braud

Development of discourse parsers to annotate the relational discourse structure of a text is crucial for many downstream tasks. However, most of the existing work focuses on English, assuming a quite large dataset. Discourse data have been annotated for Basque, but training a system on these data is challenging since the corpus is very small. In this paper, we create the first demonstrator based on RST for Basque, and we investigate the use of data in another language to improve the performance of a Basque discourse parser. More precisely, we build a monolingual system using the small set of data available and investigate the use of multilingual word embeddings to train a system for Basque using data annotated for another language. We found that our approach to building a system limited to the small set of data available for Basque allowed us to get an improvement over previous approaches making use of many data annotated in other languages. At best, we get 34.78 in F1 for the full discourse structure. More data annotation is necessary in order to improve the results obtained with these techniques. We also describe which relations match with the gold standard, in order to understand these results.

pdf bib
Towards the Data-driven System for Rhetorical Parsing of Russian TextsRussian Texts
Artem Shelmanov | Dina Pisarevskaya | Elena Chistova | Svetlana Toldova | Maria Kobozeva | Ivan Smirnov

Results of the first experimental evaluation of machine learning models trained on Ru-RSTreebank first Russian corpus annotated within RST framework are presented. Various lexical, quantitative, morphological, and semantic features were used. In rhetorical relation classification, ensemble of CatBoost model with selected features and a linear SVM model provides the best score (macro F1 = 54.67 0.38). We discover that most of the important features for rhetorical relation classification are related to discourse connectives derived from the connectives lexicon for Russian and from other sources.

pdf bib
The DISRPT 2019 Shared Task on Elementary Discourse Unit Segmentation and Connective DetectionDISRPT 2019 Shared Task on Elementary Discourse Unit Segmentation and Connective Detection
Amir Zeldes | Debopam Das | Erick Galani Maziero | Juliano Antonio | Mikel Iruskieta

In 2019, we organized the first iteration of a shared task dedicated to the underlying units used in discourse parsing across formalisms : the DISRPT Shared Task on Elementary Discourse Unit Segmentation and Connective Detection. In this paper we review the data included in the task, which cover 2.6 million manually annotated tokens from 15 datasets in 10 languages, survey and compare submitted systems and report on system performance on each task for both annotated and plain-tokenized versions of the data.

pdf bib
Multilingual segmentation based on neural networks and pre-trained word embeddings
Mikel Iruskieta | Kepa Bengoetxea | Aitziber Atutxa Salazar | Arantza Diaz de Ilarraza

The DISPRT 2019 workshop has organized a shared task aiming to identify cross-formalism and multilingual discourse segments. Elementary Discourse Units (EDUs) are quite similar across different theories. Segmentation is the very first stage on the way of rhetorical annotation. Still, each annotation project adopted several decisions with consequences not only on the annotation of the relational discourse structure but also at the segmentation stage. In this shared task, we have employed pre-trained word embeddings, neural networks (BiLSTM+CRF) to perform the segmentation. We report F1 results for 6 languages : Basque (0.853), English (0.919), French (0.907), German (0.913), Portuguese (0.926) and Spanish (0.868 and 0.769). Finally, we also pursued an error analysis based on clause typology for Basque and Spanish, in order to understand the performance of the segmenter.

pdf bib
Using Rhetorical Structure Theory to Assess Discourse Coherence for Non-native Spontaneous SpeechRhetorical Structure Theory to Assess Discourse Coherence for Non-native Spontaneous Speech
Xinhao Wang | Binod Gyawali | James V. Bruno | Hillary R. Molloy | Keelan Evanini | Klaus Zechner

This study aims to model the discourse structure of spontaneous spoken responses within the context of an assessment of English speaking proficiency for non-native speakers. Rhetorical Structure Theory (RST) has been commonly used in the analysis of discourse organization of written texts ; however, limited research has been conducted to date on RST annotation and parsing of spoken language, in particular, non-native spontaneous speech. Due to the fact that the measurement of discourse coherence is typically a key metric in human scoring rubrics for assessments of spoken language, we conducted research to obtain RST annotations on non-native spoken responses from a standardized assessment of academic English proficiency. Subsequently, automatic parsers were trained on these annotations to process non-native spontaneous speech. Finally, a set of features were extracted from automatically generated RST trees to evaluate the discourse structure of non-native spontaneous speech, which were then employed to further improve the validity of an automated speech scoring system.

up

pdf (full)
bib (full)
Proceedings of the Second Workshop on Computational Models of Reference, Anaphora and Coreference

pdf bib
Proceedings of the Second Workshop on Computational Models of Reference, Anaphora and Coreference
Maciej Ogrodniczuk | Sameer Pradhan | Yulia Grishina | Vincent Ng

pdf bib
Cross-lingual Incongruences in the Annotation of Coreference
Ekaterina Lapshinova-Koltunski | Sharid Loáiciga | Christian Hardmeier | Pauline Krielke

In the present paper, we deal with incongruences in English-German multilingual coreference annotation and present automated methods to discover them. More specifically, we automatically detect full coreference chains in parallel texts and analyse discrepancies in their annotations. In doing so, we wish to find out whether the discrepancies rather derive from language typological constraints, from the translation or the actual annotation process. The results of our study contribute to the referential analysis of similarities and differences across languages and support evaluation of cross-lingual coreference annotation. They are also useful for cross-lingual coreference resolution systems and contrastive linguistic studies.

pdf bib
Deep Cross-Lingual Coreference Resolution for Less-Resourced Languages : The Case of BasqueBasque
Gorka Urbizu | Ander Soraluze | Olatz Arregi

In this paper, we present a cross-lingual neural coreference resolution system for a less-resourced language such as Basque. To begin with, we build the first neural coreference resolution system for Basque, training it with the relatively small EPEC-KORREF corpus (45,000 words). Next, a cross-lingual coreference resolution system is designed. With this approach, the system learns from a bigger English corpus, using cross-lingual embeddings, to perform the coreference resolution for Basque. The cross-lingual system obtains slightly better results (40.93 F1 CoNLL) than the monolingual system (39.12 F1 CoNLL), without using any Basque language corpus to train it.

up

pdf (full)
bib (full)
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics

pdf bib
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics
Emmanuele Chersoni | Cassandra Jacobs | Alessandro Lenci | Tal Linzen | Laurent Prévot | Enrico Santus

pdf bib
Priming vs. Inhibition of Optional Infinitival to
Robin Melnick | Thomas Wasow

The word to that precedes verbs in English infinitives is optional in at least two environments : in what Wasow et al. (2015) previously called the do-be construction, and in the complement of help, which we explore in the present work. In the do-be construction, Wasow et al. found that a preceding infinitival to increases the use of following optional to, but the use of to in the complement of help is reduced following to help. We examine two hypotheses regarding why the same function word is primed by prior use in one construction and inhibited in another. We then test predictions made by the two hypotheses, finding support for one of them.

pdf bib
Simulating Spanish-English Code-Switching : El Modelo Est Generating Code-SwitchesSpanish-English Code-Switching: El Modelo Está Generating Code-Switches
Chara Tsoukala | Stefan L. Frank | Antal van den Bosch | Jorge Valdés Kroff | Mirjam Broersma

Multilingual speakers are able to switch from one language to the other (code-switch) between or within sentences. Because the underlying cognitive mechanisms are not well understood, in this study we use computational cognitive modeling to shed light on the process of code-switching. We employed the Bilingual Dual-path model, a Recurrent Neural Network of bilingual sentence production (Tsoukala et al., 2017), and simulated sentence production in simultaneous Spanish-English bilinguals. Our first goal was to investigate whether the model would code-switch without being exposed to code-switched training input. The model indeed produced code-switches even without any exposure to such input and the patterns of code-switches are in line with earlier linguistic work (Poplack,1980). The second goal of this study was to investigate an auxiliary phrase asymmetry that exists in Spanish-English code-switched production. Using this cognitive model, we examined a possible cause for this asymmetry. To our knowledge, this is the first computational cognitive model that aims to simulate code-switched sentence production.

pdf bib
A Modeling Study of the Effects of Surprisal and Entropy in Perceptual Decision Making of an Adaptive Agent
Pyeong Whan Cho | Richard Lewis

Processing difficulty in online language comprehension has been explained in terms of surprisal and entropy reduction. Although both hypotheses have been supported by experimental data, we do not fully understand their relative contributions on processing difficulty. To develop a better understanding, we propose a mechanistic model of perceptual decision making that interacts with a simulated task environment with temporal dynamics. The proposed model collects noisy bottom-up evidence over multiple timesteps, integrates it with its top-down expectation, and makes perceptual decisions, producing processing time data directly without relying on any linking hypothesis. Temporal dynamics in the task environment was determined by a simple finite-state grammar, which was designed to create the situations where the surprisal and entropy reduction hypotheses predict different patterns. After the model was trained to maximize rewards, the model developed an adaptive policy and both surprisal and entropy effects were observed especially in a measure reflecting earlier processing.

pdf bib
Dependency Parsing with your Eyes : Dependency Structure Predicts Eye Regressions During Reading
Alessandro Lopopolo | Stefan L. Frank | Antal van den Bosch | Roel Willems

Backward saccades during reading have been hypothesized to be involved in structural reanalysis, or to be related to the level of text difficulty. We test the hypothesis that backward saccades are involved in online syntactic analysis. If this is the case we expect that saccades will coincide, at least partially, with the edges of the relations computed by a dependency parser. In order to test this, we analyzed a large eye-tracking dataset collected while 102 participants read three short narrative texts. Our results show a relation between backward saccades and the syntactic structure of sentences.

pdf bib
Testing a Minimalist Grammar Parser on Italian Relative Clause AsymmetriesMinimalist Grammar Parser on Italian Relative Clause Asymmetries
Aniello De Santo

Stabler’s (2013) top-down parser for Minimalist grammars has been used to account for off-line processing preferences across a variety of seemingly unrelated phenomena cross-linguistically, via complexity metrics measuring memory burden. This paper extends the empirical coverage of the model by looking at the processing asymmetries of Italian relative clauses, as I discuss the relevance of these constructions in evaluating plausible structure-driven models of processing difficulty.

pdf bib
The Development of Abstract Concepts in Children’s Early Lexical Networks
Abdellah Fourtassi | Isaac Scheinfeld | Michael Frank

How do children learn abstract concepts such as animal vs. artifact? Previous research has suggested that such concepts can partly be derived using cues from the language children hear around them. Following this suggestion, we propose a model where we represent the children’ developing lexicon as an evolving network. The nodes of this network are based on vocabulary knowledge as reported by parents, and the edges between pairs of nodes are based on the probability of their co-occurrence in a corpus of child-directed speech. We found that several abstract categories can be identified as the dense regions in such networks. In addition, our simulations suggest that these categories develop simultaneously, rather than sequentially, thanks to the children’s word learning trajectory which favors the exploration of the global conceptual space.

pdf bib
Verb-Second Effect on Quantifier Scope Interpretation
Asad Sayeed | Matthias Lindemann | Vera Demberg

Sentences like Every child climbed a tree have at least two interpretations depending on the precedence order of the universal quantifier and the indefinite. Previous experimental work explores the role that different mechanisms such as semantic reanalysis and world knowledge may have in enabling each interpretation. This paper discusses a web-based task that uses the verb-second characteristic of German main clauses to estimate the influence of word order variation over world knowledge.

pdf bib
Neural Models of the Psychosemantics of ‘Most’
Lewis O’Sullivan | Shane Steinert-Threlkeld

How are the meanings of linguistic expressions related to their use in concrete cognitive tasks? Visual identification tasks show human speakers can exhibit considerable variation in their understanding, representation and verification of certain quantifiers. This paper initiates an investigation into neural models of these psycho-semantic tasks. We trained two types of network a convolutional neural network (CNN) model and a recurrent model of visual attention (RAM) on the most verification task from Pietroski2009, manipulating the visual scene and novel notions of task duration. Our results qualitatively mirror certain features of human performance (such as sensitivity to the ratio of set sizes, indicating a reliance on approximate number) while differing in interesting ways (such as exhibiting a subtly different pattern for the effect of image type). We conclude by discussing the prospects for using neural models as cognitive models of this and other psychosemantic tasks.

pdf bib
The Role of Utterance Boundaries and Word Frequencies for Part-of-speech Learning in Brazilian Portuguese Through Distributional AnalysisBrazilian Portuguese Through Distributional Analysis
Pablo Picasso Feliciano de Faria

In this study, we address the problem of part-of-speech (or syntactic category) learning during language acquisition through distributional analysis of utterances. A model based on Redington et al.’s (1998) distributional learner is used to investigate the informativeness of distributional information in Brazilian Portuguese (BP). The data provided to the learner comes from two publicly available corpora of child directed speech. We present preliminary results from two experiments. The first one investigates the effects of different assumptions about utterance boundaries when presenting the input data to the learner. The second experiment compares the learner’s performance when counting contextual words’ frequencies versus just acknowledging their co-occurrence with a given target word. In general, our results indicate that explicit boundaries are more informative, frequencies are important, and that distributional information is useful to the child as a source of categorial information. These results are in accordance with Redington et al.’s findings for English.

pdf bib
Using Grounded Word Representations to Study Theories of Lexical Concepts
Dylan Ebert | Ellie Pavlick

The fields of cognitive science and philosophy have proposed many different theories for how humans represent concepts. Multiple such theories are compatible with state-of-the-art NLP methods, and could in principle be operationalized using neural networks. We focus on two particularly prominent theoriesClassical Theory and Prototype Theoryin the context of visually-grounded lexical representations. We compare when and how the behavior of models based on these theories differs in terms of categorization and entailment tasks. Our preliminary results suggest that Classical-based representations perform better for entailment and Prototype-based representations perform better for categorization. We discuss plans for additional experiments needed to confirm these initial observations.

up

pdf (full)
bib (full)
Proceedings of the Sixth Workshop on Computational Linguistics and Clinical Psychology

pdf bib
Proceedings of the Sixth Workshop on Computational Linguistics and Clinical Psychology
Kate Niederhoffer | Kristy Hollingshead | Philip Resnik | Rebecca Resnik | Kate Loveys

pdf bib
Identifying therapist conversational actions across diverse psychotherapeutic approaches
Fei-Tzin Lee | Derrick Hull | Jacob Levine | Bonnie Ray | Kathy McKeown

While conversation in therapy sessions can vary widely in both topic and style, an understanding of the underlying techniques used by therapists can provide valuable insights into how therapists best help clients of different types. Dialogue act classification aims to identify the conversational action each speaker takes at each utterance, such as sympathizing, problem-solving or assumption checking. We propose to apply dialogue act classification to therapy transcripts, using a therapy-specific labeling scheme, in order to gain a high-level understanding of the flow of conversation in therapy sessions. We present a novel annotation scheme that spans multiple psychotherapeutic approaches, apply it to a large and diverse corpus of psychotherapy transcripts, and present and discuss classification results obtained using both SVM and neural network-based models. The results indicate that identifying the structure and flow of therapeutic actions is an obtainable goal, opening up the opportunity in the future to provide therapeutic recommendations tailored to specific client situations.

pdf bib
CLaC at CLPsych 2019 : Fusion of Neural Features and Predicted Class Probabilities for Suicide Risk Assessment Based on Online PostsCLaC at CLPsych 2019: Fusion of Neural Features and Predicted Class Probabilities for Suicide Risk Assessment Based on Online Posts
Elham Mohammadi | Hessam Amini | Leila Kosseim

This paper summarizes our participation to the CLPsych 2019 shared task, under the name CLaC. The goal of the shared task was to detect and assess suicide risk based on a collection of online posts. For our participation, we used an ensemble method which utilizes 8 neural sub-models to extract neural features and predict class probabilities, which are then used by an SVM classifier. Our team ranked first in 2 out of the 3 tasks (tasks A and C).

pdf bib
Suicide Risk Assessment with Multi-level Dual-Context Language and BERTBERT
Matthew Matero | Akash Idnani | Youngseo Son | Salvatore Giorgi | Huy Vu | Mohammad Zamani | Parth Limbachiya | Sharath Chandra Guntuku | H. Andrew Schwartz

Mental health predictive systems typically model language as if from a single context (e.g. Twitter posts, status updates, or forum posts) and often limited to a single level of analysis (e.g. either the message-level or user-level). Here, we bring these pieces together to explore the use of open-vocabulary (BERT embeddings, topics) and theoretical features (emotional expression lexica, personality) for the task of suicide risk assessment on support forums (the CLPsych-2019 Shared Task). We used dual context based approaches (modeling content from suicide forums separate from other content), built over both traditional ML models as well as a novel dual RNN architecture with user-factor adaptation. We find that while affect from the suicide context distinguishes with no-risk from those with any-risk, personality factors from the non-suicide contexts provide distinction of the levels of risk : low, medium, and high risk. Within the shared task, our dual-context approach (listed as SBU-HLAB in the official results) achieved state-of-the-art performance predicting suicide risk using a combination of suicide-context and non-suicide posts (Task B), achieving an F1 score of 0.50 over hidden test set labels.

pdf bib
Using natural conversations to classify autism with limited data : Age matters
Michael Hauser | Evangelos Sariyanidi | Birkan Tunc | Casey Zampella | Edward Brodkin | Robert Schultz | Julia Parish-Morris

Spoken language ability is highly heterogeneous in Autism Spectrum Disorder (ASD), which complicates efforts to identify linguistic markers for use in diagnostic classification, clinical characterization, and for research and clinical outcome measurement. Machine learning techniques that harness the power of multivariate statistics and non-linear data analysis hold promise for modeling this heterogeneity, but many models require enormous datasets, which are unavailable for most psychiatric conditions (including ASD). In lieu of such datasets, good models can still be built by leveraging domain knowledge. In this study, we compare two machine learning approaches : the first approach incorporates prior knowledge about language variation across middle childhood, adolescence, and adulthood to classify 6-minute naturalistic conversation samples from 140 age- and IQ-matched participants (81 with ASD), while the other approach treats all ages the same. We found that individual age-informed models were significantly more accurate than a single model tasked with building a common algorithm across age groups. Furthermore, predictive linguistic features differed significantly by age group, confirming the importance of considering age-related changes in language use when classifying ASD. Our results suggest that limitations imposed by heterogeneity inherent to ASD and from developmental change with age can be (at least partially) overcome using domain knowledge, such as understanding spoken language development from childhood through adulthood.

pdf bib
The importance of sharing patient-generated clinical speech and language data
Kathleen C. Fraser | Nicklas Linz | Hali Lindsay | Alexandra König

Increased access to large datasets has driven progress in NLP. However, most computational studies of clinically-validated, patient-generated speech and language involve very few datapoints, as such data are difficult (and expensive) to collect. In this position paper, we argue that we must find ways to promote data sharing across research groups, in order to build datasets of a more appropriate size for NLP and machine learning analysis. We review the benefits and challenges of sharing clinical language data, and suggest several concrete actions by both clinical and NLP researchers to encourage multi-site and multi-disciplinary data sharing. We also propose the creation of a collaborative data sharing platform, to allow NLP researchers to take a more active responsibility for data transcription, annotation, and curation.

pdf bib
Depressed Individuals Use Negative Self-Focused Language When Recalling Recent Interactions with Close Romantic Partners but Not Family or FriendsFriends
Taleen Nalabandian | Molly Ireland

Depression is characterized by a self-focused negative attentional bias, which is often reflected in everyday language use. In a prospective writing study, we explored whether the association between depressive symptoms and negative, self-focused language varies across social contexts. College students (N = 243) wrote about a recent interaction with a person they care deeply about. Depression symptoms positively correlated with negative emotion words and first-person singular pronouns (or negative self-focus) when writing about a recent interaction with romantic partners or, to a lesser extent, friends, but not family members. The pattern of results was more pronounced when participants perceived greater self-other overlap (i.e., interpersonal closeness) with their romantic partner. Findings regarding how the linguistic profile of depression differs by type of relationship may inform more effective methods of clinical diagnosis and treatment.

pdf bib
Semantic Characteristics of Schizophrenic Speech
Kfir Bar | Vered Zilberstein | Ido Ziv | Heli Baram | Nachum Dershowitz | Samuel Itzikowitz | Eiran Vadim Harel

Natural language processing tools are used to automatically detect disturbances in transcribed speech of schizophrenia inpatients who speak Hebrew. We measure topic mutation over time and show that controls maintain more cohesive speech than inpatients. We also examine differences in how inpatients and controls use adjectives and adverbs to describe content words and show that the ones used by controls are more common than the those of inpatients. We provide experimental results and show their potential for automatically detecting schizophrenia in patients by means only of their speech patterns.

pdf bib
Mental Health Surveillance over Social Media with Digital Cohorts
Silvio Amir | Mark Dredze | John W. Ayers

The ability to track mental health conditions via social media opened the doors for large-scale, automated, mental health surveillance. However, inferring accurate population-level trends requires representative samples of the underlying population, which can be challenging given the biases inherent in social media data. While previous work has adjusted samples based on demographic estimates, the populations were selected based on specific outcomes, e.g. specific mental health conditions. We depart from these methods, by conducting analyses over demographically representative digital cohorts of social media users. To validated this approach, we constructed a cohort of US based Twitter users to measure the prevalence of depression and PTSD, and investigate how these illnesses manifest across demographic subpopulations. The analysis demonstrates that cohort-based studies can help control for sampling biases, contextualize outcomes, and provide deeper insights into the data.

pdf bib
Analyzing the use of existing systems for the CLPsych 2019 Shared TaskCLPsych 2019 Shared Task
Alejandro González Hevia | Rebeca Cerezo Menéndez | Daniel Gayo-Avello

In this paper we describe the UniOvi-WESO classification systems proposed for the 2019 Computational Linguistics and Clinical Psychology (CLPsych) Shared Task. We explore the use of two systems trained with ReachOut data from the 2016 CLPsych task, and compare them to a baseline system trained with the data provided for this task. All the classifiers were trained with features extracted just from the text of each post, without using any other metadata. We found out that the baseline system performs slightly better than the pretrained systems, mainly due to the differences in labeling between the two tasks. However, they still work reasonably well and can detect if a user is at risk of suicide or not.

pdf bib
Similar Minds Post Alike : Assessment of Suicide Risk Using a Hybrid Model
Lushi Chen | Abeer Aldayel | Nikolay Bogoychev | Tao Gong

This paper describes our system submission for the CLPsych 2019 shared task B on suicide risk assessment. We approached the problem with three separate models : a behaviour model ; a language model and a hybrid model. For the behavioral model approach, we model each user’s behaviour and thoughts with four groups of features : posting behaviour, sentiment, motivation, and content of the user’s posting. We use these features as an input in a support vector machine (SVM). For the language model approach, we trained a language model for each risk level using all the posts from the users as the training corpora. Then, we computed the perplexity of each user’s posts to determine how likely his / her posts were to belong to each risk level. Finally, we built a hybrid model that combines both the language model and the behavioral model, which demonstrates the best performance in detecting the suicide risk level.

pdf bib
Suicide Risk Assessment on Social Media : USI-UPF at the CLPsych 2019 Shared TaskUSI-UPF at the CLPsych 2019 Shared Task
Esteban Ríssola | Diana Ramírez-Cifuentes | Ana Freire | Fabio Crestani

This paper describes the participation of the USI-UPF team at the shared task of the 2019 Computational Linguistics and Clinical Psychology Workshop (CLPsych2019). The goal is to assess the degree of suicide risk of social media users given a labelled dataset with their posts. An appropriate suicide risk assessment, with the usage of automated methods, can assist experts on the detection of people at risk and eventually contribute to prevent suicide. We propose a set of machine learning models with features based on lexicons, word embeddings, word level n-grams, and statistics extracted from users’ posts. The results show that the most effective models for the tasks are obtained integrating lexicon-based features, a selected set of n-grams, and statistical measures.

pdf bib
An Investigation of Deep Learning Systems for Suicide Risk Assessment
Michelle Morales | Prajjalita Dey | Thomas Theisen | Danny Belitz | Natalia Chernova

This work presents the systems explored as part of the CLPsych 2019 Shared Task. More specifically, this work explores the promise of deep learning systems for suicide risk assessment.

up

pdf (full)
bib (full)
Proceedings of the 14th International Conference on Finite-State Methods and Natural Language Processing

pdf bib
Proceedings of the 14th International Conference on Finite-State Methods and Natural Language Processing
Heiko Vogler | Andreas Maletti

pdf bib
On the Compression of Lexicon Transducers
Marco Cognetta | Cyril Allauzen | Michael Riley

In finite-state language processing pipelines, a lexicon is often a key component. It needs to be comprehensive to ensure accuracy, reducing out-of-vocabulary misses. However, in memory-constrained environments (e.g., mobile phones), the size of the component automata must be kept small. Indeed, a delicate balance between comprehensiveness, speed, and memory must be struck to conform to device requirements while providing a good user experience. In this paper, we describe a compression scheme for lexicons when represented as finite-state transducers. We efficiently encode the graph of the transducer while storing transition labels separately. The graph encoding scheme is based on the LOUDS (Level Order Unary Degree Sequence) tree representation, which has constant time tree traversal for queries while being information-theoretically optimal in space. We find that our encoding is near the theoretical lower bound for such graphs and substantially outperforms more traditional representations in space while remaining competitive in latency benchmarks.

pdf bib
Finite State Transducer Calculus for Whole Word Morphology
Maciej Janicki

The research on machine learning of morphology often involves formulating morphological descriptions directly on surface forms of words. As the established two-level morphology paradigm requires the knowledge of the underlying structure, it is not widely used in such settings. In this paper, we propose a formalism describing structural relationships between words based on theories of morphology that reject the notions of internal word structure and morpheme. The formalism covers a wide variety of morphological phenomena (including non-concatenative ones like stem vowel alternation) without the need of workarounds and extensions. Furthermore, we show that morphological rules formulated in such way can be easily translated to FSTs, which enables us to derive performant approaches to morphological analysis, generation and automatic rule discovery.

pdf bib
Weighted parsing for grammar-based language models
Richard Mörbitz | Heiko Vogler

We develop a general framework for weighted parsing which is built on top of grammar-based language models and employs flexible weight algebras. It generalizes previous work in that area (semiring parsing, weighted deductive parsing) and also covers applications outside the classical scope of parsing, e.g., algebraic dynamic programming. We show an algorithm which terminates and is correct for a large class of weighted grammar-based language models.

pdf bib
Using Meta-Morph Rules to develop Morphological Analysers : A case study concerning TamilTamil
Kengatharaiyer Sarveswaran | Gihan Dias | Miriam Butt

This paper describes a new and larger coverage Finite-State Morphological Analyser (FSM) and Generator for the Dravidian language Tamil. The FSM has been developed in the context of computational grammar engineering, adhering to the standards of the ParGram effort. Tamil is a morphologically rich language and the interaction between linguistic analysis and formal implementation is complex, resulting in a challenging task. In order to allow the development of the FSM to focus more on the linguistic analysis and less on the formal details, we have developed a system of meta-morph(ology) rules along with a script which translates these rules into FSM processable representations. The introduction of meta-morph rules makes it possible for computationally naive linguists to interact with the system and to expand it in future work. We found that the meta-morph rules help to express linguistic generalisations and reduce the manual effort of writing lexical classes for morphological analysis. Our Tamil FSM currently handles mainly the inflectional morphology of 3,300 verb roots and their 260 forms. Further, it also has a lexicon of approximately 100,000 nouns along with a guesser to handle out-of-vocabulary items. Although the Tamil FSM was primarily developed to be part of a computational grammar, it can also be used as a web or stand-alone application for other NLP tasks, as per general ParGram practice.

pdf bib
Distilling weighted finite automata from arbitrary probabilistic models
Ananda Theertha Suresh | Brian Roark | Michael Riley | Vlad Schogol

Weighted finite automata (WFA) are often used to represent probabilistic models, such as n-gram language models, since they are efficient for recognition tasks in time and space. The probabilistic source to be represented as a WFA, however, may come in many forms. Given a generic probabilistic model over sequences, we propose an algorithm to approximate it as a weighted finite automaton such that the Kullback-Leibler divergence between the source model and the WFA target model is minimized. The proposed algorithm involves a counting step and a difference of convex optimization, both of which can be performed efficiently. We demonstrate the usefulness of our approach on some tasks including distilling n-gram models from neural models.

pdf bib
Silent HMMs : Generalized Representation of Hidden Semi-Markov Models and Hierarchical HMMsHMMs: Generalized Representation of Hidden Semi-Markov Models and Hierarchical HMMs
Kei Wakabayashi

Modeling sequence data using probabilistic finite state machines (PFSMs) is a technique that analyzes the underlying dynamics in sequences of symbols. Hidden semi-Markov models (HSMMs) and hierarchical hidden Markov models (HHMMs) are PFSMs that have been successfully applied to a wide variety of applications by extending HMMs to make the extracted patterns easier to interpret. However, these models are independently developed with their own training algorithm, so that we can not combine multiple kinds of structures to build a PFSM for a specific application. In this paper, we prove that silent hidden Markov models (silent HMMs) are flexible models that have more expressive power than HSMMs and HHMMs. Silent HMMs are HMMs that contain silent states, which do not emit any observations. We show that we can obtain silent HMM equivalent to given HSMMs and HHMMs. We believe that these results form a firm foundation to use silent HMMs as a unified representation for PFSM modeling.

pdf bib
Transition-Based Coding and Formal Language Theory for Ordered Digraphs
Anssi Yli-Jyrä

Transition-based parsing of natural language uses transition systems to build directed annotation graphs (digraphs) for sentences. In this paper, we define, for an arbitrary ordered digraph, a unique decomposition and a corresponding linear encoding that are associated bijectively with each other via a new transition system. These results give us an efficient and succinct representation for digraphs and sets of digraphs. Based on the system and our analysis of its syntactic properties, we give structural bounds under which the set of encoded digraphs is restricted and becomes a context-free or a regular string language. The context-free restriction is essentially a superset of the encodings used previously to characterize properties of noncrossing digraphs and to solve maximal subgraphs problems. The regular restriction with a tight bound is shown to capture the Universal Dependencies v2.4 treebanks in linguistics.

up

pdf (full)
bib (full)
Proceedings of the Fourth Social Media Mining for Health Applications (#SMM4H) Workshop & Shared Task

pdf bib
Proceedings of the Fourth Social Media Mining for Health Applications (#SMM4H) Workshop & Shared Task
Davy Weissenbacher | Graciela Gonzalez-Hernandez

pdf bib
Extracting Kinship from Obituary to Enhance Electronic Health Records for Genetic Research
Kai He | Jialun Wu | Xiaoyong Ma | Chong Zhang | Ming Huang | Chen Li | Lixia Yao

Claims database and electronic health records database do not usually capture kinship or family relationship information, which is imperative for genetic research. We identify online obituaries as a new data source and propose a special named entity recognition and relation extraction solution to extract names and kinships from online obituaries. Built on 1,809 annotated obituaries and a novel tagging scheme, our joint neural model achieved macro-averaged precision, recall and F measure of 72.69 %, 78.54 % and 74.93 %, and micro-averaged precision, recall and F measure of 95.74 %, 98.25 % and 96.98 % using 57 kinships with 10 or more examples in a 10-fold cross-validation experiment. The model performance improved dramatically when trained with 34 kinships with 50 or more examples. Leveraging additional information such as age, death date, birth date and residence mentioned by obituaries, we foresee a promising future of supplementing EHR databases with comprehensive and accurate kinship information for genetic research.

pdf bib
Lexical Normalization of User-Generated Medical Text
Anne Dirkson | Suzan Verberne | Wessel Kraaij

In the medical domain, user-generated social media text is increasingly used as a valuable complementary knowledge source to scientific medical literature. The extraction of this knowledge is complicated by colloquial language use and misspellings. Yet, lexical normalization of such data has not been addressed properly. This paper presents an unsupervised, data-driven spelling correction module for medical social media. Our method outperforms state-of-the-art spelling correction and can detect mistakes with an F0.5 of 0.888. Additionally, we present a novel corpus for spelling mistake detection and correction on a medical patient forum.

pdf bib
HITSZ-ICRC : A Report for SMM4H Shared Task 2019-Automatic Classification and Extraction of Adverse Effect Mentions in TweetsHITSZ-ICRC: A Report for SMM4H Shared Task 2019-Automatic Classification and Extraction of Adverse Effect Mentions in Tweets
Shuai Chen | Yuanhang Huang | Xiaowei Huang | Haoming Qin | Jun Yan | Buzhou Tang

This is the system description of the Harbin Institute of Technology Shenzhen (HITSZ) team for the first and second subtasks of the fourth Social Media Mining for Health Applications (SMM4H) shared task in 2019. The two subtasks are automatic classification and extraction of adverse effect mentions in tweets. The systems for the two subtasks are based on bidirectional encoder representations from transformers (BERT), and achieves promising results. Among the systems we developed for subtask1, the best F1-score was 0.6457, for subtask2, the best relaxed F1-score and the best strict F1-score were 0.614 and 0.407 respectively. Our system ranks first among all systems on subtask1.

pdf bib
Approaching SMM4H with Merged Models and Multi-task LearningSMM4H with Merged Models and Multi-task Learning
Tilia Ellendorff | Lenz Furrer | Nicola Colic | Noëmi Aepli | Fabio Rinaldi

We describe our submissions to the 4th edition of the Social Media Mining for Health Applications (SMM4H) shared task. Our team (UZH) participated in two sub-tasks : Automatic classifications of adverse effects mentions in tweets (Task 1) and Generalizable identification of personal health experience mentions (Task 4). For our submissions, we exploited ensembles based on a pre-trained language representation with a neural transformer architecture (BERT) (Tasks 1 and 4) and a CNN-BiLSTM(-CRF) network within a multi-task learning scenario (Task 1). These systems are placed on top of a carefully crafted pipeline of domain-specific preprocessing steps.

pdf bib
Identifying Adverse Drug Events Mentions in Tweets Using Attentive, Collocated, and Aggregated Medical Representation
Xinyan Zhao | Deahan Yu | V.G.Vinod Vydiswaran

Identifying mentions of medical concepts in social media is challenging because of high variability in free text. In this paper, we propose a novel neural network architecture, the Collocated LSTM with Attentive Pooling and Aggregated representation (CLAPA), that integrates a bidirectional LSTM model with attention and pooling strategy and utilizes the collocation information from training data to improve the representation of medical concepts. The collocation and aggregation layers improve the model performance on the task of identifying mentions of adverse drug events (ADE) in tweets. Using the dataset made available as part of the workshop shared task, we show that careful selection of neighborhood contexts can help uncover useful local information and improve the overall medical concept representation.

pdf bib
Correlating Twitter Language with Community-Level Health OutcomesTwitter Language with Community-Level Health Outcomes
Arno Schneuwly | Ralf Grubenmann | Séverine Rion Logean | Mark Cieliebak | Martin Jaggi

We study how language on social media is linked to mortal diseases such as atherosclerotic heart disease (AHD), diabetes and various types of cancer. Our proposed model leverages state-of-the-art sentence embeddings, followed by a regression model and clustering, without the need of additional labelled data. It allows to predict community-level medical outcomes from language, and thereby potentially translate these to the individual level. The method is applicable to a wide range of target variables and allows us to discover known and potentially novel correlations of medical outcomes with life-style aspects and other socioeconomic risk factors.

pdf bib
Affective Behaviour Analysis of On-line User Interactions : Are On-line Support Groups More Therapeutic than Twitter?Twitter?
Giuliano Tortoreto | Evgeny Stepanov | Alessandra Cervone | Mateusz Dubiel | Giuseppe Riccardi

The increase in the prevalence of mental health problems has coincided with a growing popularity of health related social networking sites. Regardless of their therapeutic potential, on-line support groups (OSGs) can also have negative effects on patients. In this work we propose a novel methodology to automatically verify the presence of therapeutic factors in social networking websites by using Natural Language Processing (NLP) techniques. The methodology is evaluated on on-line asynchronous multi-party conversations collected from an OSG and Twitter. The results of the analysis indicate that therapeutic factors occur more frequently in OSG conversations than in Twitter conversations. Moreover, the analysis of OSG conversations reveals that the users of that platform are supportive, and interactions are likely to lead to the improvement of their emotional state. We believe that our method provides a stepping stone towards automatic analysis of emotional states of users of online platforms. Possible applications of the method include provision of guidelines that highlight potential implications of using such platforms on users’ mental health, and/or support in the analysis of their impact on specific individuals.

pdf bib
NLP@UNED at SMM4H 2019 : Neural Networks Applied to Automatic Classifications of Adverse Effects Mentions in TweetsNLP@UNED at SMM4H 2019: Neural Networks Applied to Automatic Classifications of Adverse Effects Mentions in Tweets
Javier Cortes-Tejada | Juan Martinez-Romo | Lourdes Araujo

This paper describes a system for automatically classifying adverse effects mentions in tweets developed for the task 1 at Social Media Mining for Health Applications (SMM4H) Shared Task 2019. We have developed a system based on LSTM neural networks inspired by the excellent results obtained by deep learning classifiers in the last edition of this task. The network is trained along with Twitter GloVe pre-trained word embeddings.

pdf bib
Detecting and Extracting of Adverse Drug Reaction Mentioning Tweets with Multi-Head Self Attention
Suyu Ge | Tao Qi | Chuhan Wu | Yongfeng Huang

This paper describes our system for the first and second shared tasks of the fourth Social Media Mining for Health Applications (SMM4H) workshop. We enhance tweet representation with a language model and distinguish the importance of different words with Multi-Head Self-Attention. In addition, transfer learning is exploited to make up for the data shortage. Our system achieved competitive results on both tasks with an F1-score of 0.5718 for task 1 and 0.653 (overlap) / 0.357 (strict) for task 2.

pdf bib
Deep Learning for Identification of Adverse Effect Mentions In Twitter DataTwitter Data
Paul Barry | Ozlem Uzuner

Social Media Mining for Health Applications (SMM4H) Adverse Effect Mentions Shared Task challenges participants to accurately identify spans of text within a tweet that correspond to Adverse Effects (AEs) resulting from medication usage (Weissenbacher et al., 2019). This task features a training data set of 2,367 tweets, in addition to a 1,000 tweet evaluation data set. The solution presented here features a bidirectional Long Short-term Memory Network (bi-LSTM) for the generation of character-level embeddings. It uses a second bi-LSTM trained on both character and token level embeddings to feed a Conditional Random Field (CRF) which provides the final classification. This paper further discusses the deep learning algorithms used in our solution.

pdf bib
Using Machine Learning and Deep Learning Methods to Find Mentions of Adverse Drug Reactions in Social Media
Pilar López Úbeda | Manuel Carlos Díaz Galiano | Maite Martin | L. Alfonso Urena Lopez

Over time the use of social networks is becoming very popular platforms for sharing health related information. Social Media Mining for Health Applications (SMM4H) provides tasks such as those described in this document to help manage information in the health domain. This document shows the first participation of the SINAI group. We study approaches based on machine learning and deep learning to extract adverse drug reaction mentions from Twitter. The results obtained in the tasks are encouraging, we are close to the average of all participants and even above in some cases.

pdf bib
Towards Text Processing Pipelines to Identify Adverse Drug Events-related Tweets : University of Michigan @ SMM4H 2019 Task 1University of Michigan @ SMM4H 2019 Task 1
V.G.Vinod Vydiswaran | Grace Ganzel | Bryan Romas | Deahan Yu | Amy Austin | Neha Bhomia | Socheatha Chan | Stephanie Hall | Van Le | Aaron Miller | Olawunmi Oduyebo | Aulia Song | Radhika Sondhi | Danny Teng | Hao Tseng | Kim Vuong | Stephanie Zimmerman

We participated in Task 1 of the Social Media Mining for Health Applications (SMM4H) 2019 Shared Tasks on detecting mentions of adverse drug events (ADEs) in tweets. Our approach relied on a text processing pipeline for tweets, and training traditional machine learning and deep learning models. Our submitted runs performed above average for the task.

pdf bib
Give It a Shot : Few-shot Learning to Normalize ADR Mentions in Social Media PostsADR Mentions in Social Media Posts
Emmanouil Manousogiannis | Sepideh Mesbah | Alessandro Bozzon | Selene Baez | Robert Jan Sips

This paper describes the system that team MYTOMORROWS-TU DELFT developed for the 2019 Social Media Mining for Health Applications (SMM4H) Shared Task 3, for the end-to-end normalization of ADR tweet mentions to their corresponding MEDDRA codes. For the first two steps, we reuse a state-of-the art approach, focusing our contribution on the final entity-linking step. For that we propose a simple Few-Shot learning approach, based on pre-trained word embeddings and data from the UMLS, combined with the provided training data. Our system (relaxed F1 : 0.337-0.345) outperforms the average (relaxed F1 0.2972) of the participants in this task, demonstrating the potential feasibility of few-shot learning in the context of medical text normalization.

pdf bib
Detection of Adverse Drug Reaction Mentions in Tweets Using ELMoELMo
Sarah Sarabadani

This paper describes the models used by our team in SMM4H 2019 shared task. We submitted results for subtasks 1 and 2. For task 1 which aims to detect tweets with Adverse Drug Reaction (ADR) mentions we used ELMo embeddings which is a deep contextualized word representation able to capture both syntactic and semantic characteristics. For task 2, which focuses on extraction of ADR mentions, first the same architecture as task 1 was used to identify whether or not a tweet contains ADR. Then, for tweets positively classified as mentioning ADR, the relevant text span was identified by similarity matching with 3 different lexicon sets.

pdf bib
Adverse Drug Effect and Personalized Health Mentions, CLaC at SMM4H 2019, Tasks 1 and 4CLaC at SMM4H 2019, Tasks 1 and 4
Parsa Bagherzadeh | Nadia Sheikh | Sabine Bergler

CLaC labs participated in Task 1 and 4 of SMM4H 2019. We pursed two main objectives in our submission. First we tried to use some textual features in a deep net framework, and second, the potential use of more than one word embedding was tested. The results seem positively affected by the proposed architectures.

pdf bib
MIDAS@SMM4H-2019 : Identifying Adverse Drug Reactions and Personal Health Experience Mentions from TwitterMIDAS@SMM4H-2019: Identifying Adverse Drug Reactions and Personal Health Experience Mentions from Twitter
Debanjan Mahata | Sarthak Anand | Haimin Zhang | Simra Shahid | Laiba Mehnaz | Yaman Kumar | Rajiv Ratn Shah

In this paper, we present our approach and the system description for the Social Media Mining for Health Applications (SMM4H) Shared Task 1,2 and 4 (2019). Our main contribution is to show the effectiveness of Transfer Learning approaches like BERT and ULMFiT, and how they generalize for the classification tasks like identification of adverse drug reaction mentions and reporting of personal health problems in tweets. We show the use of stacked embeddings combined with BLSTM+CRF tagger for identifying spans mentioning adverse drug reactions in tweets. We also show that these approaches perform well even with imbalanced dataset in comparison to undersampling and oversampling.

up

pdf (full)
bib (full)
Proceedings of the First International Workshop on Designing Meaning Representations

pdf bib
Proceedings of the First International Workshop on Designing Meaning Representations
Nianwen Xue | William Croft | Jan Hajic | Chu-Ren Huang | Stephan Oepen | Martha Palmer | James Pustejovksy

pdf bib
Modeling Quantification and Scope in Abstract Meaning RepresentationsAbstract Meaning Representations
James Pustejovsky | Ken Lai | Nianwen Xue

In this paper, we propose an extension to Abstract Meaning Representations (AMRs) to encode scope information of quantifiers and negation, in a way that overcomes the semantic gaps of the schema while maintaining its cognitive simplicity. Specifically, we address three phenomena not previously part of the AMR specification : quantification, negation (generally), and modality. The resulting representation, which we call Uniform Meaning Representation (UMR), adopts the predicative core of AMR and embeds it under a scope graph when appropriate. UMR representations differ from other treatments of quantification and modal scope phenomena in two ways : (a) they are more transparent ; and (b) they specify default scope when possible. ‘

pdf bib
Generating Discourse Inferences from Unscoped Episodic Logical Formulas
Gene Kim | Benjamin Kane | Viet Duong | Muskaan Mendiratta | Graeme McGuire | Sophie Sackstein | Georgiy Platonov | Lenhart Schubert

Abstract Unscoped episodic logical form (ULF) is a semantic representation capturing the predicate-argument structure of English within the episodic logic formalism in relation to the syntactic structure, while leaving scope, word sense, and anaphora unresolved. We describe how ULF can be used to generate natural language inferences that are grounded in the semantic and syntactic structure through a small set of rules defined over interpretable predicates and transformations on ULFs. The semantic restrictions placed by ULF semantic types enables us to ensure that the inferred structures are semantically coherent while the nearness to syntax enables accurate mapping to English. We demonstrate these inferences on four classes of conversationally-oriented inferences in a mixed genre dataset with 68.5 % precision from human judgments.

pdf bib
A Plea for Information Structure as a Part of Meaning Representation
Eva Hajicova

The view that the representation of information structure (IS) should be a part of (any type of) representation of meaning is based on the fact that IS is a semantically relevant phenomenon. In the contribution, three arguments supporting this view are briefly summarized, namely, the relation of IS to the interpretation of negation and presupposition, the relevance of IS to the understanding of discourse connectivity and for the establishment and interpretation of coreference relations. Afterwards, possible integration of the description of the main ingredient of IS into a meaning representation is illustrated.

pdf bib
TCL-a Lexicon of Turkish Discourse ConnectivesTCL - a Lexicon of Turkish Discourse Connectives
Deniz Zeyrek | Kezban Başıbüyük

It is known that discourse connectives are the most salient indicators of discourse relations. State-of-the-art parsers being developed to predict explicit discourse connectives exploit annotated discourse corpora but a lexicon of discourse connectives is also needed to enable further research in discourse structure and support the development of language technologies that use these structures for text understanding. This paper presents a lexicon of Turkish discourse connectives built by automatic means. The lexicon has the format of the German connective lexicon, DiMLex, where for each discourse connective, information about the connective‘s orthographic variants, syntactic category and senses are provided along with sample relations. In this paper, we describe the data sources we used and the development steps of the lexicon.

pdf bib
Ellipsis in Chinese AMR CorpusChinese AMR Corpus
Yihuan Liu | Bin Li | Peiyi Yan | Li Song | Weiguang Qu

Ellipsis is very common in language. It’s necessary for natural language processing to restore the elided elements in a sentence. However, there’s only a few corpora annotating the ellipsis, which draws back the automatic detection and recovery of the ellipsis. This paper introduces the annotation of ellipsis in Chinese sentences, using a novel graph-based representation Abstract Meaning Representation (AMR), which has a good mechanism to restore the elided elements manually. We annotate 5,000 sentences selected from Chinese TreeBank (CTB). We find that 54.98 % of sentences have ellipses. 92 % of the ellipses are restored by copying the antecedents’ concepts. and 12.9 % of them are the new added concepts. In addition, we find that the elided element is a word or phrase in most cases, but sometimes only the head of a phrase or parts of a phrase, which is rather hard for the automatic recovery of ellipsis.

pdf bib
Meaning Representation of Null Instantiated Semantic Roles in FrameNetFrameNet
Miriam R L Petruck

Humans have the unique ability to infer information about participants in a scene, even if they are not mentioned in a text about that scene. Computer systems can not do so without explicit information about those participants. This paper addresses the linguistic phenomenon of null-instantiated frame elements, i.e., implicit semantic roles, and their representation in FrameNet (FN). It motivates FN’s annotation practice, and illustrates three types of null-instantiated arguments that FrameNet tracks, noting that other lexical resources do not record such semantic-pragmatic information, despite its need in natural language understanding (NLU), and the elaborate efforts to create new datasets. It challenges the community to appeal to FN data to develop more sophisticated techniques for recognizing implicit semantic roles, and creating needed datasets. Although the annotation of null-instantiated roles was lexicographically motivated, FN provides useful information for text processing, and therefore must be considered in the design of any meaning representation for natural language understanding.

pdf bib
Copula and Case-Stacking Annotations for Korean AMRKorean AMR
Hyonsu Choe | Jiyoon Han | Hyejin Park | Hansaem Kim

This paper concerns the application of Abstract Meaning Representation (AMR) to Korean. In this regard, it focuses on the copula construction and its negation and the case-stacking phenomenon thereof. To illustrate this clearly, we reviewed the : domain annotation scheme from various perspectives. In this process, the existing annotation guidelines were improved to devise annotation schemes for each issue under the principle of pursuing consistency and efficiency of annotation without distorting the characteristics of Korean.

pdf bib
Preparing SNACS for Subjects and ObjectsSNACS for Subjects and Objects
Adi Shalev | Jena D. Hwang | Nathan Schneider | Vivek Srikumar | Omri Abend | Ari Rappoport

Research on adpositions and possessives in multiple languages has led to a small inventory of general-purpose meaning classes that disambiguate tokens. Importantly, that work has argued for a principled separation of the semantic role in a scene from the function coded by morphosyntax. Here, we ask whether this approach can be generalized beyond adpositions and possessives to cover all scene participantsincluding subjects and objectsdirectly, without reference to a frame lexicon. We present new guidelines for English and the results of an interannotator agreement study.

pdf bib
A Case Study on Meaning Representation for VietnameseVietnamese
Ha Linh | Huyen Nguyen

This paper presents a case study on meaning representation for Vietnamese. Having introduced several existing semantic representation schemes for different languages, we select as basis for our work on Vietnamese AMR (Abstract Meaning Representation). From it, we define a meaning representation label set by adapting the English schema and taking into account the specific characteristics of Vietnamese.

pdf bib
VerbNet Representations : Subevent Semantics for Transfer VerbsVerbNet Representations: Subevent Semantics for Transfer Verbs
Susan Windisch Brown | Julia Bonn | James Gung | Annie Zaenen | James Pustejovsky | Martha Palmer

This paper announces the release of a new version of the English lexical resource VerbNet with substantially revised semantic representations designed to facilitate computer planning and reasoning based on human language. We use the transfer of possession and transfer of information event representations to illustrate both the general framework of the representations and the types of nuances the new representations can capture. These representations use a Generative Lexicon-inspired subevent structure to track attributes of event participants across time, highlighting oppositions and temporal and causal relations among the subevents.

pdf bib
Semantically Constrained Multilayer Annotation : The Case of Coreference
Jakob Prange | Nathan Schneider | Omri Abend

We propose a coreference annotation scheme as a layer on top of the Universal Conceptual Cognitive Annotation foundational layer, treating units in predicate-argument structure as a basis for entity and event mentions. We argue that this allows coreference annotators to sidestep some of the challenges faced in other schemes, which do not enforce consistency with predicate-argument structure and vary widely in what kinds of mentions they annotate and how. The proposed approach is examined with a pilot annotation study and compared with annotations from other schemes.

pdf bib
Towards Universal Semantic Representation
Huaiyu Zhu | Yunyao Li | Laura Chiticariu

Natural language understanding at the semantic level and independent of language variations is of great practical value. Existing approaches such as semantic role labeling (SRL) and abstract meaning representation (AMR) still have features related to the peculiarities of the particular language. In this work we describe various challenges and possible solutions in designing a semantic representation that is universal across a variety of languages.

pdf bib
Augmenting Abstract Meaning Representation for Human-Robot DialogueAbstract Meaning Representation for Human-Robot Dialogue
Claire Bonial | Lucia Donatelli | Stephanie M. Lukin | Stephen Tratz | Ron Artstein | David Traum | Clare Voss

We detail refinements made to Abstract Meaning Representation (AMR) that make the representation more suitable for supporting a situated dialogue system, where a human remotely controls a robot for purposes of search and rescue and reconnaissance. We propose 36 augmented AMRs that capture speech acts, tense and aspect, and spatial information. This linguistic information is vital for representing important distinctions, for example whether the robot has moved, is moving, or will move. We evaluate two existing AMR parsers for their performance on dialogue data. We also outline a model for graph-to-graph conversion, in which output from AMR parsers is converted into our refined AMRs. The design scheme presented here, though task-specific, is extendable for broad coverage of speech acts using AMR in future task-independent work.

up

pdf (full)
bib (full)
Proceedings of the Second Workshop on Storytelling

pdf bib
Proceedings of the Second Workshop on Storytelling
Francis Ferraro | Ting-Hao ‘Kenneth’ Huang | Stephanie M. Lukin | Margaret Mitchell

pdf bib
A Hybrid Model for Globally Coherent Story Generation
Fangzhou Zhai | Vera Demberg | Pavel Shkadzko | Wei Shi | Asad Sayeed

Automatically generating globally coherent stories is a challenging problem. Neural text generation models have been shown to perform well at generating fluent sentences from data, but they usually fail to keep track of the overall coherence of the story after a couple of sentences. Existing work that incorporates a text planning module succeeded in generating recipes and dialogues, but appears quite data-demanding. We propose a novel story generation approach that generates globally coherent stories from a fairly small corpus. The model exploits a symbolic text planning module to produce text plans, thus reducing the demand of data ; a neural surface realization module then generates fluent text conditioned on the text plan. Human evaluation showed that our model outperforms various baselines by a wide margin and generates stories which are fluent as well as globally coherent.

pdf bib
Guided Neural Language Generation for Automated Storytelling
Prithviraj Ammanabrolu | Ethan Tien | Wesley Cheung | Zhaochen Luo | William Ma | Lara Martin | Mark Riedl

Neural network based approaches to automated story plot generation attempt to learn how to generate novel plots from a corpus of natural language plot summaries. Prior work has shown that a semantic abstraction of sentences called events improves neural plot generation and and allows one to decompose the problem into : (1) the generation of a sequence of events (event-to-event) and (2) the transformation of these events into natural language sentences (event-to-sentence). However, typical neural language generation approaches to event-to-sentence can ignore the event details and produce grammatically-correct but semantically-unrelated sentences. We present an ensemble-based model that generates natural language guided by events. Our method outperforms the baseline sequence-to-sequence model. Additionally, we provide results for a full end-to-end automated story generation system, demonstrating how our model works with existing systems designed for the event-to-event problem.

pdf bib
Narrative Generation in the Wild : Methods from NaNoGenMoNarrative Generation in the Wild: Methods from NaNoGenMo
Judith van Stegeren | Mariët Theune

In text generation, generating long stories is still a challenge. Coherence tends to decrease rapidly as the output length increases. Especially for generated stories, coherence of the narrative is an important quality aspect of the output text. In this paper we examine how narrative coherence is attained in the submissions of NaNoGenMo 2018, an online text generation event where participants are challenged to generate a 50,000 word novel. We list the main approaches that were used to generate coherent narratives and link them to scientific literature. Finally, we give recommendations on when to use which approach.

pdf bib
Lexical concreteness in narrative
Michael Flor | Swapna Somasundaran

This study explores the relation between lexical concreteness and narrative text quality. We present a methodology to quantitatively measure lexical concreteness of a text. We apply it to a corpus of student stories, scored according to writing evaluation rubrics. Lexical concreteness is weakly-to-moderately related to story quality, depending on story-type. The relation is mostly borne by adjectives and nouns, but also found for adverbs and verbs.

pdf bib
Personality Traits Recognition in Literary Texts
Daniele Pizzolli | Carlo Strapparava

Interesting stories often are built around interesting characters. Finding and detailing what makes an interesting character is a real challenge, but certainly a significant cue is the character personality traits. Our exploratory work tests the adaptability of the current personality traits theories to literal characters, focusing on the analysis of utterances in theatre scripts. And, at the opposite, we try to find significant traits for interesting characters. The preliminary results demonstrate that our approach is reasonable. Using machine learning for gaining insight into the personality traits of fictional characters can make sense.

pdf bib
WriterForcing : Generating more interesting story endingsWriterForcing: Generating more interesting story endings
Prakhar Gupta | Vinayshekhar Bannihatti Kumar | Mukul Bhutani | Alan W Black

We study the problem of generating interesting endings for stories. Neural generative models have shown promising results for various text generation problems. Sequence to Sequence (Seq2Seq) models are typically trained to generate a single output sequence for a given input sequence. However, in the context of a story, multiple endings are possible. Seq2Seq models tend to ignore the context and generate generic and dull responses. Very few works have studied generating diverse and interesting story endings for the same story context. In this paper, we propose models which generate more diverse and interesting outputs by 1) training models to focus attention on important keyphrases of the story, and 2) promoting generating nongeneric words. We show that the combination of the two leads to more interesting endings.

up

pdf (full)
bib (full)
Proceedings of the Third Workshop on Abusive Language Online

pdf bib
Proceedings of the Third Workshop on Abusive Language Online
Sarah T. Roberts | Joel Tetreault | Vinodkumar Prabhakaran | Zeerak Waseem

pdf bib
Subversive Toxicity Detection using Sentiment Information
Eloi Brassard-Gourdeau | Richard Khoury

The presence of toxic content has become a major problem for many online communities. Moderators try to limit this problem by implementing more and more refined comment filters, but toxic users are constantly finding new ways to circumvent them. Our hypothesis is that while modifying toxic content and keywords to fool filters can be easy, hiding sentiment is harder. In this paper, we explore various aspects of sentiment detection and their correlation to toxicity, and use our results to implement a toxicity detection tool. We then test how adding the sentiment information helps detect toxicity in three different real-world datasets, and incorporate subversion to these datasets to simulate a user trying to circumvent the system. Our results show sentiment information has a positive impact on toxicity detection.

pdf bib
Racial Bias in Hate Speech and Abusive Language Detection Datasets
Thomas Davidson | Debasmita Bhattacharya | Ingmar Weber

Technologies for abusive language detection are being developed and applied with little consideration of their potential biases. We examine racial bias in five different sets of Twitter data annotated for hate speech and abusive language. We train classifiers on these datasets and compare the predictions of these classifiers on tweets written in African-American English with those written in Standard American English. The results show evidence of systematic racial bias in all datasets, as classifiers trained on them tend to predict that tweets written in African-American English are abusive at substantially higher rates. If these abusive language detection systems are used in the field they will therefore have a disproportionate negative impact on African-American social media users. Consequently, these systems may discriminate against the groups who are often the targets of the abuse we are trying to detect.

pdf bib
Automated Identification of Verbally Abusive Behaviors in Online Discussions
Srecko Joksimovic | Ryan S. Baker | Jaclyn Ocumpaugh | Juan Miguel L. Andres | Ivan Tot | Elle Yuan Wang | Shane Dawson

Discussion forum participation represents one of the crucial factors for learning and often the only way of supporting social interactions in online settings. However, as much as sharing new ideas or asking thoughtful questions contributes learning, verbally abusive behaviors, such as expressing negative emotions in online discussions, could have disproportionate detrimental effects. To provide means for mitigating the potential negative effects on course participation and learning, we developed an automated classifier for identifying communication that show linguistic patterns associated with hostility in online forums. In so doing, we employ several well-established automated text analysis tools and build on the common practices for handling highly imbalanced datasets and reducing the sensitivity to overfitting. Although still in its infancy, our approach shows promising results (ROC AUC.73) towards establishing a robust detector of abusive behaviors. We further provide an overview of the classification (linguistic and contextual) features most indicative of online aggression.

pdf bib
Pay Attention to your Context when Classifying Abusive Language
Tuhin Chakrabarty | Kilol Gupta | Smaranda Muresan

The goal of any social media platform is to facilitate healthy and meaningful interactions among its users. But more often than not, it has been found that it becomes an avenue for wanton attacks. We propose an experimental study that has three aims : 1) to provide us with a deeper understanding of current data sets that focus on different types of abusive language, which are sometimes overlapping (racism, sexism, hate speech, offensive language, and personal attacks) ; 2) to investigate what type of attention mechanism (contextual vs. self-attention) is better for abusive language detection using deep learning architectures ; and 3) to investigate whether stacked architectures provide an advantage over simple architectures for this task.

pdf bib
Challenges and frontiers in abusive content detection
Bertie Vidgen | Alex Harris | Dong Nguyen | Rebekah Tromble | Scott Hale | Helen Margetts

Online abusive content detection is an inherently difficult task. It has received considerable attention from academia, particularly within the computational linguistics community, and performance appears to have improved as the field has matured. However, considerable challenges and unaddressed frontiers remain, spanning technical, social and ethical dimensions. These issues constrain the performance, efficiency and generalizability of abusive content detection systems. In this article we delineate and clarify the main challenges and frontiers in the field, critically evaluate their implications and discuss potential solutions. We also highlight ways in which social scientific insights can advance research. We discuss the lack of support given to researchers working with abusive content and provide guidelines for ethical research.

pdf bib
A System to Monitor Cyberbullying based on Message Classification and Social Network Analysis
Stefano Menini | Giovanni Moretti | Michele Corazza | Elena Cabrio | Sara Tonelli | Serena Villata

Social media platforms like Twitter and Instagram face a surge in cyberbullying phenomena against young users and need to develop scalable computational methods to limit the negative consequences of this kind of abuse. Despite the number of approaches recently proposed in the Natural Language Processing (NLP) research area for detecting different forms of abusive language, the issue of identifying cyberbullying phenomena at scale is still an unsolved problem. This is because of the need to couple abusive language detection on textual message with network analysis, so that repeated attacks against the same person can be identified. In this paper, we present a system to monitor cyberbullying phenomena by combining message classification and social network analysis. We evaluate the classification module on a data set built on Instagram messages, and we describe the cyberbullying monitoring user interface.

pdf bib
L-HSAB : A Levantine Twitter Dataset for Hate Speech and Abusive LanguageL-HSAB: A Levantine Twitter Dataset for Hate Speech and Abusive Language
Hala Mulki | Hatem Haddad | Chedi Bechikh Ali | Halima Alshabani

Hate speech and abusive language have become a common phenomenon on Arabic social media. Automatic hate speech and abusive detection systems can facilitate the prohibition of toxic textual contents. The complexity, informality and ambiguity of the Arabic dialects hindered the provision of the needed resources for Arabic abusive / hate speech detection research. In this paper, we introduce the first publicly-available Levantine Hate Speech and Abusive (L-HSAB) Twitter dataset with the objective to be a benchmark dataset for automatic detection of online Levantine toxic contents. We, further, provide a detailed review of the data collection steps and how we design the annotation guidelines such that a reliable dataset annotation is guaranteed. This has been later emphasized through the comprehensive evaluation of the annotations as the annotation agreement metrics of Cohen’s Kappa (k) and Krippendorff’s alpha () indicated the consistency of the annotations.

pdf bib
Preemptive Toxic Language Detection in Wikipedia Comments Using Thread-Level ContextWikipedia Comments Using Thread-Level Context
Mladen Karan | Jan Šnajder

We address the task of automatically detecting toxic content in user generated texts. We fo cus on exploring the potential for preemptive moderation, i.e., predicting whether a particular conversation thread will, in the future, incite a toxic comment. Moreover, we perform preliminary investigation of whether a model that jointly considers all comments in a conversation thread outperforms a model that considers only individual comments. Using an existing dataset of conversations among Wikipedia contributors as a starting point, we compile a new large-scale dataset for this task consisting of labeled comments and comments from their conversation threads.

pdf bib
A Platform Agnostic Dual-Strand Hate Speech Detector
Johannes Skjeggestad Meyer | Björn Gambäck

Hate speech detectors must be applicable across a multitude of services and platforms, and there is hence a need for detection approaches that do not depend on any information specific to a given platform. For instance, the information stored about the text’s author may differ between services, and so using such data would reduce a system’s general applicability. The paper thus focuses on using exclusively text-based input in the detection, in an optimised architecture combining Convolutional Neural Networks and Long Short-Term Memory-networks. The hate speech detector merges two strands with character n-grams and word embeddings to produce the final classification, and is shown to outperform comparable previous approaches.

pdf bib
Online aggression from a sociological perspective : An integrative view on determinants and possible countermeasures
Sebastian Weingartner | Lea Stahel

The present paper introduces a theoretical model for explaining aggressive online comments from a sociological perspective. It is innovative as it combines individual, situational, and social-structural determinants of online aggression and tries to theoretically derive their interplay. Moreover, the paper suggests an empirical strategy for testing the model. The main contribution will be to match online commenting data with survey data containing rich background data of non- /aggressive online commentators.

up

bib (full) Proceedings of the 2019 Workshop on Widening NLP

pdf bib
Proceedings of the 2019 Workshop on Widening NLP
Amittai Axelrod | Diyi Yang | Rossana Cunha | Samira Shaikh | Zeerak Waseem

bib
Development of a General Purpose Sentiment Lexicon for Igbo Language
Emeka Ogbuju | Moses Onyesolu

There are publicly available general purpose sentiment lexicons in some high resource languages but very few exist in the low resource languages. This makes it difficult to directly perform sentiment analysis tasks in such languages. The objective of this work is to create a general purpose sentiment lexicon for Igbo language that can determine the sentiment of documents written in Igbo language without having to translate it to English language. The material used was an automatically translated Liu’s lexicon and manual addition of Igbo native words. The result of this work is a general purpose lexicon – IgboSentilex. The performance was tested on the BBC Igbo news channel. It returned an average polarity agreement of 95% with other general purpose sentiment lexicons.

bib
Towards a Resource Grammar for Runyankore and Rukiga
David Bamutura | Peter Ljunglöf

Currently, there is a lack of computational grammar resources for many under-resourced languages which limits the ability to develop Natural Language Processing (NLP) tools and applications such as Multilingual Document Authoring, Computer-Assisted Language Learning (CALL) and Low-Coverage Machine Translation (MT) for these languages. In this paper, we present our attempt to formalise the grammar of two such languages: Runyankore and Rukiga. For this formalisation we use the Grammatical Framework (GF) and its Resource Grammar Library (GF-RGL).

bib
Speech Recognition for Tigrinya language Using Deep Neural Network Approach
Hafte Abera | Sebsibe H/mariam

This work presents a speech recognition model for Tigrinya language .The Deep Neural Network is used to make the recognition model. The Long Short-Term Memory Network (LSTM), which is a special kind of Recurrent Neural Network composed of Long Short-Term Memory blocks, is the primary layer of our neural network model. The 40-dimensional features are MFCC-LDA-MLLT-fMLLR with CMN were used. The acoustic models are trained on features that are obtained by projecting down to 40 dimensions using linear discriminant analysis (LDA). Moreover, speaker adaptive training (SAT) is done using a single feature-space maximum likelihood linear regression (FMLLR) transform estimated per speaker. We train and compare LSTM and DNN models at various numbers of parameters and configurations. We show that LSTM models converge quickly and give state of the art speech recognition performance for relatively small sized models. Finally, the accuracy of the model is evaluated based on the recognition rate.

bib
Knowledge-Based Word Sense Disambiguation with Distributional Semantic Expansion
Hossein Rouhizadeh | Mehrnoush Shamsfard | Masoud Rouhizadeh

In this paper, we presented a WSD system that uses LDA topics for semantic expansion of document words. Our system also uses sense frequency information from SemCor to give higher priority to the senses which are more probable to happen.

bib
AspeRa: Aspect-Based Rating Prediction Based on User Reviews
Elena Tutubalina | Valentin Malykh | Sergey Nikolenko | Anton Alekseev | Ilya Shenbin

We propose a novel Aspect-based Rating Prediction model (AspeRa) that estimates user rating based on review texts for the items. It is based on aspect extraction with neural networks and combines the advantages of deep learning and topic modeling. It is mainly designed for recommendations, but an important secondary goal of AspeRa is to discover coherent aspects of reviews that can be used to explain predictions or for user profiling. We conduct a comprehensive empirical study of AspeRa, showing that it outperforms state-of-the-art models in terms of recommendation quality and produces interpretable aspects. This paper is an abridged version of our work (Nikolenko et al., 2019)

bib
Recognizing Arrow Of Time In The Short Stories
Fahimeh Hosseini | Hosein Fooladi | Mohammad Reza Samsami

Recognizing the arrow of time in the context of paragraphs in short stories is a challenging task. i.e., given only two paragraphs (excerpted from a random position in a short story), determining which comes first and which comes next is a difficult task even for humans. In this paper, we have collected and curated a novel dataset for tackling this challenging task. We have shown that a pre-trained BERT architecture achieves reasonable accuracy on the task, and outperforms RNN-based architectures.

bib
Amharic Word Sequence Prediction
Nuniyat Kifle

The significance of computers and handheld devices are not deniable in the modern world of today. Texts are entered to these devices using word processing programs as well as other techniques and word prediction is one of the techniques. Word Prediction is the action of guessing or forecasting what word comes after, based on some current information, and it is the main focus of this study. Even though Amharic is used by a large number of populations, no significant work is done on the topic of word sequence prediction. In this study, Amharic word sequence prediction model is developed with statistical methods using Hidden Markov Model by incorporating detailed Part of speech tag. Evaluation of the model is performed using developed prototype and keystroke savings (KSS) as a metrics. According to our experiment, prediction result using a bi-gram with detailed Part of Speech tag model has higher KSS and it is better compared to tri-gram model and better than those without Part of Speech tag. Therefore, statistical approach with Detailed POS has quite good potential on word sequence prediction for Amharic language. This research deals with designing word sequence prediction model in Amharic language. It is a language that is spoken in eastern Africa. One of the needs for Amharic word sequence prediction for mobile use and other digital devices is in order to facilitate data entry and communication in our language. Word sequence prediction is a challenging task for inflected languages. (Arora, 2007) These kinds of languages are morphologically rich and have enormous word forms. i.e. one word can have different forms. As Amharic language is highly inflected language and morphologically rich it shares this problem. (prediction, 2008) This problem makes word prediction system much more difficult and results poor performance. Due to this reason storing all forms in dictionary won’t solve the problem as in English and other less inflected languages. But considering other techniques that could help the predictor to suggest the next word like a POS based prediction should be used. Previous researches used dictionary approach with no consideration of context information. Hence storing all forms of words in dictionary for inflected languages such as Amharic language has been less effective. The main goal of this thesis is to implement Amharic word prediction model that works with better prediction speed and with narrowed search space as much as possible. We introduced two models; tags and words and linear interpolation that use part of speech tag information in addition to word n-grams in order to maximize the likelihood of syntactic appropriateness of the suggestions. We believe the results found reflect this. Amharic word sequence prediction using bi-gram model with higher POS weight and detailed Part of speech tag gave better keystroke savings in all scenarios of our experiment. The study followed Design Science Research Methodology (DSRM). Since DSRM includes approaches, techniques, tools, algorithms and evaluation mechanisms in the process, we followed statistical approach with statistical language modeling and built Amharic prediction model based on information from Part of Speech tagger. The statistics included in the systems varies from single word frequencies to part-of-speech tag n-grams. That means it included the statistics of Word frequencies, Word sequence frequencies, Part-of-speech sequence frequencies and other important information. Later on the system was evaluated using Keystroke Savings. (Lindh, 011). Linux mint was used as the main Operation System during the frame work design. We used corpus of 680,000 tagged words that has 31 tag sets, python programming language and its libraries for both the part of speech tagger and the predictor module. Other Tool that was used is the SRILIM (The SRI language modeling toolkit) in order to generate unigram bigram and trigram count as an input for the language model. SRILIM is toolkit that uses to build and apply statistical language modeling. This thesis presented Amharic word sequence prediction model using the statistical approach. We described a combined statistical and lexical word prediction system for handling inflected languages by making use of POS tags to build the language model. We developed Amharic language models of bigram and trigram for the training purpose. We obtained 29% of KSS using bigram model with detailed part ofspeech tag. Hence, Based on the experiments carried out for this study and the results obtained, the following conclusions were made. We concluded that employing syntactic information in the form of Part-of-Speech (POS) n-grams promises more effective predictions. We also can conclude data quantity, performance of POS tagger and data quality highly affects the keystroke savings. Here in our study the tests were done on a small collection of 100 phrases. According to our evaluation better Keystroke saving (KSS) is achieved when using bi-gram model than the tri-gram models. We believe the results obtained using the experiment of detailed Part of speech tags were effective Since speed and search space are the basic issues in word sequence prediction

bib
A Framework for Relation Extraction Across Multiple Datasets in Multiple Domains
Geeticka Chauhan | Matthew McDermott | Peter Szolovits

In this work, we aim to build a unifying framework for relation extraction (RE), applying this on 3 highly used datasets with the ability to be extendable to new datasets. At the moment, the domain suffers from lack of reproducibility as well as a lack of consensus on generalizable techniques. Our framework will be open-sourced and will aid in performing systematic exploration on the effect of different modeling techniques, pre-processing, training methodologies and evaluation metrics on the 3 datasets to help establish a consensus.

bib
Learning and Understanding Different Categories of Sexism Using Convolutional Neural Network’s Filters
Sima Sharifirad | Alon Jacovi

Sexism is very common in social media and makes the boundaries of free speech tighter for female users. Automatically flagging and removing sexist content requires niche identification and description of the categories. In this study, inspired by social science work, we propose three categories of sexism toward women as follows: “Indirect sexism”, “Sexual sexism” and “Physical sexism”. We build classifiers such as Convolutional Neural Network (CNN) to automatically detect different types of sexism and address problems of annotation. Even though inherent non-interpretability of CNN is a challenge for users who detect sexism, as the reason classifying a given speech instance with regard to sexism is difficult to glance from a CNN. However, recent research developed interpretable CNN filters for text data. In a CNN, filters followed by different activation patterns along with global max-pooling can help us tease apart the most important ngrams from the rest. In this paper, we interpret a CNN model trained to classify sexism in order to understand different categories of sexism by detecting semantic categories of ngrams and clustering them. Then, these ngrams in each category are used to improve the performance of the classification task. It is a preliminary work using machine learning and natural language techniques to learn the concept of sexism and distinguishes itself by looking at more precise categories of sexism in social media along with an in-depth investigation of CNN’s filters.

bib
Modeling Five Sentence Quality Representations by Finding Latent Spaces Produced with Deep Long Short-Memory Models
Pablo Rivas

We present a study in which we train neural models that approximate rules that assess the quality of English sentences. We modeled five rules using deep LSTMs trained over a dataset of sentences whose quality is evaluated under such rules. Preliminary results suggest the neural architecture can model such rules to high accuracy.

bib
English-Ethiopian Languages Statistical Machine Translation
Solomon Teferra Abate | Michael Melese | Martha Yifiru Tachbelie | Million Meshesha | Solomon Atinafu | Wondwossen Mulugeta | Yaregal Assabie | Hafte Abera | Biniyam Ephrem | Tewodros Gebreselassie | Wondimagegnhue Tsegaye Tufa | Amanuel Lemma | Tsegaye Andargie | Seifedin Shifaw

In this paper, we describe an attempt towards the development of parallel corpora for English and Ethiopian Languages, such as Amharic, Tigrigna, Afan-Oromo, Wolaytta and Ge’ez. The corpora are used for conducting bi-directional SMT experiments. The BLEU scores of the bi-directional SMT systems show a promising result. The morphological richness of the Ethiopian languages has a great impact on the performance of SMT especially when the targets are Ethiopian languages.

bib
An automatic discourse relation alignment experiment on TED-MDB
Sibel Ozer | Deniz Zeyrek

This paper describes an automatic discourse relation alignment experiment as an empirical justification of the planned annotation projection approach to enlarge the 3600-word multilingual corpus of TED Multilingual Discourse Bank (TED-MDB). The experiment is carried out on a single language pair (English-Turkish) included in TED-MDB. The paper first describes the creation of a large corpus of English-Turkish bi-sentences, then it presents a sense-based experiment that automatically aligns the relations in the English sentences of TED-MDB with the Turkish sentences. The results are very close to the results obtained from an earlier semi-automatic post-annotation alignment experiment validated by human annotators and are encouraging for future annotation projection tasks.

bib
The Design and Construction of the Corpus of China English
Lixin Xia | Yun Xia

The paper describes the development a corpus of an English variety, i.e. China English, in or-der to provide a linguistic resource for researchers in the field of China English. The Corpus of China English (CCE) was built with due consideration given to its representativeness and authenticity. It was composed of more than 13,962,102 tokens in 15,333 texts evenly divided between the following four genres: newspapers, magazines, fiction and academic writings. The texts cover a wide range of domains, such as news, financial, politics, environment, social, culture, technology, sports, education, philosophy, literary, etc. It is a helpful resource for research on China English, computational linguistics, natural language processing, corpus linguistics and English language education.

bib
Learning Trilingual Dictionaries for Urdu – Roman Urdu – English
Moiz Rauf | Sebastian Padó

In this paper, we present an effort to generate a joint Urdu, Roman Urdu and English trilingual lexicon using automated methods. We make a case for using statistical machine translation approaches and parallel corpora for dictionary creation. To this purpose, we use word alignment tools on the corpus and evaluate translations using human evaluators. Despite different writing script and considerable noise in the corpus our results show promise with over 85% accuracy of Roman Urdu–Urdu and 45% English–Urdu pairs.

bib
Joint Inference on Bilingual Parse Trees for PP-attachment Disambiguation
Geetanjali Rakshit

Prepositional Phrase (PP) attachment is a classical problem in NLP for languages like English, which suffer from structural ambiguity. In this work, we solve this problem with the help of another language free from such ambiguities, using the parse tree of the parallel sentence in the other language, and word alignments. We formulate an optimization framework that encourages agreement between the parse trees for two languages, and solve it using a novel Dual Decomposition (DD) based algorithm. Experiments on the English-Hindi language pair show promising improvements over the baseline.

bib
Using Attention-based Bidirectional LSTM to Identify Different Categories of Offensive Language Directed Toward Female Celebrities
Sima Sharifirad | Stan Matwin

Social media posts reflect the emotions, intentions and mental state of the users. Twitter users who harass famous female figures may do so with different intentions and intensities. Recent studies have published datasets focusing on different types of online harassment, vulgar language, and emotional intensities. We trained, validate and test our proposed model, attention-based bidirectional neural network, on the three datasets:”online harassment”, “vulgar language” and “valance” and achieved state of the art performance in two of the datasets. We report F1 score for each dataset separately along with the final precision, recall and macro-averaged F1 score. In addition, we identify ten female figures from different professions and racial backgrounds who have experienced harassment on Twitter. We tested the trained models on ten collected corpuses each related to one famous female figure to predict the type of harassing language, the type of vulgar language and the degree of intensity of language occurring on their social platforms. Interestingly, the achieved results show different patterns of linguistic use targeting different racial background and occupations. The contribution of this study is two-fold. From the technical perspective, our proposed methodology is shown to be effective with a good margin in comparison to the previous state-of-the-art results on one of the two available datasets. From the social perspective, we introduce a methodology which can unlock facts about the nature of offensive language targeting women on online social platforms. The collected dataset will be shared publicly for further investigation.

bib
Sentiment Analysis Model for Opinionated Awngi Text: Case of Music Reviews
Melese Mihret | Muluneh Atinaf

Abstract The analysis of sentiments is imperative to make a decision for individuals, organizations, and governments. Due to the rapid growth of Awngi (Agew) text on the web, there is no available corpus annotated for sentiment analysis. In this paper, we present a SA model for the Awngi language spoken in Ethiopia, by using a supervised machine learning approach. We developed our corpus by collecting around 1500 posts from online sources. This research is begun to build and evaluate the model for opinionated Awngi music reviews. Thus, pre-processing techniques have been employed to clean the data, to convert transliterations to the native Ethiopic script for accessibility and convenience to typing and to change the words to their base form by removing the inflectional morphemes. After pre-processing, the corpus is manually annotated by three the language professional for giving polarity, and rate, their level of confidence in their selection and sentiment intensity scale values. To improve the calculation method of feature selection and weighting and proposed a more suitable SA algorithm for feature extraction named CHI and weight calculation named TF IDF, increasing the proportion and weight of sentiment words in the feature words. We employed Support Vector Machines (SVM), Naïve Bayes (NB) and Maximum Entropy (MxEn) machine learning algorithms. Generally, the results are encouraging, despite the morphological challenge in Awngi, the data cleanness and small size of data. We are believed that the results could improve further with a larger corpus.

bib
A compositional view of questions
Maria Boritchev | Maxime Amblard

We present a research on compositional treatment of questions in neo-davidsonian event semantics style. Our work is based on (Champollion, 2011) where only declarative sentences were considered. Our research is based on complex formal examples, paving the way towards further research in this domain and further testing on real-life corpora.

bib
Controlling the Specificity of Clarification Question Generation
Yang Trista Cao | Sudha Rao | Hal Daumé III

Unlike comprehension-style questions, clarification questions look for some missing information in a given context. However, without guidance, neural models for question generation, similar to dialog generation models, lead to generic and bland questions that cannot elicit useful information. We argue that controlling the level of specificity of the generated questions can have useful applications and propose a neural clarification question generation model for the same. We first train a classifier that annotates a clarification question with its level of specificity (generic or specific) to the given context. Our results on the Amazon questions dataset demonstrate that training a clarification question generation model on specificity annotated data can generate questions with varied levels of specificity to the given context.

bib
Non-Monotonic Sequential Text Generation
Kiante Brantley | Kyunghyun Cho | Hal Daumé | Sean Welleck

Standard sequential generation methods assume a pre-specified generation order, such as text generation methods which generate words from left to right. In this work, we propose a framework for training models of text generation that operate in non-monotonic orders; the model directly learns good orders, without any additional annotation. Our framework operates by generating a word at an arbitrary position, and then recursively generating words to its left and then words to its right, yielding a binary tree. Learning is framed as imitation learning, including a coaching method which moves from imitating an oracle to reinforcing the policy’s own preferences. Experimental results demonstrate that using the proposed method, it is possible to learn policies which generate text without pre-specifying a generation order while achieving competitive performance with conventional left-to-right generation.

bib
Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them
Hila Gonen | Yoav Goldberg

Word embeddings are widely used in NLP for a vast range of tasks. It was shown that word embeddings derived from text corpora reflect gender biases in society, causing serious concern. Several recent works tackle this problem, and propose methods for significantly reducing this gender bias in word embeddings, demonstrating convincing results. However, we argue that this removal is superficial. While the bias is indeed substantially reduced according to the provided bias definition, the actual effect is mostly hiding the bias, not removing it. The gender bias information is still reflected in the distances between “gender-neutralized” words in the debiased embeddings, and can be recovered from them. We present a series of experiments to support this claim, for two debiasing methods. We conclude that existing bias removal techniques are insufficient, and should not be trusted for providing gender-neutral modeling.

bib
How does Grammatical Gender Affect Noun Representations in Gender-Marking Languages?
Hila Gonen | Yova Kementchedjhieva | Yoav Goldberg

Many natural languages assign grammatical gender also to inanimate nouns in the language. In such languages, words that relate to the gender-marked nouns are inflected to agree with the noun’s gender. We show that this affects the word representations of inanimate nouns, resulting in nouns with the same gender being closer to each other than nouns with different gender. While “embedding debiasing” methods fail to remove the effect, we demonstrate that a careful application of methods that neutralize grammatical gender signals from the words’ context when training word embeddings is effective in removing it. Fixing the grammatical gender bias results in a positive effect on the quality of the resulting word embeddings, both in monolingual and cross lingual settings. We note that successfully removing gender signals, while achievable, is not trivial to do and that a language-specific morphological analyzer, together with careful usage of it, are essential for achieving good results.

bib
Automatic Product Categorization for Official Statistics
Andrea Roberson

The North American Product Classification System (NAPCS) is a comprehensive, hierarchical classification system for products (goods and services) that is consistent across the three North American countries. Beginning in 2017, the Economic Census will use NAPCS to produce economy-wide product tabulations. Respondents are asked to report data from a long, pre-specified list of potential products in a given industry, with some lists containing more than 50 potential products. Businesses have expressed the desire to alternatively supply Universal Product Codes (UPC) to the U. S. Census Bureau. Much work has been done around the categorization of products using product descriptions. No study has applied these efforts for the calculation of official statistics (statistics published by government agencies) using only the text of UPC product descriptions. The question we address in this paper is: Given UPC codes and their associated product descriptions, can we accurately predict NAPCS? We tested the feasibility of businesses submitting a spreadsheet with Universal Product Codes and their associated text descriptions. This novel strategy classified text with very high accuracy rates, all of our algorithms surpassed over 90 percent.

bib
An Online Topic Modeling Framework with Topics Automatically Labeled
Jin Fenglei | Gao Cuiyun | Lyu Michael R.

In this paper, we propose a novel online topic tracking framework, named IEDL, for tracking the topic changes related to deep learning techniques on Stack Exchange and automatically interpreting each identified topic. The proposed framework combines the prior topic distributions in a time window during inferring the topics in current time slice, and introduces a new ranking scheme to select most representative phrases and sentences for the inferred topics. Experiments on 7,076 Stack Exchange posts show the effectiveness of IEDL in tracking topic changes.

bib
Construction and Alignment of Multilingual Entailment Graphs for Semantic Inference
Sabine Weber | Mark Steedman

This paper presents ongoing work on the construction and alignment of predicate entailment graphs in English and German. We extract predicate-argument pairs from large corpora of monolingual English and German news text and construct monolingual paraphrase clusters and entailment graphs. We use an aligned subset of entities to derive the bilingual alignment of entities and relations, and achieve better than baseline results on a translated subset of a predicate entailment data set (Levy and Dagan, 2016) and the German portion of XNLI (Conneau et al., 2018).

bib
KB-NLG: From Knowledge Base to Natural Language Generation
Wen Cui | Minghui Zhou | Rongwen Zhao | Narges Norouzi

We perform the natural language generation (NLG) task by mapping sets of Resource Description Framework (RDF) triples into text. First we investigate the impact of increasing the number of entity types in delexicalisaiton on the generation quality. Second we conduct different experiments to evaluate two widely applied language generation systems, encoder-decoder with attention and the Transformer model on a large benchmark dataset. We evaluate different models on automatic metrics, as well as the training time. To our knowledge, we are the first to apply Transformer model to this task.

bib
Acoustic Characterization of Singaporean Children’s English: Comparisons to American and British Counterparts
Yuling Gu | Nancy Chen

We investigate English pronunciation patterns in Singaporean children in relation to their American and British counterparts by conducting archetypal analysis on selected vowel pairs. Given that Singapore adopts British English as the institutional standard, one might expect Singaporean children to follow British pronunciation patterns, but we observe that Singaporean children also present similar patterns to Americans for TRAP-BATH spilt vowels: (1) British and Singaporean children both produce these vowels with a relatively lowered tongue height. (2) These vowels are more fronted for American and Singaporean children (p < 0.001). In addition, when comparing /æ/ and /ε/ productions, British speakers show the clearest distinction between the two vowels; Singaporean and American speakers exhibit a higher and more fronted tongue position for /æ/ (p < 0.001), causing /æ/ to be acoustically more similar to /ε/.

bib
Rethinking Phonotactic Complexity
Tiago Pimentel | Brian Roark | Ryan Cotterell

In this work, we propose the use of phone-level language models to estimate phonotactic complexity—measured in bits per phoneme—which makes cross-linguistic comparison straightforward. We compare the entropy across languages using this simple measure, gaining insight on how complex different language’s phonotactics are. Finally, we show a very strong negative correlation between phonotactic complexity and the average length of words—Spearman rho=-0.744—when analysing a collection of 106 languages with 1016 basic concepts each.

bib
Implementing a Multi-lingual Chatbot for Positive Reinforcement in Young Learners
Francisca Oladipo | Abdulmalik Rufai

This is a humanitarian work –a counter-terrorism effort. The presentation describes the experiences of developing a multi-lingua, interactive chatbot trained on the corpus of two Nigerian Languages (Hausa and Fulfude), with simultaneous translation to a third (Kanuri), to stimulate conversations, deliver tailored contents to the users thereby aiding in the detection of the probability and degree of radicalization in young learners through data analysis of the games moves and vocabularies. As chatbots have the ability to simulate a human conversation based on rhetorical behavior, the system is able to learn the need of individual user through constant interaction and deliver tailored contents that promote good behavior in Hausa, Fulfulde and Kanuri languages.

bib
A Deep Learning Approach to Language-independent Gender Prediction on Twitter
Reyhaneh Hashempour

This work presents a set of experiments conducted to predict the gender of Twitter users based on language-independent features extracted from the text of the users’ tweets. The experiments were performed on a version of TwiSty dataset including tweets written by the users of six different languages: Portuguese, French, Dutch, English, German, and Italian. Logistic regression (LR), and feed-forward neural networks (FFNN) with back-propagation were used to build models in two different settings: Inter-Lingual (IL) and Cross-Lingual (CL). In the IL setting, the training and testing were performed on the same language whereas in the CL, Italian and German datasets were set aside and only used as test sets and the rest were combined to compose training and development sets. In the IL, the highest accuracy score belongs to LR whereas, in the CL, FFNN with three hidden layers yields the highest score. The results show that neural network based models underperform traditional models when the size of the training set is small; however, they beat traditional models by a non-trivial margin, when they are fed with large enough data. Finally, the feature analysis confirms that men and women have different writing styles independent of their language.

bib
Isolating the Effects of Modeling Recursive Structures: A Case Study in Pronunciation Prediction of Chinese Characters
Minh Nguyen | Gia H Ngo | Nancy Chen

Finding that explicitly modeling structures leads to better generalization, we consider the task of predicting Cantonese pronunciations of logographs (Chinese characters) using logographs’ recursive structures. This task is a suitable case study for two reasons. First, logographs’ pronunciations depend on structures (i.e. the hierarchies of sub-units in logographs) Second, the quality of logographic structures is consistent since the structures are constructed automatically using a set of rules. Thus, this task is less affected by confounds such as varying quality between annotators. Empirical results show that modeling structures explicitly using treeLSTM outperforms LSTM baseline, reducing prediction error by 6.0% relative.

bib
Benchmarking Neural Machine Translation for Southern African Languages
Jade Abbott | Laura Martinus

Unlike major Western languages, most African languages are very low-resourced. Furthermore, the resources that do exist are often scattered and difficult to obtain and discover. As a result, the data and code for existing research has rarely been shared, meaning researchers struggle to reproduce reported results, and almost no publicly available benchmarks or leaderboards for African machine translation models exist. To start to address these problems, we trained neural machine translation models for a subset of Southern African languages on publicly-available datasets. We provide the code for training the models and evaluate the models on a newly released evaluation set, with the aim of starting a leaderboard for Southern African languages and spur future research in the field.

bib
OCR Quality and NLP Preprocessing
Margot Mieskes | Stefan Schmunk

We present initial experiments to evaluate the performance of tasks such as Part of Speech Tagging on data corrupted by Optical Character Recognition (OCR). Our results, based on English and German data, using artificial experiments as well as initial real OCRed data indicate that already a small drop in OCR quality considerably increases the error rates, which would have a significant impact on subsequent processing steps.

bib
Developing a Fine-grained Corpus for a Less-resourced Language: the case of Kurdish
Roshna Abdulrahman | Hossein Hassani | Sina Ahmadi

Kurdish is a less-resourced language consisting of different dialects written in various scripts. Approximately 30 million people in different countries speak the language. The lack of corpora is one of the main obstacles in Kurdish language processing. In this paper, we present KTC-the Kurdish Textbooks Corpus, which is composed of 31 K-12 textbooks in Sorani dialect. The corpus is normalized and categorized into 12 educational subjects containing 693,800 tokens (110,297 types). Our resource is publicly available for non-commercial use under the CC BY-NC-SA 4.0 license.

bib
Amharic Question Answering for Biography, Definition, and Description Questions
Tilahun Abedissa Taffa | Mulugeta Libsie

A broad range of information needs can often be stated as a question. Question Answering (QA) systems attempt to provide users concise answer(s) to natural language questions. The existing Amharic QA systems handle fact-based questions that usually take named entities as an answer. To deal with more complex information needs we developed an Amharic non-factoid QA for biography, definition, and description questions. A hybrid approach has been used for the question classification. For document filtering and answer extraction we have used lexical patterns. On the other hand to answer biography questions we have used a summarizer and the generated summary is validated using a text classifier. Our QA system is evaluated and has shown a promising result.

bib
Polysemous Language in Child Directed Speech
Sammy Floyd | Libby Barak | Adele Goldberg | Casey Lew-Williams

Polysemous Language in Child Directed Speech Learning the meaning of words is one of the fundamental building blocks of verbal communication. Models of child language acquisition have generally made the simplifying assumption that each word appears in child-directed speech with a single meaning. To understand naturalistic word learning during childhood, it is essential to know whether children hear input that is in fact constrained to single meaning per word, or whether the environment naturally contains multiple senses.In this study, we use a topic modeling approach to automatically induce word senses from child-directed speech. Our results confirm the plausibility of our automated analysis approach and reveal an increasing rate of using multiple senses in child-directed speech, starting with corpora from children as early as the first year of life.

bib
Principled Frameworks for Evaluating Ethics in NLP Systems
Shrimai Prabhumoye | Elijah Mayfield | Alan W Black

We critique recent work on ethics in natural language processing. Those discussions have focused on data collection, experimental design, and interventions in modeling. But we argue that we ought to first understand the frameworks of ethics that are being used to evaluate the fairness and justice of algorithmic systems. Here, we begin that discussion by outlining deontological and consequentialist ethics, and make predictions on the research agenda prioritized by each.

bib
Understanding the Shades of Sexism in Popular TV Series
Nayeon Lee | Yejin Bang | Jamin Shin | Pascale Fung

[Multiple-submission] In the midst of a generation widely exposed to and influenced by media entertainment, the NLP research community has shown relatively little attention on the sexist comments in popular TV series. To understand sexism in TV series, we propose a way of collecting distant supervision dataset using Character Persona information with the psychological theories on sexism. We assume that sexist characters from TV shows are more prone to making sexist comments when talking about women, and show that this hypothesis is valid through experiment. Finally, we conduct an interesting analysis on popular TV show characters and successfully identify different shades of sexism that is often overlooked.

bib
Evaluating Ways of Adapting Word Similarity
Libby Barak | Adele Goldberg

People judge pairwise similarity by deciding which aspects of the words’ meanings are relevant for the comparison of the given pair. However, computational representations of meaning rely on dimensions of the vector representation for similarity comparisons, without considering the specific pairing at hand. Prior work has adapted computational similarity judgments by using the softmax function in order to address this limitation by capturing asymmetry in human judgments. We extend this analysis by showing that a simple modification of cosine similarity offers a better correlation with human judgments over a comprehensive dataset. The modification performs best when the similarity between two words is calculated with reference to other words that are most similar and dissimilar to the pair.

bib
Exploring the Use of Lexicons to aid Deep Learning towards the Detection of Abusive Language
Anna Koufakou | Jason Scott

Detecting abusive language is a significant research topic, which has received a lot of attention recently. Our work focused on detecting personal attacks in online conversations. State-of-the-art research on this task has largely used deep learning with word embeddings. We explored the use of sentiment lexicons as well as semantic lexicons towards improving the accuracy of the baseline Convolutional Neural Network (CNN) using regular word embeddings. This is a work in progress, limited by time constraints and appropriate infrastructure. Our preliminary results showed promise for utilizing lexicons, especially semantic lexicons, for the task of detecting abusive language.

bib
Entity-level Classification of Adverse Drug Reactions: a Comparison of Neural Network Models
Ilseyar Alimova | Elena Tutubalina

This paper presents our experimental work on exploring the potential of neural network models developed for aspect-based sentiment analysis for entity-level adverse drug reaction (ADR) classification. Our goal is to explore how to represent local context around ADR mentions and learn an entity representation, interacting with its context. We conducted extensive experiments on various sources of text-based information, including social media, electronic health records, and abstracts of scientific articles from PubMed. The results show that Interactive Attention Neural Network (IAN) outperformed other models on four corpora in terms of macro F-measure. This work is an abridged version of our recent paper accepted to Programming and Computer Software journal in 2019.

bib
Context Effects on Human Judgments of Similarity
Libby Barak | Noe Kong-Johnson | Adele Goldberg

The semantic similarity of words forms the basis of many natural language processing methods. These computational similarity measures are often based on a mathematical comparison of vector representations of word meanings, while human judgments of similarity differ in lacking geometrical properties, e.g., symmetric similarity and triangular similarity. In this study, we propose a novel task design to further explore human behavior by asking whether a pair of words is deemed more similar depending on an immediately preceding judgment. Results from a crowdsourcing experiment show that people consistently judge words as more similar when primed by a judgment that evokes a relevant relationship. Our analysis further shows that word2vec similarity correlated significantly better with the out-of-context judgments, thus confirming the methodological differences in human-computer judgments, and offering a new testbed for probing the differences.

bib
NLP Automation to Read Radiological Reports to Detect the Stage of Cancer Among Lung Cancer Patients
Khushbu Gupta | Ratchainant Thammasudjarit | Ammarin Thakkinstian

A common challenge in the healthcare industry today is physicians have access to massive amounts of healthcare data but have little time and no appropriate tools. For instance, the risk prediction model generated by logistic regression could predict the probability of diseases occurrence and thus prioritizing patients’ waiting list for further investigations. However, many medical reports available in current clinical practice system are not yet ready for analysis using either statistics or machine learning as they are in unstructured text format. The complexity of medical information makes the annotation or validation of data very challenging and thus acts as a bottleneck to apply machine learning techniques in medical data. This study is therefore conducted to create such annotations automatically where the computer can read radiological reports for oncologists and mark the staging of lung cancer. This staging information is obtained using the rule-based method implemented using the standards of Tumor Node Metastasis (TNM) staging along with deep learning technology called Long Short Term Memory (LSTM) to extract clinical information from the Computed Tomography (CT) text report. The empirical experiment shows promising results being the accuracy of up to 85%.

bib
Augmenting Named Entity Recognition with Commonsense Knowledge
Gaith Dekhili | Tan Ngoc Le | Fatiha Sadat

Commonsense can be vital in some applications like Natural Language Understanding (NLU), where it is often required to resolve ambiguity arising from implicit knowledge and underspecification. In spite of the remarkable success of neural network approaches on a variety of Natural Language Processing tasks, many of them struggle to react effectively in cases that require commonsense knowledge. In the present research, we take advantage of the availability of the open multilingual knowledge graph ConceptNet, by using it as an additional external resource in Named Entity Recognition (NER). Our proposed architecture involves BiLSTM layers combined with a CRF layer that was augmented with some features such as pre-trained word embedding layers and dropout layers. Moreover, apart from using word representations, we used also character-based representation to capture the morphological and the orthographic information. Our experiments and evaluations showed an improvement in the overall performance with +2.86 in the F1-measure. Commonsense reasonnig has been employed in other studies and NLP tasks but to the best of our knowledge, there is no study relating the integration of a commonsense knowledge base in NER.

bib
Pardon the Interruption: Automatic Analysis of Gender and Competitive Turn-Taking in United States Supreme Court Hearings
Haley Lepp

The United States Supreme Court plays a key role in defining the legal basis for gender discrimination throughout the country, yet there are few checks on gender bias within the court itself. In conversational turn-taking, interruptions have been documented as a marker of bias between speakers of different genders. The goal of this study is to automatically differentiate between respectful and disrespectful conversational turns taken during official hearings, which could help in detecting bias and finding remediation techniques for discourse in the courtroom. In this paper, I present a corpus of turns annotated by legal professionals, and describe the design of a semi-supervised classifier that will use acoustic and lexical features to analyze turn-taking at scale. On completion of annotations, this classifier will be trained to extract the likelihood that turns are respectful or disrespectful for use in studies of speech trends.

bib
Evaluating Coherence in Dialogue Systems using Entailment
Nouha Dziri | Ehsan Kamalloo | Kory Mathewson | Osmar Zaiane

Evaluating open-domain dialogue systems is difficult due to the diversity of possible correct answers. Automatic metrics such as BLEU correlate weakly with human annotations, resulting in a significant bias across different models and datasets. Some researchers resort to human judgment experimentation for assessing response quality, which is expensive, time consuming, and not scalable. Moreover, judges tend to evaluate a small number of dialogues, meaning that minor differences in evaluation configuration may lead to dissimilar results. In this paper, we present interpretable metrics for evaluating topic coherence by making use of distributed sentence representations. Furthermore, we introduce calculable approximations of human judgment based on conversational coherence by adopting state-of-the-art entailment techniques. Results show that our metrics can be used as a surrogate for human judgment, making it easy to evaluate dialogue systems on large-scale datasets and allowing an unbiased estimate for the quality of the responses. This paper has been accepted in NAACL 2019.

bib
Exploiting machine algorithms in vocalic quantification of African English corpora
Lasisi Adeiza Isiaka

Towards procedural fidelity in the processing of African English speech corpora, this work demonstrates how the adaptation of machine-assisted segmentation of phonemes and automatic extraction of acoustic values can significantly speed up the processing of naturalistic data and make the vocalic analysis of the varieties less impressionistic. Research in African English phonology has, till date, been least data-driven – much less the use of comparative corpora for cross-varietal assessments. Using over 30 hours of naturalistic data (from 28 speakers in 5 Nigerian cities), the procedures for segmenting audio files into phonemic units via the Munich Automatic Segmentation System (MAUS), and the extraction of their spectral values in Praat are explained. Evidence from the speech corpora supports a more complex vocalic inventory than attested in previous auditory/manual-based accounts – thus reinforcing the resourcefulness of the algorithms for the current data and cognate varieties. Keywords: machine algorithms; naturalistic data; African English phonology; vowel segmentation

bib
Assessing the Ability of Neural Machine Translation Models to Perform Syntactic Rewriting
Jahkel Robin | Alvin Grissom II | Matthew Roselli

We describe work in progress for evaluating performance of sequence-to-sequence neural networks on the task of syntax-based reordering for rules applicable to simultaneous machine translation. We train models that attempt to rewrite English sentences using rules that are commonly used by human interpreters. We examine the performance of these models to determine which forms of rewriting are more difficult for them to learn and which architectures are the best at learning them.

bib
Authorship Recognition with Short-Text using Graph-based Techniques
Laura Cruz

In recent years, studies of authorship recognition has aroused great interest in graph-based analysis. Modeling the writing style of each author using a network of co-occurrence words. However, short texts can generate some changes in the topology of network that cause impact on techniques of feature extraction based on graph topology. In this work, we evaluate the robustness of global-strategy and local-strategy based on complex network measurements comparing with graph2vec a graph embedding technique based on skip-gram model. The experiment consists of evaluating how each modification in the length of text affects the accuracy of authorship recognition on both techniques using cross-validation and machine learning techniques.

bib
A Parallel Corpus Mixtec-Spanish
Cynthia Montaño | Gerardo Sierra Martínez | Gemma Bel-Enguix | Helena Gomez

This work is about the compilation process of parallel documents Spanish-Mixtec. There are not many Spanish-Mixec parallel texts and most of the sources are non-digital books. Due to this, we need to face the errors when digitizing the sources and difficulties in sentence alignment, as well as the fact that does not exist a standard orthography. Our parallel corpus consists of sixty texts coming from books and digital repositories. These documents belong to different domains: history, traditional stories, didactic material, recipes, ethnographical descriptions of each town and instruction manuals for disease prevention. We have classified this material in five major categories: didactic (6 texts), educative (6 texts), interpretative (7 texts), narrative (39 texts), and poetic (2 texts). The final total of tokens is 49,814 Spanish words and 47,774 Mixtec words. The texts belong to the states of Oaxaca (48 texts), Guerrero (9 texts) and Puebla (3 texts). According to this data, we see that the corpus is unbalanced in what refers to the representation of the different territories. While 55% of speakers are in Oaxaca, 80% of texts come from this region. Guerrero has the 30% of speakers and the 15% of texts and Puebla, with the 15% of the speakers has a representation of the 5% in the corpus.

bib
Emoji Usage Across Platforms: A Case Study for the Charlottesville Event
Khyati Mahajan | Samira Shaikh

We study emoji usage patterns across two social media platforms, one of them considered a fringe community called Gab, and the other Twitter. We find that Gab tends to comparatively use more emotionally charged emoji, but also seems more apathetic towards the violence during the event, while Twitter takes a more empathetic approach to the event.

bib
Reading KITTY: Pitch Range as an Indicator of Reading Skill
Alfredo Gomez | Alicia Ngo | Alessandra Otondo | Julie Medero

While affective outcomes are generally positive for the use of eBooks and computer-based reading tutors in teaching children to read, learning outcomes are often poorer (Korat and Shamir, 2004). We describe the first iteration of Reading Kitty, an iOS application that uses NLP and speech processing to focus children’s time on close reading and prosody in oral reading, while maintaining an emphasis on creativity and artifact creation. We also share preliminary results demonstrating that pitch range can be used to automatically predict readers’ skill level.

bib
Adversarial Attack on Sentiment Classification
Yi-Ting (Alicia) Tsai | Min-Chu Yang | Han-Yu Chen

In this paper, we propose a white-box attack algorithm called “Global Search” method and compare it with a simple misspelling noise and a more sophisticated and common white-box attack approach called “Greedy Search”. The attack methods are evaluated on the Convolutional Neural Network (CNN) sentiment classifier trained on the IMDB movie review dataset. The attack success rate is used to evaluate the effectiveness of the attack methods and the perplexity of the sentences is used to measure the degree of distortion of the generated adversarial examples. The experiment results show that the proposed “Global Search” method generates more powerful adversarial examples with less distortion or less modification to the source text.

bib
CSI Peru News: finding the culprit, victim and location in news articles
Gina Bustamante | Arturo Oncevay

We introduce a shift on the DS method over the domain of crime-related news from Peru, attempting to find the culprit, victim and location of a crime description from a RE perspective. Obtained results are highly promising and show that proposed modifications are effective in non-traditional domains.

bib
Exploring Social Bias in Chatbots using Stereotype Knowledge
Nayeon Lee | Andrea Madotto | Pascale Fung

Exploring social bias in chatbot is an important, yet relatively unexplored problem. In this paper, we propose an approach to understand social bias in chatbots by leveraging stereotype knowledge. It allows interesting comparison of bias between chatbots and humans, and provides intuitive analysis of existing chatbots by borrowing the finer-grain concepts of sexism and racism.

bib
Cross-Sentence Transformations in Text Simplification
Fernando Alva-Manchego | Carolina Scarton | Lucia Specia

Current approaches to Text Simplification focus on simplifying sentences individually. However, certain simplification transformations span beyond single sentences (e.g. joining and re-ordering sentences). In this paper, we motivate the need for modelling the simplification task at the document level, and assess the performance of sequence-to-sequence neural models in this setup. We analyse parallel original-simplified documents created by professional editors and show that there are frequent rewriting transformations that are not restricted to sentence boundaries. We also propose strategies to automatically evaluate the performance of a simplification model on these cross-sentence transformations. Our experiments show the inability of standard sequence-to-sequence neural models to learn these transformations, and suggest directions towards document-level simplification.

up

pdf (full)
bib (full)
Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing

pdf bib
Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing
Tomaž Erjavec | Michał Marcińczuk | Preslav Nakov | Jakub Piskorski | Lidia Pivovarova | Jan Šnajder | Josef Steinberger | Roman Yangarber

pdf bib
Multiple Admissibility : Judging Grammaticality using Unlabeled Data in Language Learning
Anisia Katinskaia | Sardana Ivanova

We present our work on the problem of Multiple Admissibility (MA) in language learning. Multiple Admissibility occurs in many languages when more than one grammatical form of a word fits syntactically and semantically in a given context. In second language (L2) education-in particular, in intelligent tutoring systems / computer-aided language learning (ITS / CALL) systems, which generate exercises automatically-this implies that multiple alternative answers are possible. We treat the problem as a grammaticality judgement task. We train a neural network with an objective to label sentences as grammatical or ungrammatical, using a simulated learner corpus : a dataset with correct text, and with artificial errors generated automatically. While MA occurs commonly in many languages, this paper focuses on learning Russian. We present a detailed classification of the types of constructions in Russian, in which MA is possible, and evaluate the model using a test set built from answers provided by the users of a running language learning system.

pdf bib
AGRR 2019 : Corpus for Gapping Resolution in RussianAGRR 2019: Corpus for Gapping Resolution in Russian
Maria Ponomareva | Kira Droganova | Ivan Smurov | Tatiana Shavrina

This paper provides a comprehensive overview of the gapping dataset for Russian that consists of 7.5k sentences with gapping (as well as 15k relevant negative sentences) and comprises data from various genres : news, fiction, social media and technical texts. The dataset was prepared for the Automatic Gapping Resolution Shared Task for Russian (AGRR-2019)-a competition aimed at stimulating the development of NLP tools and methods for processing of ellipsis. In this paper, we pay special attention to the gapping resolution methods that were introduced within the shared task as well as an alternative test set that illustrates that our corpus is a diverse and representative subset of Russian language gapping sufficient for effective utilization of machine learning techniques.

pdf bib
Data Set for Stance and Sentiment Analysis from User Comments on Croatian NewsCroatian News
Mihaela Bošnjak | Mladen Karan

Nowadays it is becoming more important than ever to find new ways of extracting useful information from the evergrowing amount of user-generated data available online. In this paper, we describe the creation of a data set that contains news articles and corresponding comments from Croatian news outlet 24 sata. Our annotation scheme is specifically tailored for the task of detecting stances and sentiment from user comments as well as assessing if commentator claims are verifiable. Through this data, we hope to get a better understanding of the publics viewpoint on various events. In addition, we also explore the potential of applying supervised machine learning models toautomate annotation of more data.

pdf bib
A Dataset for Noun Compositionality Detection for a Slavic LanguageSlavic Language
Dmitry Puzyrev | Artem Shelmanov | Alexander Panchenko | Ekaterina Artemova

This paper presents the first gold-standard resource for Russian annotated with compositionality information of noun compounds. The compound phrases are collected from the Universal Dependency treebanks according to part of speech patterns, such as ADJ+NOUN or NOUN+NOUN, using the gold-standard annotations. Each compound phrase is annotated by two experts and a moderator according to the following schema : the phrase can be either compositional, non-compositional, or ambiguous (i.e., depending on the context it can be interpreted both as compositional or non-compositional). We conduct an experimental evaluation of models and methods for predicting compositionality of noun compounds in unsupervised and supervised setups. We show that methods from previous work evaluated on the proposed Russian-language resource achieve the performance comparable with results on English corpora.

pdf bib
The Second Cross-Lingual Challenge on Recognition, Normalization, Classification, and Linking of Named Entities across Slavic LanguagesSlavic Languages
Jakub Piskorski | Laska Laskova | Michał Marcińczuk | Lidia Pivovarova | Pavel Přibáň | Josef Steinberger | Roman Yangarber

We describe the Second Multilingual Named Entity Challenge in Slavic languages. The task is recognizing mentions of named entities in Web documents, their normalization, and cross-lingual linking. The Challenge was organized as part of the 7th Balto-Slavic Natural Language Processing Workshop, co-located with the ACL-2019 conference. Eight teams participated in the competition, which covered four languages and five entity types. Performance for the named entity recognition task reached 90 % F-measure, much higher than reported in the first edition of the Challenge. Seven teams covered all four languages, and five teams participated in the cross-lingual entity linking task. Detailed evaluation information is available on the shared task web page.

pdf bib
TLR at BSNLP2019 : A Multilingual Named Entity Recognition SystemTLR at BSNLP2019: A Multilingual Named Entity Recognition System
Jose G. Moreno | Elvys Linhares Pontes | Mickael Coustaty | Antoine Doucet

This paper presents our participation at the shared task on multilingual named entity recognition at BSNLP2019. Our strategy is based on a standard neural architecture for sequence labeling. In particular, we use a mixed model which combines multilingualcontextual and language-specific embeddings. Our only submitted run is based on a voting schema using multiple models, one for each of the four languages of the task (Bulgarian, Czech, Polish, and Russian) and another for English. Results for named entity recognition are encouraging for all languages, varying from 60 % to 83 % in terms of Strict and Relaxed metrics, respectively.

pdf bib
JRC TMA-CC : Slavic Named Entity Recognition and Linking. Participation in the BSNLP-2019 shared taskJRC TMA-CC: Slavic Named Entity Recognition and Linking. Participation in the BSNLP-2019 shared task
Guillaume Jacquet | Jakub Piskorski | Hristo Tanev | Ralf Steinberger

We report on the participation of the JRC Text Mining and Analysis Competence Centre (TMA-CC) in the BSNLP-2019 Shared Task, which focuses on named-entity recognition, lemmatisation and cross-lingual linking. We propose a hybrid system combining a rule-based approach and light ML techniques. We use multilingual lexical resources such as JRC-NAMES and BABELNET together with a named entity guesser to recognise names. In a second step, we combine known names with wild cards to increase recognition recall by also capturing inflection variants. In a third step, we increase precision by filtering these name candidates with automatically learnt inflection patterns derived from name occurrences in large news article collections. Our major requirement is to achieve high precision. We achieved an average of 65 % F-measure with 93 % precision on the four languages.

pdf bib
Improving Sentiment Classification in Slovak LanguageSlovak Language
Samuel Pecar | Marian Simko | Maria Bielikova

Using different neural network architectures is widely spread for many different NLP tasks. Unfortunately, most of the research is performed and evaluated only in English language and minor languages are often omitted. We believe using similar architectures for other languages can show interesting results. In this paper, we present our study on methods for improving sentiment classification in Slovak language. We performed several experiments for two different datasets, one containing customer reviews, the other one general Twitter posts. We show comparison of performance of different neural network architectures and also different word representations. We show that another improvement can be achieved by using a model ensemble. We performed experiments utilizing different methods of model ensemble. Our proposed models achieved better results than previous models for both datasets. Our experiments showed also other potential research areas.

up

pdf (full)
bib (full)
Proceedings of the First Workshop on Gender Bias in Natural Language Processing

pdf bib
Proceedings of the First Workshop on Gender Bias in Natural Language Processing
Marta R. Costa-jussà | Christian Hardmeier | Will Radford | Kellie Webster

pdf bib
Relating Word Embedding Gender Biases to Gender Gaps : A Cross-Cultural Analysis
Scott Friedman | Sonja Schmer-Galunder | Anthony Chen | Jeffrey Rye

Modern models for common NLP tasks often employ machine learning techniques and train on journalistic, social media, or other culturally-derived text. These have recently been scrutinized for racial and gender biases, rooting from inherent bias in their training text. These biases are often sub-optimal and recent work poses methods to rectify them ; however, these biases may shed light on actual racial or gender gaps in the culture(s) that produced the training text, thereby helping us understand cultural context through big data. This paper presents an approach for quantifying gender bias in word embeddings, and then using them to characterize statistical gender gaps in education, politics, economics, and health. We validate these metrics on 2018 Twitter data spanning 51 U.S. regions and 99 countries. We correlate state and country word embedding biases with 18 international and 5 U.S.-based statistical gender gaps, characterizing regularities and predictive strength.

pdf bib
Measuring Gender Bias in Word Embeddings across Domains and Discovering New Gender Bias Word Categories
Kaytlin Chaloner | Alfredo Maldonado

Prior work has shown that word embeddings capture human stereotypes, including gender bias. However, there is a lack of studies testing the presence of specific gender bias categories in word embeddings across diverse domains. This paper aims to fill this gap by applying the WEAT bias detection method to four sets of word embeddings trained on corpora from four different domains : news, social networking, biomedical and a gender-balanced corpus extracted from Wikipedia (GAP). We find that some domains are definitely more prone to gender bias than others, and that the categories of gender bias present also vary for each set of word embeddings. We detect some gender bias in GAP. We also propose a simple but novel method for discovering new bias categories by clustering word embeddings. We validate this method through WEAT’s hypothesis testing mechanism and find it useful for expanding the relatively small set of well-known gender bias word categories commonly used in the literature.

pdf bib
Evaluating the Underlying Gender Bias in Contextualized Word Embeddings
Christine Basta | Marta R. Costa-jussà | Noe Casas

Gender bias is highly impacting natural language processing applications. Word embeddings have clearly been proven both to keep and amplify gender biases that are present in current data sources. Recently, contextualized word embeddings have enhanced previous word embedding techniques by computing word vector representations dependent on the sentence they appear in. In this paper, we study the impact of this conceptual change in the word embedding computation in relation with gender bias. Our analysis includes different measures previously applied in the literature to standard word embeddings. Our findings suggest that contextualized word embeddings are less biased than standard ones even when the latter are debiased.

pdf bib
Conceptor Debiasing of Word Representations Evaluated on WEATWEAT
Saket Karve | Lyle Ungar | João Sedoc

Bias in word representations, such as Word2Vec, has been widely reported and investigated, and efforts made to debias them. We apply the debiasing conceptor for post-processing both traditional and contextualized word embeddings. Our method can simultaneously remove racial and gender biases from word representations. Unlike standard debiasing methods, the debiasing conceptor can utilize heterogeneous lists of biased words without loss in performance. Finally, our empirical experiments show that the debiasing conceptor diminishes racial and gender bias of word representations as measured using the Word Embedding Association Test (WEAT) of Caliskan et al.

pdf bib
Filling Gender & Number Gaps in Neural Machine Translation with Black-box Context Injection
Amit Moryossef | Roee Aharoni | Yoav Goldberg

When translating from a language that does not morphologically mark information such as gender and number into a language that does, translation systems must guess this missing information, often leading to incorrect translations in the given context. We propose a black-box approach for injecting the missing information to a pre-trained neural machine translation system, allowing to control the morphological variations in the generated translations without changing the underlying model or training data. We evaluate our method on an English to Hebrew translation task, and show that it is effective in injecting the gender and number information and that supplying the correct information improves the translation accuracy in up to 2.3 BLEU on a female-speaker test set for a state-of-the-art online black-box system. Finally, we perform a fine-grained syntactic analysis of the generated translations that shows the effectiveness of our method.

pdf bib
BERT Masked Language Modeling for Co-reference ResolutionBERT Masked Language Modeling for Co-reference Resolution
Felipe Alfaro | Marta R. Costa-jussà | José A. R. Fonollosa

This paper explains the TALP-UPC participation for the Gendered Pronoun Resolution shared-task of the 1st ACL Workshop on Gender Bias for Natural Language Processing. We have implemented two models for mask language modeling using pre-trained BERT adjusted to work for a classification problem. The proposed solutions are based on the word probabilities of the original BERT model, but using common English names to replace the original test names.

pdf bib
Look Again at the Syntax : Relational Graph Convolutional Network for Gendered Ambiguous Pronoun Resolution
Yinchuan Xu | Junlin Yang

Gender bias has been found in existing coreference resolvers. In order to eliminate gender bias, a gender-balanced dataset Gendered Ambiguous Pronouns (GAP) has been released and the best baseline model achieves only 66.9 % F1. Bidirectional Encoder Representations from Transformers (BERT) has broken several NLP task records and can be used on GAP dataset. However, fine-tune BERT on a specific task is computationally expensive. In this paper, we propose an end-to-end resolver by combining pre-trained BERT with Relational Graph Convolutional Network (R-GCN). R-GCN is used for digesting structural syntactic information and learning better task-specific embeddings. Empirical results demonstrate that, under explicit syntactic supervision and without the need to fine tune BERT, R-GCN’s embeddings outperform the original BERT embeddings on the coreference task. Our work significantly improves the snippet-context baseline F1 score on GAP dataset from 66.9 % to 80.3 %. We participated in the Gender Bias for Natural Language Processing 2019 shared task, and our codes are available online.

pdf bib
Anonymized BERT : An Augmentation Approach to the Gendered Pronoun Resolution ChallengeBERT: An Augmentation Approach to the Gendered Pronoun Resolution Challenge
Bo Liu

We present our 7th place solution to the Gendered Pronoun Resolution challenge, which uses BERT without fine-tuning and a novel augmentation strategy designed for contextual embedding token-level tasks. Our method anonymizes the referent by replacing candidate names with a set of common placeholder names. Besides the usual benefits of effectively increasing training data size, this approach diversifies idiosyncratic information embedded in names. Using same set of common first names can also help the model recognize names better, shorten token length, and remove gender and regional biases associated with names. The system scored 0.1947 log loss in stage 2, where the augmentation contributed to an improvements of 0.04. Post-competition analysis shows that, when using different embedding layers, the system scores 0.1799 which would be third place.

up

pdf (full)
bib (full)
Proceedings of the Workshop on Deep Learning and Formal Languages: Building Bridges

pdf bib
Proceedings of the Workshop on Deep Learning and Formal Languages: Building Bridges
Jason Eisner | Matthias Gallé | Jeffrey Heinz | Ariadna Quattoni | Guillaume Rabusseau

pdf bib
Relating RNN Layers with the Spectral WFA Ranks in Sequence ModellingRNN Layers with the Spectral WFA Ranks in Sequence Modelling
Farhana Ferdousi Liza | Marek Grzes

We analyse Recurrent Neural Networks (RNNs) to understand the significance of multiple LSTM layers. We argue that the Weighted Finite-state Automata (WFA) trained using a spectral learning algorithm are helpful to analyse RNNs. Our results suggest that multiple LSTM layers in RNNs help learning distributed hidden states, but have a smaller impact on the ability to learn long-term dependencies. The analysis is based on the empirical results, however relevant theory (whenever possible) was discussed to justify and support our conclusions.

pdf bib
Multi-Element Long Distance Dependencies : Using SPk Languages to Explore the Characteristics of Long-Distance DependenciesSPk Languages to Explore the Characteristics of Long-Distance Dependencies
Abhijit Mahalunkar | John Kelleher

In order to successfully model Long Distance Dependencies (LDDs) it is necessary to under-stand the full-range of the characteristics of the LDDs exhibited in a target dataset. In this paper, we use Strictly k-Piecewise languages to generate datasets with various properties. We then compute the characteristics of the LDDs in these datasets using mutual information and analyze the impact of factors such as (i) k, (ii) length of LDDs, (iii) vocabulary size, (iv) forbidden strings, and (v) dataset size. This analysis reveal that the number of interacting elements in a dependency is an important characteristic of LDDs. This leads us to the challenge of modelling multi-element long-distance dependencies. Our results suggest that attention mechanisms in neural networks may aide in modeling datasets with multi-element long-distance dependencies. However, we conclude that there is a need to develop more efficient attention mechanisms to address this issue.

pdf bib
LSTM Networks Can Perform Dynamic CountingLSTM Networks Can Perform Dynamic Counting
Mirac Suzgun | Yonatan Belinkov | Stuart Shieber | Sebastian Gehrmann

In this paper, we systematically assess the ability of standard recurrent networks to perform dynamic counting and to encode hierarchical representations. All the neural models in our experiments are designed to be small-sized networks both to prevent them from memorizing the training sets and to visualize and interpret their behaviour at test time. Our results demonstrate that the Long Short-Term Memory (LSTM) networks can learn to recognize the well-balanced parenthesis language (Dyck-1) and the shuffles of multiple Dyck-1 languages, each defined over different parenthesis-pairs, by emulating simple real-time k-counter machines. To the best of our knowledge, this work is the first study to introduce the shuffle languages to analyze the computational power of neural networks. We also show that a single-layer LSTM with only one hidden unit is practically sufficient for recognizing the Dyck-1 language. However, none of our recurrent networks was able to yield a good performance on the Dyck-2 language learning task, which requires a model to have a stack-like mechanism for recognition.

up

pdf (full)
bib (full)
Proceedings of the 13th Linguistic Annotation Workshop

pdf bib
Proceedings of the 13th Linguistic Annotation Workshop
Annemarie Friedrich | Deniz Zeyrek | Jet Hoek

pdf bib
WiRe57 : A Fine-Grained Benchmark for Open Information ExtractionWiRe57 : A Fine-Grained Benchmark for Open Information Extraction
William Lechelle | Fabrizio Gotti | Phillippe Langlais

We build a reference for the task of Open Information Extraction, on five documents. We tentatively resolve a number of issues that arise, including coreference and granularity, and we take steps toward addressing inference, a significant problem. We seek to better pinpoint the requirements for the task. We produce our annotation guidelines specifying what is correct to extract and what is not. In turn, we use this reference to score existing Open IE systems. We address the non-trivial problem of evaluating the extractions produced by systems against the reference tuples, and share our evaluation script. Among seven compared extractors, we find the MinIE system to perform best.

pdf bib
Crowdsourcing Discourse Relation Annotations by a Two-Step Connective Insertion Task
Frances Yung | Vera Demberg | Merel Scholman

The perspective of being able to crowd-source coherence relations bears the promise of acquiring annotations for new texts quickly, which could then increase the size and variety of discourse-annotated corpora. It would also open the avenue to answering new research questions : Collecting annotations from a larger number of individuals per instance would allow to investigate the distribution of inferred relations, and to study individual differences in coherence relation interpretation. However, annotating coherence relations with untrained workers is not trivial. We here propose a novel two-step annotation procedure, which extends an earlier method by Scholman and Demberg (2017a). In our approach, coherence relation labels are inferred from connectives that workers insert into the text. We show that the proposed method leads to replicable coherence annotations, and analyse the agreement between the obtained relation labels and annotations from PDTB and RSTDT on the same texts.

pdf bib
Annotating and analyzing the interactions between meaning relations
Darina Gold | Venelin Kovatchev | Torsten Zesch

Pairs of sentences, phrases, or other text pieces can hold semantic relations such as paraphrasing, textual entailment, contradiction, specificity, and semantic similarity. These relations are usually studied in isolation and no dataset exists where they can be compared empirically. Here we present a corpus annotated with these relations and the analysis of these results. The corpus contains 520 sentence pairs, annotated with these relations. We measure the annotation reliability of each individual relation and we examine their interactions and correlations. Among the unexpected results revealed by our analysis is that the traditionally considered direct relationship between paraphrasing and bi-directional entailment does not hold in our data.

pdf bib
Tagging modality in Oceanic languages of Melanesia
Annika Tjuka | Lena Weißmann | Kilu von Prince

Primary data from small, low-resource languages of Oceania have only recently become available through language documentation. In our study, we explore corpus data of five Oceanic languages of Melanesia which are known to be mood-prominent (in the sense of Bhat, 1999). In order to find out more about tense, aspect, modality, and polarity, we tagged these categories in a subset of our corpora. For the category of modality, we developed a novel tag set (MelaTAMP, 2017), which categorizes clauses into factual, possible, and counterfactual. Based on an analysis of the inter-annotator consistency, we argue that our tag set for the modal domain is efficient for our subject languages and might be useful for other languages and purposes.

pdf bib
Harmonizing Different Lemmatization Strategies for Building a Knowledge Base of Linguistic Resources for LatinLatin
Francesco Mambrini | Marco Passarotti

The interoperability between lemmatized corpora of Latin and other resources that use the lemma as indexing key is hampered by the multiple lemmatization strategies that different projects adopt. In this paper we discuss how we tackle the challenges raised by harmonizing different lemmatization criteria in the context of a project that aims to connect linguistic resources for Latin using the Linked Data paradigm. The paper introduces the architecture supporting an open-ended, lemma-based Knowledge Base, built to make textual and lexical resources for Latin interoperable. Particularly, the paper describes the inclusion into the Knowledge Base of its lexical basis, of a word formation lexicon and of a lemmatized and syntactically annotated corpus.

pdf bib
An Online Annotation Assistant for Argument Schemes
John Lawrence | Jacky Visser | Chris Reed

Understanding the inferential principles underpinning an argument is essential to the proper interpretation and evaluation of persuasive discourse. Argument schemes capture the conventional patterns of reasoning appealed to in persuasion. The empirical study of these patterns relies on the availability of data about the actual use of argumentation in communicative practice. Annotated corpora of argument schemes, however, are scarce, small, and unrepresentative. Aiming to address this issue, we present one step in the development of improved datasets by integrating the Argument Scheme Key a novel annotation method based on one of the most popular typologies of argument schemes into the widely used OVA software for argument analysis.

pdf bib
Explaining Simple Natural Language Inference
Aikaterini-Lida Kalouli | Annebeth Buis | Livy Real | Martha Palmer | Valeria de Paiva

The vast amount of research introducing new corpora and techniques for semi-automatically annotating corpora shows the important role that datasets play in today’s research, especially in the machine learning community. This rapid development raises concerns about the quality of the datasets created and consequently of the models trained, as recently discussed with respect to the Natural Language Inference (NLI) task. In this work we conduct an annotation experiment based on a small subset of the SICK corpus. The experiment reveals several problems in the annotation guidelines, and various challenges of the NLI task itself. Our quantitative evaluation of the experiment allows us to assign our empirical observations to specific linguistic phenomena and leads us to recommendations for future annotation tasks, for NLI and possibly for other tasks.

pdf bib
On the role of discourse relations in persuasive texts
Ines Rehbein

This paper investigates the use of explicitly signalled discourse relations in persuasive texts. We present a corpus study where we control for speaker and topic and show that the distribution of different discourse connectives varies considerably across different discourse settings. While this variation can be explained by genre differences, we also observe variation regarding the distribution of discourse relations across different settings. This variation, however, can not be easily explained by genre differences. We argue that the differences regarding the use of discourse relations reflects different strategies of persuasion and that these might be due to audience design.

pdf bib
One format to rule them all The emtsv pipeline for HungarianHungarian
Balázs Indig | Bálint Sass | Eszter Simon | Iván Mittelholcz | Noémi Vadász | Márton Makrai

We present a more efficient version of the e-magyar NLP pipeline for Hungarian called emtsv. It integrates Hungarian NLP tools in a framework whose individual modules can be developed or replaced independently and allows new ones to be added. The design also allows convenient investigation and manual correction of the data flow from one module to another. The improvements we publish include effective communication between the modules and support of the use of individual modules both in the chain and standing alone. Our goals are accomplished using extended tsv (tab separated values) files, a simple, uniform, generic and self-documenting input / output format. Our vision is maintaining the system for a long time and making it easier for external developers to fit their own modules into the system, thus sharing existing competencies in the field of processing Hungarian, a mid-resourced language. The source code is available under LGPL 3.0 license at https://github.com/dlt-rilmta/emtsv.

pdf bib
Turkish Treebanking : Unifying and Constructing EffortsTurkish Treebanking: Unifying and Constructing Efforts
Utku Türk | Furkan Atmaca | Şaziye Betül Özateş | Abdullatif Köksal | Balkiz Ozturk Basaran | Tunga Gungor | Arzucan Özgür

In this paper, we present the current version of two different treebanks, the re-annotation of the Turkish PUD Treebank and the first annotation of the Turkish National Corpus Universal Dependency (henceforth TNC-UD). The annotation of both treebanks, the Turkish PUD Treebank and TNC-UD, was carried out based on the decisions concerning linguistic adequacy of re-annotation of the Turkish IMST-UD Treebank (Trk et. al., forthcoming). Both of the treebanks were annotated with the same annotation process and morphological and syntactic analyses. The TNC-UD is planned to have 10,000 sentences. In this paper, we will present the first 500 sentences along with the annotation PUD Treebank. Moreover, this paper also offers the parsing results of a graph-based neural parser on the previous and re-annotated PUD, as well as the TNC-UD. In light of the comparisons, even though we observe a slight decrease in the attachment scores of the Turkish PUD treebank, we demonstrate that the annotation of the TNC-UD improves the parsing accuracy of Turkish. In addition to the treebanks, we have also constructed a custom annotation software with advanced filtering and morphological editing options. Both the treebanks, including a full edit-history and the annotation guidelines, and the custom software are publicly available under an open license online.

pdf bib
A Dataset for Semantic Role Labelling of Hindi-English Code-Mixed TweetsHindi-English Code-Mixed Tweets
Riya Pal | Dipti Sharma

We present a data set of 1460 Hindi-English code-mixed tweets consisting of 20,949 tokens labelled with Proposition Bank labels marking their semantic roles. We created verb frames for complex predicates present in the corpus and formulated mappings from Paninian dependency labels to Proposition Bank labels. With the help of these mappings and the dependency tree, we propose a baseline rule based system for Semantic Role Labelling of Hindi-English code-mixed data. We obtain an accuracy of 96.74 % for Argument Identification and are able to further classify 73.93 % of the labels correctly. While there is relevant ongoing research on Semantic Role Labelling and on building tools for code-mixed social media data, this is the first attempt at labelling semantic roles in code-mixed data, to the best of our knowledge.

pdf bib
A Multi-Platform Annotation Ecosystem for Domain Adaptation
Richard Eckart de Castilho | Nancy Ide | Jin-Dong Kim | Jan-Christoph Klie | Keith Suderman

This paper describes an ecosystem consisting of three independent text annotation platforms. To demonstrate their ability to work in concert, we illustrate how to use them to address an interactive domain adaptation task in biomedical entity recognition. The platforms and the approach are in general domain-independent and can be readily applied to other areas of science.

pdf bib
Comparative judgments are more consistent than binary classification for labelling word complexity
Sian Gooding | Ekaterina Kochmar | Advait Sarkar | Alan Blackwell

Lexical simplification systems replace complex words with simple ones based on a model of which words are complex in context. We explore how users can help train complex word identification models through labelling more efficiently and reliably. We show that using an interface where annotators make comparative rather than binary judgments leads to more reliable and consistent labels, and explore whether comparative judgments may provide a faster way for collecting labels.

up

pdf (full)
bib (full)
Proceedings of the First Workshop on NLP for Conversational AI

pdf bib
Proceedings of the First Workshop on NLP for Conversational AI
Yun-Nung Chen | Tania Bedrax-Weiss | Dilek Hakkani-Tur | Anuj Kumar | Mike Lewis | Thang-Minh Luong | Pei-Hao Su | Tsung-Hsien Wen

pdf bib
A Repository of Conversational Datasets
Matthew Henderson | Paweł Budzianowski | Iñigo Casanueva | Sam Coope | Daniela Gerz | Girish Kumar | Nikola Mrkšić | Georgios Spithourakis | Pei-Hao Su | Ivan Vulić | Tsung-Hsien Wen

Progress in Machine Learning is often driven by the availability of large datasets, and consistent evaluation metrics for comparing modeling approaches. To this end, we present a repository of conversational datasets consisting of hundreds of millions of examples, and a standardised evaluation procedure for conversational response selection models using 1-of-100 accuracy. The repository contains scripts that allow researchers to reproduce the standard datasets, or to adapt the pre-processing and data filtering steps to their needs. We introduce and evaluate several competitive baselines for conversational response selection, whose implementations are shared in the repository, as well as a neural encoder model that is trained on the entire training set.

pdf bib
Building a Production Model for Retrieval-Based Chatbots
Kyle Swanson | Lili Yu | Christopher Fox | Jeremy Wohlwend | Tao Lei

Response suggestion is an important task for building human-computer conversation systems. Recent approaches to conversation modeling have introduced new model architectures with impressive results, but relatively little attention has been paid to whether these models would be practical in a production setting. In this paper, we describe the unique challenges of building a production retrieval-based conversation system, which selects outputs from a whitelist of candidate responses. To address these challenges, we propose a dual encoder architecture which performs rapid inference and scales well with the size of the whitelist. We also introduce and compare two methods for generating whitelists, and we carry out a comprehensive analysis of the model and whitelists. Experimental results on a large, proprietary help desk chat dataset, including both offline metrics and a human evaluation, indicate production-quality performance and illustrate key lessons about conversation modeling in practice.

pdf bib
DSTC7 Task 1 : Noetic End-to-End Response SelectionDSTC7 Task 1: Noetic End-to-End Response Selection
Chulaka Gunasekara | Jonathan K. Kummerfeld | Lazaros Polymenakos | Walter Lasecki

Goal-oriented dialogue in complex domains is an extremely challenging problem and there are relatively few datasets. This task provided two new resources that presented different challenges : one was focused but small, while the other was large but diverse. We also considered several new variations on the next utterance selection problem : (1) increasing the number of candidates, (2) including paraphrases, and (3) not including a correct option in the candidate set. Twenty teams participated, developing a range of neural network models, including some that successfully incorporated external data to boost performance. Both datasets have been publicly released, enabling future work to build on these results, working towards robust goal-oriented dialogue systems.

pdf bib
End-to-End Neural Context Reconstruction in Chinese DialogueChinese Dialogue
Wei Yang | Rui Qiao | Haocheng Qin | Amy Sun | Luchen Tan | Kun Xiong | Ming Li

We tackle the problem of context reconstruction in Chinese dialogue, where the task is to replace pronouns, zero pronouns, and other referring expressions with their referent nouns so that sentences can be processed in isolation without context. Following a standard decomposition of the context reconstruction task into referring expression detection and coreference resolution, we propose a novel end-to-end architecture for separately and jointly accomplishing this task. Key features of this model include POS and position encoding using CNNs and a novel pronoun masking mechanism. One perennial problem in building such models is the paucity of training data, which we address by augmenting previously-proposed methods to generate a large amount of realistic training data. The combination of more data and better models yields accuracy higher than the state-of-the-art method in coreference resolution and end-to-end context reconstruction.

pdf bib
Relevant and Informative Response Generation using Pointwise Mutual Information
Junya Takayama | Yuki Arase

A sequence-to-sequence model tends to generate generic responses with little information for input utterances. To solve this problem, we propose a neural model that generates relevant and informative responses. Our model has simple architecture to enable easy application to existing neural dialogue models. Specifically, using positive pointwise mutual information, it first identifies keywords that frequently co-occur in responses given an utterance. Then, the model encourages the decoder to use the keywords for response generation. Experiment results demonstrate that our model successfully diversifies responses relative to previous models.

pdf bib
Responsive and Self-Expressive Dialogue Generation
Kozo Chikai | Junya Takayama | Yuki Arase

A neural conversation model is a promising approach to develop dialogue systems with the ability of chit-chat. It allows training a model in an end-to-end manner without complex rule design nor feature engineering. However, as a side effect, the neural model tends to generate safe but uninformative and insensitive responses like OK and I do n’t know. Such replies are called generic responses and regarded as a critical problem for user-engagement of dialogue systems. For a more engaging chit-chat experience, we propose a neural conversation model that generates responsive and self-expressive replies. Specifically, our model generates domain-aware and sentiment-rich responses. Experiments empirically confirmed that our model outperformed the sequence-to-sequence model ; 68.1 % of our responses were domain-aware with sentiment polarities, which was only 2.7 % for responses generated by the sequence-to-sequence model.

up

pdf (full)
bib (full)
Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology

pdf bib
Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology
Garrett Nicolai | Ryan Cotterell

pdf bib
AX Semantics’ Submission to the SIGMORPHON 2019 Shared TaskAX Semantics’ Submission to the SIGMORPHON 2019 Shared Task
Andreas Madsack | Robert Weißgraeber

This paper describes the AX Semantics’ submission to the SIGMORPHON 2019 shared task on morphological reinflection. We implemented two systems, both tackling the task for all languages in one codebase, without any underlying language specific features. The first one is an encoder-decoder model using AllenNLP ; the second system uses the same model modified by a custom trainer that trains only with the target language resources after a specific threshold. We especially focused on building an implementation using AllenNLP with out-of-the-box methods to facilitate easy operation and reuse.

pdf bib
ITIST at the SIGMORPHON 2019 Shared Task : Sparse Two-headed Models for InflectionITIST at the SIGMORPHON 2019 Shared Task: Sparse Two-headed Models for Inflection
Ben Peters | André F. T. Martins

This paper presents the Instituto de TelecomunicaesInstituto Superior Tcnico submission to Task 1 of the SIGMORPHON 2019 Shared Task. Our models combine sparse sequence-to-sequence models with a two-headed attention mechanism that learns separate attention distributions for the lemma and inflectional tags. Among submissions to Task 1, our models rank second and third. Despite the low data setting of the task (only 100 in-language training examples), they learn plausible inflection patterns and often concentrate all probability mass into a small set of hypotheses, making beam search exact.

pdf bib
CMU-01 at the SIGMORPHON 2019 Shared Task on Crosslinguality and Context in MorphologyCMU-01 at the SIGMORPHON 2019 Shared Task on Crosslinguality and Context in Morphology
Aditi Chaudhary | Elizabeth Salesky | Gayatri Bhat | David R. Mortensen | Jaime Carbonell | Yulia Tsvetkov

This paper presents the submission by the CMU-01 team to the SIGMORPHON 2019 task 2 of Morphological Analysis and Lemmatization in Context. This task requires us to produce the lemma and morpho-syntactic description of each token in a sequence, for 107 treebanks. We approach this task with a hierarchical neural conditional random field (CRF) model which predicts each coarse-grained feature (eg. POS, Case, etc.) independently. However, most treebanks are under-resourced, thus making it challenging to train deep neural models for them. Hence, we propose a multi-lingual transfer training regime where we transfer from multiple related languages that share similar typology.

pdf bib
THOMAS : The Hegemonic OSU Morphological Analyzer using Seq2seqTHOMAS: The Hegemonic OSU Morphological Analyzer using Seq2seq
Byung-Doh Oh | Pranav Maneriker | Nanjiang Jiang

This paper describes the OSU submission to the SIGMORPHON 2019 shared task, Crosslinguality and Context in Morphology. Our system addresses the contextual morphological analysis subtask of Task 2, which is to produce the morphosyntactic description (MSD) of each fully inflected word within a given sentence. We frame this as a sequence generation task and employ a neural encoder-decoder (seq2seq) architecture to generate the sequence of MSD tags given the encoded representation of each token. Follow-up analyses reveal that our system most significantly improves performance on morphologically complex languages whose inflected word forms typically have longer MSD tag sequences. In addition, our system seems to capture the structured correlation between MSD tags, such as that between the verb tag and TAM-related tags.contextual morphological analysis subtask of Task 2, which is to produce the morphosyntactic description (MSD) of each fully inflected word within a given sentence. We frame this as a sequence generation task and employ a neural encoder-decoder (seq2seq) architecture to generate the sequence of MSD tags given the encoded representation of each token. Follow-up analyses reveal that our system most significantly improves performance on morphologically complex languages whose inflected word forms typically have longer MSD tag sequences. In addition, our system seems to capture the structured correlation between MSD tags, such as that between the “verb” tag and TAM-related tags.

pdf bib
Sigmorphon 2019 Task 2 system description paper : Morphological analysis in context for many languages, with supervision from only a few
Brad Aiken | Jared Kelly | Alexis Palmer | Suleyman Olcay Polat | Taraka Rama | Rodney Nielsen

This paper presents the UNT HiLT+Ling system for the Sigmorphon 2019 shared Task 2 : Morphological Analysis and Lemmatization in Context. Our core approach focuses on the morphological tagging task ; part-of-speech tagging and lemmatization are treated as secondary tasks. Given the highly multilingual nature of the task, we propose an approach which makes minimal use of the supplied training data, in order to be extensible to languages without labeled training data for the morphological inflection task. Specifically, we use a parallel Bible corpus to align contextual embeddings at the verse level. The aligned verses are used to build cross-language translation matrices, which in turn are used to map between embedding spaces for the various languages. Finally, we use sets of inflected forms, primarily from a high-resource language, to induce vector representations for individual UniMorph tags. Morphological analysis is performed by matching vector representations to embeddings for individual tokens. While our system results are dramatically below the average system submitted for the shared task evaluation campaign, our method is (we suspect) unique in its minimal reliance on labeled training data.

pdf bib
UDPipe at SIGMORPHON 2019 : Contextualized Embeddings, Regularization with Morphological Categories, Corpora MergingUDPipe at SIGMORPHON 2019: Contextualized Embeddings, Regularization with Morphological Categories, Corpora Merging
Milan Straka | Jana Straková | Jan Hajic

We present our contribution to the SIGMORPHON 2019 Shared Task : Crosslinguality and Context in Morphology, Task 2 : contextual morphological analysis and lemmatization. We submitted a modification of the UDPipe 2.0, one of best-performing systems of the CoNLL 2018 Shared Task : Multilingual Parsing from Raw Text to Universal Dependencies and an overall winner of the The 2018 Shared Task on Extrinsic Parser Evaluation. As our first improvement, we use the pretrained contextualized embeddings (BERT) as additional inputs to the network ; secondly, we use individual morphological features as regularization ; and finally, we merge the selected corpora of the same language. In the lemmatization task, our system exceeds all the submitted systems by a wide margin with lemmatization accuracy 95.78 (second best was 95.00, third 94.46). In the morphological analysis, our system placed tightly second : our morphological analysis accuracy was 93.19, the winning system’s 93.23.

pdf bib
A Little Linguistics Goes a Long Way : Unsupervised Segmentation with Limited Language Specific Guidance
Alexander Erdmann | Salam Khalifa | Mai Oudah | Nizar Habash | Houda Bouamor

We present de-lexical segmentation, a linguistically motivated alternative to greedy or other unsupervised methods, requiring only minimal language specific input. Our technique involves creating a small grammar of closed-class affixes which can be written in a few hours. The grammar over generates analyses for word forms attested in a raw corpus which are disambiguated based on features of the linguistic base proposed for each form. Extending the grammar to cover orthographic, morpho-syntactic or lexical variation is simple, making it an ideal solution for challenging corpora with noisy, dialect-inconsistent, or otherwise non-standard content. In two evaluations, we consistently outperform competitive unsupervised baselines and approach the performance of state-of-the-art supervised models trained on large amounts of data, providing evidence for the value of linguistic input during preprocessing.

pdf bib
Equiprobable mappings in weighted constraint grammars
Arto Anttila | Scott Borgeson | Giorgio Magri

We show that MaxEnt is so rich that it can distinguish between any two different mappings : there always exists a nonnegative weight vector which assigns them different MaxEnt probabilities. Stochastic HG instead does admit equiprobable mappings and we give a complete formal characterization of them.

pdf bib
Unbounded Stress in Subregular Phonology
Yiding Hao | Samuel Andersson

This paper situates culminative unbounded stress systems within the subregular hierarchy for functions. While Baek (2018) has argued that such systems can be uniformly understood as input tier-based strictly local constraints, we show here that default-to-opposite-side and default-to-same-side stress systems belong to distinct subregular classes when they are viewed as functions that assign primary stress to underlying forms. While the former system can be captured by input tier-based input strictly local functions, a subsequential function class that we define here, the latter system is not subsequential, though it is weakly deterministic according to McCollum et al.’s (2018) non-interaction criterion. Our results motivate the extension of recently proposed subregular language classes to subregular functions and argue in favor of McCollum et al’s definition of weak determinism over that of Heinz and Lai (2013).

pdf bib
Convolutional neural networks for low-resource morpheme segmentation : baseline or state-of-the-art?
Alexey Sorokin

We apply convolutional neural networks to the task of shallow morpheme segmentation using low-resource datasets for 5 different languages. We show that both in fully supervised and semi-supervised settings our model beats previous state-of-the-art approaches. We argue that convolutional neural networks reflect local nature of morpheme segmentation better than other semi-supervised approaches.

pdf bib
Unsupervised Morphological Segmentation for Low-Resource Polysynthetic Languages
Ramy Eskander | Judith Klavans | Smaranda Muresan

Polysynthetic languages pose a challenge for morphological analysis due to the root-morpheme complexity and to the word class squish. In addition, many of these polysynthetic languages are low-resource. We propose unsupervised approaches for morphological segmentation of low-resource polysynthetic languages based on Adaptor Grammars (AG) (Eskander et al., 2016). We experiment with four languages from the Uto-Aztecan family. Our AG-based approaches outperform other unsupervised approaches and show promise when compared to supervised methods, outperforming them on two of the four languages.

pdf bib
The SIGMORPHON 2019 Shared Task : Morphological Analysis in Context and Cross-Lingual Transfer for InflectionSIGMORPHON 2019 Shared Task: Morphological Analysis in Context and Cross-Lingual Transfer for Inflection
Arya D. McCarthy | Ekaterina Vylomova | Shijie Wu | Chaitanya Malaviya | Lawrence Wolf-Sonkin | Garrett Nicolai | Christo Kirov | Miikka Silfverberg | Sabrina J. Mielke | Jeffrey Heinz | Ryan Cotterell | Mans Hulden

The SIGMORPHON 2019 shared task on cross-lingual transfer and contextual analysis in morphology examined transfer learning of inflection between 100 language pairs, as well as contextual lemmatization and morphosyntactic description in 66 languages. The first task evolves past years’ inflection tasks by examining transfer of morphological inflection knowledge from a high-resource language to a low-resource language. This year also presents a new second challenge on lemmatization and morphological feature analysis in context. All submissions featured a neural component and built on either this year’s strong baselines or highly ranked systems from previous years’ shared tasks. Every participating team improved in accuracy over the baselines for the inflection task (though not Levenshtein distance), and every team in the contextual analysis task improved on both state-of-the-art neural and non-neural baselines.

up

pdf (full)
bib (full)
Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)

pdf bib
Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)
Isabelle Augenstein | Spandana Gella | Sebastian Ruder | Katharina Kann | Burcu Can | Johannes Welbl | Alexis Conneau | Xiang Ren | Marek Rei

pdf bib
Deep Generalized Canonical Correlation Analysis
Adrian Benton | Huda Khayrallah | Biman Gujral | Dee Ann Reisinger | Sheng Zhang | Raman Arora

We present Deep Generalized Canonical Correlation Analysis (DGCCA) a method for learning nonlinear transformations of arbitrarily many views of data, such that the resulting transformations are maximally informative of each other. While methods for nonlinear two view representation learning (Deep CCA, (Andrew et al., 2013)) and linear many-view representation learning (Generalized CCA (Horst, 1961)) exist, DGCCA combines the flexibility of nonlinear (deep) representation learning with the statistical power of incorporating information from many sources, or views. We present the DGCCA formulation as well as an efficient stochastic optimization algorithm for solving it. We learn and evaluate DGCCA representations for three downstream tasks : phonetic transcription from acoustic & articulatory measurements, recommending hashtags and recommending friends on a dataset of Twitter users.

pdf bib
Multilingual NMT with a Language-Independent Attention BridgeNMT with a Language-Independent Attention Bridge
Raúl Vázquez | Alessandro Raganato | Jörg Tiedemann | Mathias Creutz

In this paper, we propose an architecture for machine translation (MT) capable of obtaining multilingual sentence representations by incorporating an intermediate attention bridge that is shared across all languages. We train the model with language-specific encoders and decoders that are connected through an inner-attention layer on the encoder side. The attention bridge exploits the semantics from each language for translation and develops into a language-agnostic meaning representation that can efficiently be used for transfer learning. We present a new framework for the efficient development of multilingual neural machine translation (NMT) using this model and scheduled training. We have tested the approach in a systematic way with a multi-parallel data set. The model achieves substantial improvements over strong bilingual models and performs well for zero-shot translation, which demonstrates its ability of abstraction and transfer learning.

pdf bib
Efficient Language Modeling with Automatic Relevance Determination in Recurrent Neural Networks
Maxim Kodryan | Artem Grachev | Dmitry Ignatov | Dmitry Vetrov

Reduction of the number of parameters is one of the most important goals in Deep Learning. In this article we propose an adaptation of Doubly Stochastic Variational Inference for Automatic Relevance Determination (DSVI-ARD) for neural networks compression. We find this method to be especially useful in language modeling tasks, where large number of parameters in the input and output layers is often excessive. We also show that DSVI-ARD can be applied together with encoder-decoder weight tying allowing to achieve even better sparsity and performance. Our experiments demonstrate that more than 90 % of the weights in both encoder and decoder layers can be removed with a minimal quality loss.

pdf bib
Specializing Distributional Vectors of All Words for Lexical Entailment
Aishwarya Kamath | Jonas Pfeiffer | Edoardo Maria Ponti | Goran Glavaš | Ivan Vulić

Semantic specialization methods fine-tune distributional word vectors using lexical knowledge from external resources (e.g. WordNet) to accentuate a particular relation between words. However, such post-processing methods suffer from limited coverage as they affect only vectors of words seen in the external resources. We present the first post-processing method that specializes vectors of all vocabulary words including those unseen in the resources for the asymmetric relation of lexical entailment (LE) (i.e., hyponymy-hypernymy relation). Leveraging a partially LE-specialized distributional space, our POSTLE (i.e., post-specialization for LE) model learns an explicit global specialization function, allowing for specialization of vectors of unseen words, as well as word vectors from other languages via cross-lingual transfer. We capture the function as a deep feed-forward neural network : its objective re-scales vector norms to reflect the concept hierarchy while simultaneously attracting hyponymy-hypernymy pairs to better reflect semantic similarity. An extended model variant augments the basic architecture with an adversarial discriminator. We demonstrate the usefulness and versatility of POSTLE models with different input distributional spaces in different scenarios (monolingual LE and zero-shot cross-lingual LE transfer) and tasks (binary and graded LE). We report consistent gains over state-of-the-art LE-specialization methods, and successfully LE-specialize word vectors for languages without any external lexical knowledge.

pdf bib
Composing Noun Phrase Vector Representations
Aikaterini-Lida Kalouli | Valeria de Paiva | Richard Crouch

Vector representations of words have seen an increasing success over the past years in a variety of NLP tasks. While there seems to be a consensus about the usefulness of word embeddings and how to learn them, it is still unclear which representations can capture the meaning of phrases or even whole sentences. Recent work has shown that simple operations outperform more complex deep architectures. In this work, we propose two novel constraints for computing noun phrase vector representations. First, we propose that the semantic and not the syntactic contribution of each component of a noun phrase should be considered, so that the resulting composed vectors express more of the phrase meaning. Second, the composition process of the two phrase vectors should apply suitable dimensions’ selection in a way that specific semantic features captured by the phrase’s meaning become more salient. Our proposed methods are compared to 11 other approaches, including popular baselines and a neural net architecture, and are evaluated across 6 tasks and 2 datasets. Our results show that these constraints lead to more expressive phrase representations and can be applied to other state-of-the-art methods to improve their performance.

pdf bib
Towards Robust Named Entity Recognition for Historic GermanGerman
Stefan Schweter | Johannes Baiter

In this paper we study the influence of using language model pre-training for named entity recognition for Historic German. We achieve new state-of-the-art results using carefully chosen training data for language models. For a low-resource domain like named entity recognition for Historic German, language model pre-training can be a strong competitor to CRF-only methods. We show that language model pre-training can be more effective than using transfer-learning with labeled datasets. Furthermore, we introduce a new language model pre-training objective, synthetic masked language model pre-training (SMLM), that allows a transfer from one domain (contemporary texts) to another domain (historical texts) by using only the same (character) vocabulary. Results show that using SMLM can achieve comparable results for Historic named entity recognition, even when they are only trained on contemporary texts. Our pre-trained character-based language models improve upon classical CRF-based methods and previous work on Bi-LSTMs by boosting F1 score performance by up to 6 %.

pdf bib
On Evaluating Embedding Models for Knowledge Base Completion
Yanjie Wang | Daniel Ruffinelli | Rainer Gemulla | Samuel Broscheit | Christian Meilicke

Knowledge graph embedding models have recently received significant attention in the literature. These models learn latent semantic representations for the entities and relations in a given knowledge base ; the representations can be used to infer missing knowledge. In this paper, we study the question of how well recent embedding models perform for the task of knowledge base completion, i.e., the task of inferring new facts from an incomplete knowledge base. We argue that the entity ranking protocol, which is currently used to evaluate knowledge graph embedding models, is not suitable to answer this question since only a subset of the model predictions are evaluated. We propose an alternative entity-pair ranking protocol that considers all model predictions as a whole and is thus more suitable to the task. We conducted an experimental study on standard datasets and found that the performance of popular embeddings models was unsatisfactory under the new protocol, even on datasets that are generally considered to be too easy. Moreover, we found that a simple rule-based model often provided superior performance. Our findings suggest that there is a need for more research into embedding models as well as their training strategies for the task of knowledge base completion.

pdf bib
Constructive Type-Logical Supertagging With Self-Attention Networks
Konstantinos Kogkalidis | Michael Moortgat | Tejaswini Deoskar

We propose a novel application of self-attention networks towards grammar induction. We present an attention-based supertagger for a refined type-logical grammar, trained on constructing types inductively. In addition to achieving a high overall type accuracy, our model is able to learn the syntax of the grammar’s type system along with its denotational semantics. This lifts the closed world assumption commonly made by lexicalized grammar supertaggers, greatly enhancing its generalization potential. This is evidenced both by its adequate accuracy over sparse word types and its ability to correctly construct complex types never seen during training, which, to the best of our knowledge, was as of yet unaccomplished.

pdf bib
An Empirical Study on Pre-trained Embeddings and Language Models for Bot Detection
Andres Garcia-Silva | Cristian Berrio | José Manuel Gómez-Pérez

Fine-tuning pre-trained language models has significantly advanced the state of art in a wide range of NLP downstream tasks. Usually, such language models are learned from large and well-formed text corpora from e.g. encyclopedic resources, books or news. However, a significant amount of the text to be analyzed nowadays is Web data, often from social media. In this paper we consider the research question : How do standard pre-trained language models generalize and capture the peculiarities of rather short, informal and frequently automatically generated text found in social media? To answer this question, we focus on bot detection in Twitter as our evaluation task and test the performance of fine-tuning approaches based on language models against popular neural architectures such as LSTM and CNN combined with pre-trained and contextualized embeddings. Our results also show strong performance variations among the different language model approaches, which suggest further research.

pdf bib
Probing Multilingual Sentence Representations With X-ProbeX-Probe
Vinit Ravishankar | Lilja Øvrelid | Erik Velldal

This paper extends the task of probing sentence representations for linguistic insight in a multilingual domain. In doing so, we make two contributions : first, we provide datasets for multilingual probing, derived from Wikipedia, in five languages, viz. English, French, German, Spanish and Russian. Second, we evaluate six sentence encoders for each language, each trained by mapping sentence representations to English sentence representations, using sentences in a parallel corpus. We discover that cross-lingually mapped representations are often better at retaining certain linguistic information than representations derived from English encoders trained on natural language inference (NLI) as a downstream task.

pdf bib
Learning Multilingual Meta-Embeddings for Code-Switching Named Entity Recognition
Genta Indra Winata | Zhaojiang Lin | Pascale Fung

In this paper, we propose Multilingual Meta-Embeddings (MME), an effective method to learn multilingual representations by leveraging monolingual pre-trained embeddings. MME learns to utilize information from these embeddings via a self-attention mechanism without explicit language identification. We evaluate the proposed embedding method on the code-switching English-Spanish Named Entity Recognition dataset in a multilingual and cross-lingual setting. The experimental results show that our proposed method achieves state-of-the-art performance on the multilingual setting, and it has the ability to generalize to an unseen language task.

pdf bib
Investigating Sub-Word Embedding Strategies for the Morphologically Rich and Free Phrase-Order HungarianHungarian
Bálint Döbrössy | Márton Makrai | Balázs Tarján | György Szaszák

For morphologically rich languages, word embeddings provide less consistent semantic representations due to higher variance in word forms. Moreover, these languages often allow for less constrained word order, which further increases variance. For the highly agglutinative Hungarian, semantic accuracy of word embeddings measured on word analogy tasks drops by 50-75 % compared to English. We observed that embeddings learn morphosyntax quite well instead. Therefore, we explore and evaluate several sub-word unit based embedding strategies character n-grams, lemmatization provided by an NLP-pipeline, and segments obtained in unsupervised learning (morfessor) to boost semantic consistency in Hungarian word vectors. The effect of changing embedding dimension and context window size have also been considered. Morphological analysis based lemmatization was found to be the best strategy to improve embeddings’ semantic accuracy, whereas adding character n-grams was found consistently counterproductive in this regard.

pdf bib
A Self-Training Approach for Short Text Clustering
Amir Hadifar | Lucas Sterckx | Thomas Demeester | Chris Develder

Short text clustering is a challenging problem when adopting traditional bag-of-words or TF-IDF representations, since these lead to sparse vector representations of the short texts. Low-dimensional continuous representations or embeddings can counter that sparseness problem : their high representational power is exploited in deep clustering algorithms. While deep clustering has been studied extensively in computer vision, relatively little work has focused on NLP. The method we propose, learns discriminative features from both an autoencoder and a sentence embedding, then uses assignments from a clustering algorithm as supervision to update weights of the encoder network. Experiments on three short text datasets empirically validate the effectiveness of our method.

pdf bib
Improving Word Embeddings Using Kernel PCAPCA
Vishwani Gupta | Sven Giesselbach | Stefan Rüping | Christian Bauckhage

Word-based embedding approaches such as Word2Vec capture the meaning of words and relations between them, particularly well when trained with large text collections ; however, they fail to do so with small datasets. Extensions such as fastText reduce the amount of data needed slightly, however, the joint task of learning meaningful morphology, syntactic and semantic representations still requires a lot of data. In this paper, we introduce a new approach to warm-start embedding models with morphological information, in order to reduce training time and enhance their performance. We use word embeddings generated using both word2vec and fastText models and enrich them with morphological information of words, derived from kernel principal component analysis (KPCA) of word similarity matrices. This can be seen as explicitly feeding the network morphological similarities and letting it learn semantic and syntactic similarities. Evaluating our models on word similarity and analogy tasks in English and German, we find that they not only achieve higher accuracies than the original skip-gram and fastText models but also require significantly less training data and time. Another benefit of our approach is that it is capable of generating a high-quality representation of infrequent words as, for example, found in very recent news articles with rapidly changing vocabularies. Lastly, we evaluate the different models on a downstream sentence classification task in which a CNN model is initialized with our embeddings and find promising results.

pdf bib
Assessing Incrementality in Sequence-to-Sequence Models
Dennis Ulmer | Dieuwke Hupkes | Elia Bruni

Since their inception, encoder-decoder models have successfully been applied to a wide array of problems in computational linguistics. The most recent successes are predominantly due to the use of different variations of attention mechanisms, but their cognitive plausibility is questionable. In particular, because past representations can be revisited at any point in time, attention-centric methods seem to lack an incentive to build up incrementally more informative representations of incoming sentences. This way of processing stands in stark contrast with the way in which humans are believed to process language : continuously and rapidly integrating new information as it is encountered. In this work, we propose three novel metrics to assess the behavior of RNNs with and without an attention mechanism and identify key differences in the way the different model types process sentences.

pdf bib
On Committee Representations of Adversarial Learning Models for Question-Answer Ranking
Sparsh Gupta | Vitor Carvalho

Adversarial training is a process in Machine Learning that explicitly trains models on adversarial inputs (inputs designed to deceive or trick the learning process) in order to make it more robust or accurate. In this paper we investigate how representing adversarial training models as committees can be used to effectively improve the performance of Question-Answer (QA) Ranking. We start by empirically probing the effects of adversarial training over multiple QA ranking algorithms, including the state-of-the-art Multihop Attention Network model. We evaluate these algorithms on several benchmark datasets and observe that, while adversarial training is beneficial to most baseline algorithms, there are cases where it may lead to overfitting and performance degradation. We investigate the causes of such degradation, and then propose a new representation procedure for this adversarial learning problem, based on committee learning, that not only is capable of consistently improving all baseline algorithms, but also outperforms the previous state-of-the-art algorithm by as much as 6 % in NDCG.

pdf bib
Best Practices for Learning Domain-Specific Cross-Lingual Embeddings
Lena Shakurova | Beata Nyari | Chao Li | Mihai Rotaru

Cross-lingual embeddings aim to represent words in multiple languages in a shared vector space by capturing semantic similarities across languages. They are a crucial component for scaling tasks to multiple languages by transferring knowledge from languages with rich resources to low-resource languages. A common approach to learning cross-lingual embeddings is to train monolingual embeddings separately for each language and learn a linear projection from the monolingual spaces into a shared space, where the mapping relies on a small seed dictionary. While there are high-quality generic seed dictionaries and pre-trained cross-lingual embeddings available for many language pairs, there is little research on how they perform on specialised tasks. In this paper, we investigate the best practices for constructing the seed dictionary for a specific domain. We evaluate the embeddings on the sequence labelling task of Curriculum Vitae parsing and show that the size of a bilingual dictionary, the frequency of the dictionary words in the domain corpora and the source of data (task-specific vs generic) influence performance. We also show that the less training data is available in the low-resource language, the more the construction of the bilingual dictionary matters, and demonstrate that some of the choices are crucial in the zero-shot transfer learning case.

pdf bib
Learning Word Embeddings without Context Vectors
Alexey Zobnin | Evgenia Elistratova

Most word embedding algorithms such as word2vec or fastText construct two sort of vectors : for words and for contexts. Naive use of vectors of only one sort leads to poor results. We suggest using indefinite inner product in skip-gram negative sampling algorithm. This allows us to use only one sort of vectors without loss of quality. Our context-free cf algorithm performs on par with SGNS on word similarity datasets

pdf bib
Modality-based Factorization for Multimodal Fusion
Elham J. Barezi | Pascale Fung

We propose a novel method, Modality-based Redundancy Reduction Fusion (MRRF), for understanding and modulating the relative contribution of each modality in multimodal inference tasks. This is achieved by obtaining an (M+1)-way tensor to consider the high-order relationships between M modalities and the output layer of a neural network model. Applying a modality-based tensor factorization method, which adopts different factors for different modalities, results in removing information present in a modality that can be compensated by other modalities, with respect to model outputs. This helps to understand the relative utility of information in each modality. In addition it leads to a less complicated model with less parameters and therefore could be applied as a regularizer avoiding overfitting. We have applied this method to three different multimodal datasets in sentiment analysis, personality trait recognition, and emotion recognition. We are able to recognize relationships and relative importance of different modalities in these tasks and achieves a 1 % to 4 % improvement on several evaluation measures compared to the state-of-the-art for all three tasks.M+1)-way tensor to consider the high-order relationships between M modalities and the output layer of a neural network model. Applying a modality-based tensor factorization method, which adopts different factors for different modalities, results in removing information present in a modality that can be compensated by other modalities, with respect to model outputs. This helps to understand the relative utility of information in each modality. In addition it leads to a less complicated model with less parameters and therefore could be applied as a regularizer avoiding overfitting. We have applied this method to three different multimodal datasets in sentiment analysis, personality trait recognition, and emotion recognition. We are able to recognize relationships and relative importance of different modalities in these tasks and achieves a 1% to 4% improvement on several evaluation measures compared to the state-of-the-art for all three tasks.

up

pdf (full)
bib (full)
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications

pdf bib
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications
Helen Yannakoudakis | Ekaterina Kochmar | Claudia Leacock | Nitin Madnani | Ildikó Pilán | Torsten Zesch

pdf bib
The many dimensions of algorithmic fairness in educational applications
Anastassia Loukina | Nitin Madnani | Klaus Zechner

The issues of algorithmic fairness and bias have recently featured prominently in many publications highlighting the fact that training the algorithms for maximum performance may often result in predictions that are biased against various groups. Educational applications based on NLP and speech processing technologies often combine multiple complex machine learning algorithms and are thus vulnerable to the same sources of bias as other machine learning systems. Yet such systems can have high impact on people’s lives especially when deployed as part of high-stakes tests. In this paper we discuss different definitions of fairness and possible ways to apply them to educational applications. We then use simulated and real data to consider how test-takers’ native language backgrounds can affect their automated scores on an English language proficiency assessment. We illustrate that total fairness may not be achievable and that different definitions of fairness may require different solutions.

pdf bib
Computationally Modeling the Impact of Task-Appropriate Language Complexity and Accuracy on Human Grading of German EssaysGerman Essays
Zarah Weiss | Anja Riemenschneider | Pauline Schröter | Detmar Meurers

Computational linguistic research on the language complexity of student writing typically involves human ratings as a gold standard. However, educational science shows that teachers find it difficult to identify and cleanly separate accuracy, different aspects of complexity, contents, and structure. In this paper, we therefore explore the use of computational linguistic methods to investigate how task-appropriate complexity and accuracy relate to the grading of overall performance, content performance, and language performance as assigned by teachers. Based on texts written by students for the official school-leaving state examination (Abitur), we show that teachers successfully assign higher language performance grades to essays with higher task-appropriate language complexity and properly separate this from content scores. Yet, accuracy impacts teacher assessment for all grading rubrics, also the content score, overemphasizing the role of accuracy. Our analysis is based on broad computational linguistic modeling of German language complexity and an innovative theory- and data-driven feature aggregation method inferring task-appropriate language complexity.

pdf bib
Analysing Rhetorical Structure as a Key Feature of Summary Coherence
Jan Šnajder | Tamara Sladoljev-Agejev | Svjetlana Kolić Vehovec

We present a model for automatic scoring of coherence based on comparing the rhetorical structure (RS) of college student summaries in L2 (English) against expert summaries. Coherence is conceptualised as a construct consisting of the rhetorical relation and its arguments. Comparison with expert-assigned scores shows that RS scores correlate with both cohesion and coherence. Furthermore, RS scores improve the accuracy of a regression model for cohesion score prediction.

pdf bib
The BEA-2019 Shared Task on Grammatical Error CorrectionBEA-2019 Shared Task on Grammatical Error Correction
Christopher Bryant | Mariano Felice | Øistein E. Andersen | Ted Briscoe

This paper reports on the BEA-2019 Shared Task on Grammatical Error Correction (GEC). As with the CoNLL-2014 shared task, participants are required to correct all types of errors in test data. One of the main contributions of the BEA-2019 shared task is the introduction of a new dataset, the Write&Improve+LOCNESS corpus, which represents a wider range of native and learner English levels and abilities. Another contribution is the introduction of tracks, which control the amount of annotated data available to participants. Systems are evaluated in terms of ERRANT F_0.5, which allows us to report a much wider range of performance statistics. The competition was hosted on Codalab and remains open for further submissions on the blind test set.

pdf bib
A Benchmark Corpus of English Misspellings and a Minimally-supervised Model for Spelling CorrectionEnglish Misspellings and a Minimally-supervised Model for Spelling Correction
Michael Flor | Michael Fried | Alla Rozovskaya

Spelling correction has attracted a lot of attention in the NLP community. However, models have been usually evaluated on artificiallycreated or proprietary corpora. A publiclyavailable corpus of authentic misspellings, annotated in context, is still lacking. To address this, we present and release an annotated data set of 6,121 spelling errors in context, based on a corpus of essays written by English language learners. We also develop a minimallysupervised context-aware approach to spelling correction. It achieves strong results on our data : 88.12 % accuracy. This approach can also train with a minimal amount of annotated data (performance reduced by less than 1 %). Furthermore, this approach allows easy portability to new domains. We evaluate our model on data from a medical domain and demonstrate that it rivals the performance of a model trained and tuned on in-domain data.

pdf bib
Regression or classification? Automated Essay Scoring for NorwegianNorwegian
Stig Johan Berggren | Taraka Rama | Lilja Øvrelid

In this paper we present first results for the task of Automated Essay Scoring for Norwegian learner language. We analyze a number of properties of this task experimentally and assess (i) the formulation of the task as either regression or classification, (ii) the use of various non-neural and neural machine learning architectures with various types of input representations, and (iii) applying multi-task learning for joint prediction of essay scoring and native language identification. We find that a GRU-based attention model trained in a single-task setting performs best at the AES task.

pdf bib
How to account for mispellings : Quantifying the benefit of character representations in neural content scoring models
Brian Riordan | Michael Flor | Robert Pugh

Character-based representations in neural models have been claimed to be a tool to overcome spelling variation in in word token-based input. We examine this claim in neural models for content scoring. We formulate precise hypotheses about the possible effects of adding character representations to word-based models and test these hypotheses on large-scale real world content scoring datasets. We find that, while character representations may provide small performance gains in general, their effectiveness in accounting for spelling variation may be limited. We show that spelling correction can provide larger gains than character representations, and that spelling correction improves the performance of models with character representations. With these insights, we report a new state of the art on the ASAP-SAS content scoring dataset.

pdf bib
The Unreasonable Effectiveness of Transformer Language Models in Grammatical Error Correction
Dimitris Alikaniotis | Vipul Raheja

Recent work on Grammatical Error Correction (GEC) has highlighted the importance of language modeling in that it is certainly possible to achieve good performance by comparing the probabilities of the proposed edits. At the same time, advancements in language modeling have managed to generate linguistic output, which is almost indistinguishable from that of human-generated text. In this paper, we up the ante by exploring the potential of more sophisticated language models in GEC and offer some key insights on their strengths and weaknesses. We show that, in line with recent results in other NLP tasks, Transformer architectures achieve consistently high performance and provide a competitive baseline for future machine learning models.

pdf bib
Erroneous data generation for Grammatical Error Correction
Shuyao Xu | Jiehao Zhang | Jin Chen | Long Qin

It has been demonstrated that the utilization of a monolingual corpus in neural Grammatical Error Correction (GEC) systems can significantly improve the system performance. The previous state-of-the-art neural GEC system is an ensemble of four Transformer models pretrained on a large amount of Wikipedia Edits. The Singsound GEC system follows a similar approach but is equipped with a sophisticated erroneous data generating component. Our system achieved an F0:5 of 66.61 in the BEA 2019 Shared Task : Grammatical Error Correction. With our novel erroneous data generating component, the Singsound neural GEC system yielded an M2 of 63.2 on the CoNLL-2014 benchmark (8.4 % relative improvement over the previous state-of-the-art system).

pdf bib
The LAIX Systems in the BEA-2019 GEC Shared TaskLAIX Systems in the BEA-2019 GEC Shared Task
Ruobing Li | Chuan Wang | Yefei Zha | Yonghong Yu | Shiman Guo | Qiang Wang | Yang Liu | Hui Lin

In this paper, we describe two systems we developed for the three tracks we have participated in the BEA-2019 GEC Shared Task. We investigate competitive classification models with bi-directional recurrent neural networks (Bi-RNN) and neural machine translation (NMT) models. For different tracks, we use ensemble systems to selectively combine the NMT models, the classification models, and some rules, and demonstrate that an ensemble solution can effectively improve GEC performance over single systems. Our GEC systems ranked the first in the Unrestricted Track, and the third in both the Restricted Track and the Low Resource Track.

pdf bib
The BLCU System in the BEA 2019 Shared TaskBLCU System in the BEA 2019 Shared Task
Liner Yang | Chencheng Wang

This paper describes the BLCU Group submissions to the Building Educational Applications (BEA) 2019 Shared Task on Grammatical Error Correction (GEC). The task is to detect and correct grammatical errors that occurred in essays. We participate in 2 tracks including the Restricted Track and the Unrestricted Track. Our system is based on a Transformer model architecture. We integrate many effective methods proposed in recent years. Such as, Byte Pair Encoding, model ensemble, checkpoints average and spell checker. We also corrupt the public monolingual data to further improve the performance of the model. On the test data of the BEA 2019 Shared Task, our system yields F0.5 = 58.62 and 59.50, ranking twelfth and fourth respectively.

pdf bib
Neural and FST-based approaches to grammatical error correctionFST-based approaches to grammatical error correction
Zheng Yuan | Felix Stahlberg | Marek Rei | Bill Byrne | Helen Yannakoudakis

In this paper, we describe our submission to the BEA 2019 shared task on grammatical error correction. We present a system pipeline that utilises both error detection and correction models. The input text is first corrected by two complementary neural machine translation systems : one using convolutional networks and multi-task learning, and another using a neural Transformer-based system. Training is performed on publicly available data, along with artificial examples generated through back-translation. The n-best lists of these two machine translation systems are then combined and scored using a finite state transducer (FST). Finally, an unsupervised re-ranking system is applied to the n-best output of the FST. The re-ranker uses a number of error detection features to re-rank the FST n-best list and identify the final 1-best correction hypothesis. Our system achieves 66.75 % F 0.5 on error correction (ranking 4th), and 82.52 % F 0.5 on token-level error detection (ranking 2nd) in the restricted track of the shared task.

pdf bib
Improving Precision of Grammatical Error Correction with a Cheat Sheet
Mengyang Qiu | Xuejiao Chen | Maggie Liu | Krishna Parvathala | Apurva Patil | Jungyeul Park

In this paper, we explore two approaches of generating error-focused phrases and examine whether these phrases can lead to better performance in grammatical error correction for the restricted track of BEA 2019 Shared Task on GEC. Our results show that phrases directly extracted from GEC corpora outperform phrases from statistical machine translation phrase table by a large margin. Appending error+context phrases to the original GEC corpora yields comparably high precision. We also explore the generation of artificial syntactic error sentences using error+context phrases for the unrestricted track. The additional training data greatly facilitates syntactic error correction (e.g., verb form) and contributes to better overall performance.

pdf bib
Evaluation of automatic collocation extraction methods for language learning
Vishal Bhalla | Klara Klimcikova

A number of methods have been proposed to automatically extract collocations, i.e., conventionalized lexical combinations, from text corpora. However, the attempts to evaluate and compare them with a specific application in mind lag behind. This paper compares three end-to-end resources for collocation learning, all of which used the same corpus but different methods. Adopting a gold-standard evaluation method, the results show that the method of dependency parsing outperforms regex-over-pos in collocation identification. The lexical association measures (AMs) used for collocation ranking perform about the same overall but differently for individual collocation types. Further analysis has also revealed that there are considerable differences between other commonly used AMs.

pdf bib
Anglicized Words and Misspelled Cognates in Native Language Identification
Ilia Markov | Vivi Nastase | Carlo Strapparava

In this paper, we present experiments that estimate the impact of specific lexical choices of people writing in a second language (L2). In particular, we look at misspelled words that indicate lexical uncertainty on the part of the author, and separate them into three categories : misspelled cognates, L2-ed (in our case, anglicized) words, and all other spelling errors. We test the assumption that such errors contain clues about the native language of an essay’s author through the task of native language identification. The results of the experiments show that the information brought by each of these categories is complementary. We also note that while the distribution of such features changes with the proficiency level of the writer, their contribution towards native language identification remains significant at all levels.

pdf bib
Linguistically-Driven Strategy for Concept Prerequisites Learning on ItalianItalian
Alessio Miaschi | Chiara Alzetta | Franco Alberto Cardillo | Felice Dell’Orletta

We present a new concept prerequisite learning method for Learning Object (LO) ordering that exploits only linguistic features extracted from textual educational resources. The method was tested in a cross- and in- domain scenario both for Italian and English. Additionally, we performed experiments based on a incremental training strategy to study the impact of the training set size on the classifier performances. The paper also introduces ITA-PREREQ, to the best of our knowledge the first Italian dataset annotated with prerequisite relations between pairs of educational concepts, and describe the automatic strategy devised to build it.

pdf bib
Grammatical-Error-Aware Incorrect Example Retrieval System for Learners of Japanese as a Second LanguageJapanese as a Second Language
Mio Arai | Masahiro Kaneko | Mamoru Komachi

Existing example retrieval systems do not include grammatically incorrect examples or present only a few examples, if any. Even if a retrieval system has a wide coverage of incorrect examples along with the correct counterpart, learners need to know whether their query includes errors or not. Considering the usability of retrieving incorrect examples, our proposed method uses a large-scale corpus and presents correct expressions along with incorrect expressions using a grammatical error detection system so that the learner do not need to be aware of how to search for the examples. Intrinsic and extrinsic evaluations indicate that our method improves accuracy of example sentence retrieval and quality of learner’s writing.

pdf bib
Toward Automated Content Feedback Generation for Non-native Spontaneous Speech
Su-Youn Yoon | Ching-Ni Hsieh | Klaus Zechner | Matthew Mulholland | Yuan Wang | Nitin Madnani

In this study, we developed an automated algorithm to provide feedback about the specific content of non-native English speakers’ spoken responses. The responses were spontaneous speech, elicited using integrated tasks where the language learners listened to and/or read passages and integrated the core content in their spoken responses. Our models detected the absence of key points considered to be important in a spoken response to a particular test question, based on two different models : (a) a model using word-embedding based content features and (b) a state-of-the art short response scoring engine using traditional n-gram based features. Both models achieved a substantially improved performance over the majority baseline, and the combination of the two models achieved a significant further improvement. In particular, the models were robust to automated speech recognition (ASR) errors, and performance based on the ASR word hypotheses was comparable to that based on manual transcriptions. The accuracy and F-score of the best model for the questions included in the train set were 0.80 and 0.68, respectively. Finally, we discussed possible approaches to generating targeted feedback about the content of a language learner’s response, based on automatically detected missing key points.

pdf bib
Curio SmartChat : A system for Natural Language Question Answering for Self-Paced K-12 LearningSmartChat : A system for Natural Language Question Answering for Self-Paced K-12 Learning
Srikrishna Raamadhurai | Ryan Baker | Vikraman Poduval

During learning, students often have questions which they would benefit from responses to in real time. In class, a student can ask a question to a teacher. During homework, or even in class if the student is shy, it can be more difficult to receive a rapid response. In this work, we introduce Curio SmartChat, an automated question answering system for middle school Science topics. Our system has now been used by around 20,000 students who have so far asked over 100,000 questions. We present data on the challenge created by students’ grammatical errors and spelling mistakes, and discuss our system’s approach and degree of effectiveness at disambiguating questions that the system is initially unsure about. We also discuss the prevalence of student small talk not related to science topics, the pluses and minuses of this behavior, and how a system should respond to these conversational acts. We conclude with discussions and point to directions for potential future work.

pdf bib
Supporting content evaluation of student summaries by Idea Unit embedding
Marcello Gecchele | Hiroaki Yamada | Takenobu Tokunaga | Yasuyo Sawaki

This paper discusses the computer-assisted content evaluation of summaries. We propose a method to make a correspondence between the segments of the source text and its summary. As a unit of the segment, we adopt Idea Unit (IU) which is proposed in Applied Linguistics. Introducing IUs enables us to make a correspondence even for the sentences that contain multiple ideas. The IU correspondence is made based on the similarity between vector representations of IU. An evaluation experiment with two source texts and 20 summaries showed that the proposed method is more robust against rephrased expressions than the conventional ROUGE-based baselines. Also, the proposed method outperformed the baselines in recall. We im-plemented the proposed method in a GUI toolSegment Matcher that aids teachers to estab-lish a link between corresponding IUs acrossthe summary and source text.

pdf bib
On Understanding the Relation between Expert Annotations of Text Readability and Target Reader Comprehension
Sowmya Vajjala | Ivana Lucic

Automatic readability assessment aims to ensure that readers read texts that they can comprehend. However, computational models are typically trained on texts created from the perspective of the text writer, not the target reader. There is little experimental research on the relationship between expert annotations of readability, reader’s language proficiency, and different levels of reading comprehension. To address this gap, we conducted a user study in which over a 100 participants read texts of different reading levels and answered questions created to test three forms of comprehension. Our results indicate that more than readability annotation or reader proficiency, it is the type of comprehension question asked that shows differences between reader responses-inferential questions were difficult for users of all levels of proficiency across reading levels. The data collected from this study will be released with this paper, which will, for the first time, provide a collection of 45 reader bench marked texts to evaluate readability assessment systems developed for adult learners of English. It can also potentially be useful for the development of question generation approaches in intelligent tutoring systems research.

pdf bib
Analyzing Linguistic Complexity and Accuracy in Academic Language Development of German across Elementary and Secondary SchoolGerman across Elementary and Secondary School
Zarah Weiss | Detmar Meurers

We track the development of writing complexity and accuracy in German students’ early academic language development from first to eighth grade. Combining an empirically broad approach to linguistic complexity with the high-quality error annotation included in the Karlsruhe Children’s Text corpus (Lavalley et al. 2015) used, we construct models of German academic language development that successfully identify the student’s grade level. We show that classifiers for the early years rely more on accuracy development, whereas development in secondary school is better characterized by increasingly complex language in all domains : linguistic system, language use, and human sentence processing characteristics. We demonstrate the generalizability and robustness of models using such a broad complexity feature set across writing topics.

pdf bib
Learning Outcomes and Their Relatedness in a Medical Curriculum
Sneha Mondal | Tejas Dhamecha | Shantanu Godbole | Smriti Pathak | Red Mendoza | K Gayathri Wijayarathna | Nabil Zary | Swarnadeep Saha | Malolan Chetlur

A typical medical curriculum is organized in a hierarchy of instructional objectives called Learning Outcomes (LOs) ; a few thousand LOs span five years of study. Gaining a thorough understanding of the curriculum requires learners to recognize and apply related LOs across years, and across different parts of the curriculum. However, given the large scope of the curriculum, manually labeling related LOs is tedious, and almost impossible to scale. In this paper, we build a system that learns relationships between LOs, and we achieve up to human-level performance in the LO relationship extraction task. We then present an application where the proposed system is employed to build a map of related LOs and Learning Resources (LRs) pertaining to a virtual patient case. We believe that our system can help medical students grasp the curriculum better, within classroom as well as in Intelligent Tutoring Systems (ITS) settings.

pdf bib
Equity Beyond Bias in Language Technologies for Education
Elijah Mayfield | Michael Madaio | Shrimai Prabhumoye | David Gerritsen | Brittany McLaughlin | Ezekiel Dixon-Román | Alan W Black

There is a long record of research on equity in schools. As machine learning researchers begin to study fairness and bias in earnest, language technologies in education have an unusually strong theoretical and applied foundation to build on. Here, we introduce concepts from culturally relevant pedagogy and other frameworks for teaching and learning, identifying future work on equity in NLP. We present case studies in a range of topics like intelligent tutoring systems, computer-assisted language learning, automated essay scoring, and sentiment analysis in classrooms, and provide an actionable agenda for research.

pdf bib
From Receptive to Productive : Learning to Use Confusing Words through Automatically Selected Example Sentences
Chieh-Yang Huang | Yi-Ting Huang | MeiHua Chen | Lun-Wei Ku

Knowing how to use words appropriately has been a key to improving language proficiency. Previous studies typically discuss how students learn receptively to select the correct candidate from a set of confusing words in the fill-in-the-blank task where specific context is given. In this paper, we go one step further, assisting students to learn to use confusing words appropriately in a productive task : sentence translation. We leverage the GiveMe-Example system, which suggests example sentences for each confusing word, to achieve this goal. In this study, students learn to differentiate the confusing words by reading the example sentences, and then choose the appropriate word(s) to complete the sentence translation task. Results show students made substantial progress in terms of sentence structure. In addition, highly proficient students better managed to learn confusing words. In view of the influence of the first language on learners, we further propose an effective approach to improve the quality of the suggested sentences.

pdf bib
Automated Essay Scoring with Discourse-Aware Neural Models
Farah Nadeem | Huy Nguyen | Yang Liu | Mari Ostendorf

Automated essay scoring systems typically rely on hand-crafted features to predict essay quality, but such systems are limited by the cost of feature engineering. Neural networks offer an alternative to feature engineering, but they typically require more annotated data. This paper explores network structures, contextualized embeddings and pre-training strategies aimed at capturing discourse characteristics of essays. Experiments on three essay scoring tasks show benefits from all three strategies in different combinations, with simpler architectures being more effective when less training data is available.

pdf bib
Rubric Reliability and Annotation of Content and Argument in Source-Based Argument Essays
Yanjun Gao | Alex Driban | Brennan Xavier McManus | Elena Musi | Patricia Davies | Smaranda Muresan | Rebecca J. Passonneau

We present a unique dataset of student source-based argument essays to facilitate research on the relations between content, argumentation skills, and assessment. Two classroom writing assignments were given to college students in a STEM major, accompanied by a carefully designed rubric. The paper presents a reliability study of the rubric, showing it to be highly reliable, and initial annotation on content and argumentation annotation of the essays.

up

pdf (full)
bib (full)
Proceedings of the 6th Workshop on Argument Mining

pdf bib
Proceedings of the 6th Workshop on Argument Mining
Benno Stein | Henning Wachsmuth

pdf bib
A Cascade Model for Proposition Extraction in Argumentation
Yohan Jo | Jacky Visser | Chris Reed | Eduard Hovy

We present a model to tackle a fundamental but understudied problem in computational argumentation : proposition extraction. Propositions are the basic units of an argument and the primary building blocks of most argument mining systems. However, they are usually substituted by argumentative discourse units obtained via surface-level text segmentation, which may yield text segments that lack semantic information necessary for subsequent argument mining processes. In contrast, our cascade model aims to extract complete propositions by handling anaphora resolution, text segmentation, reported speech, questions, imperatives, missing subjects, and revision. We formulate each task as a computational problem and test various models using a corpus of the 2016 U.S. presidential debates. We show promising performance for some tasks and discuss main challenges in proposition extraction.

pdf bib
Dissecting Content and Context in Argumentative Relation Analysis
Juri Opitz | Anette Frank

When assessing relations between argumentative units (e.g., support or attack), computational systems often exploit disclosing indicators or markers that are not part of elementary argumentative units (EAUs) themselves, but are gained from their context (position in paragraph, preceding tokens, etc.). We show that this dependency is much stronger than previously assumed. In fact, we show that by completely masking the EAU text spans and only feeding information from their context, a competitive system may function even better. We argue that an argument analysis system that relies more on discourse context than the argument’s content is unsafe, since it can easily be tricked. To alleviate this issue, we separate argumentative units from their context such that the system is forced to model and rely on an EAU’s content. We show that the resulting classification system is more robust, and argue that such models are better suited for predicting argumentative relations across documents.

pdf bib
The Swedish PoliGraph : A Semantic Graph for Argument Mining of Swedish Parliamentary DataSwedish PoliGraph: A Semantic Graph for Argument Mining of Swedish Parliamentary Data
Stian Rødven Eide

As part of a larger project on argument mining of Swedish parliamentary data, we have created a semantic graph that, together with named entity recognition and resolution (NER), should make it easier to establish connections between arguments in a given debate. The graph is essentially a semantic database that keeps track of Members of Parliament (MPs), in particular their presence in the parliament and activity in debates, but also party affiliation and participation in commissions. The hope is that the Swedish PoliGraph will enable us to perform named entity resolution on debates in the Swedish parliament with a high accuracy, with the aim of determining to whom an argument is directed.

pdf bib
Towards Effective Rebuttal : Listening Comprehension Using Corpus-Wide Claim Mining
Tamar Lavee | Matan Orbach | Lili Kotlerman | Yoav Kantor | Shai Gretz | Lena Dankin | Michal Jacovi | Yonatan Bilu | Ranit Aharonov | Noam Slonim

Engaging in a live debate requires, among other things, the ability to effectively rebut arguments claimed by your opponent. In particular, this requires identifying these arguments. Here, we suggest doing so by automatically mining claims from a corpus of news articles containing billions of sentences, and searching for them in a given speech. This raises the question of whether such claims indeed correspond to those made in spoken speeches. To this end, we collected a large dataset of 400 speeches in English discussing 200 controversial topics, mined claims for each topic, and asked annotators to identify the mined claims mentioned in each speech. Results show that in the vast majority of speeches debaters indeed make use of such claims. In addition, we present several baselines for the automatic detection of mined claims in speeches, forming the basis for future work. All collected data is freely available for research.

pdf bib
Is It Worth the Attention? A Comparative Evaluation of Attention Layers for Argument Unit Segmentation
Maximilian Spliethöver | Jonas Klaff | Hendrik Heuer

Attention mechanisms have seen some success for natural language processing downstream tasks in recent years and generated new state-of-the-art results. A thorough evaluation of the attention mechanism for the task of Argumentation Mining is missing. With this paper, we report a comparative evaluation of attention layers in combination with a bidirectional long short-term memory network, which is the current state-of-the-art approach for the unit segmentation task. We also compare sentence-level contextualized word embeddings to pre-generated ones. Our findings suggest that for this task, the additional attention layer does not improve the performance. In most cases, contextualized embeddings do also not show an improvement on the score achieved by pre-defined embeddings.

pdf bib
The Utility of Discourse Parsing Features for Predicting Argumentation Structure
Freya Hewett | Roshan Prakash Rane | Nina Harlacher | Manfred Stede

Research on argumentation mining from text has frequently discussed relationships to discourse parsing, but few empirical results are available so far. One corpus that has been annotated in parallel for argumentation structure and for discourse structure (RST, SDRT) are the ‘argumentative microtexts’ (Peldszus and Stede, 2016a). While results on perusing the gold RST annotations for predicting argumentation have been published (Peldszus and Stede, 2016b), the step to automatic discourse parsing has not yet been taken. In this paper, we run various discourse parsers (RST, PDTB) on the corpus, compare their results to the gold annotations (for RST) and then assess the contribution of automatically-derived discourse features for argumentation parsing. After reproducing the state-of-the-art Evidence Graph model from Afantenos et al. (2018) for the microtexts, we find that PDTB features can indeed improve its performance.

pdf bib
Categorizing Comparative Sentences
Alexander Panchenko | Alexander Bondarenko | Mirco Franzek | Matthias Hagen | Chris Biemann

We tackle the tasks of automatically identifying comparative sentences and categorizing the intended preference (e.g., Python has better NLP libraries than MATLAB Python, better, MATLAB). To this end, we manually annotate 7,199 sentences for 217 distinct target item pairs from several domains (27 % of the sentences contain an oriented comparison in the sense of better or worse). A gradient boosting model based on pre-trained sentence embeddings reaches an F1 score of 85 % in our experimental evaluation. The model can be used to extract comparative sentences for pro / con argumentation in comparative / argument search engines or debating technologies.

pdf bib
Ranking Passages for Argument Convincingness
Peter Potash | Adam Ferguson | Timothy J. Hazen

In data ranking applications, pairwise annotation is often more consistent than cardinal annotation for learning ranking models. We examine this in a case study on ranking text passages for argument convincingness. Our task is to choose text passages that provide the highest-quality, most-convincing arguments for opposing sides of a topic. Using data from a deployed system within the Bing search engine, we construct a pairwise-labeled dataset for argument convincingness that is substantially more comprehensive in topical coverage compared to existing public resources. We detail the process of extracting topical passages for queries submitted to a search engine, creating annotated sets of passages aligned to different stances on a topic, and assessing argument convincingness of passages using pairwise annotation. Using a state-of-the-art convincingness model, we evaluate several methods for using pairwise-annotated data examples to train models for ranking passages. Our results show pairwise training outperforms training that regresses to a target score for each passage. Our results also show a simple ‘win-rate’ score is a better regression target than the previously proposed page-rank target. Lastly, addressing the need to filter noisy crowd-sourced annotations when constructing a dataset, we show that filtering for transitivity within pairwise annotations is more effective than filtering based on annotation confidence measures for individual examples.

up

pdf (full)
bib (full)
Proceedings of the Fourth Arabic Natural Language Processing Workshop

pdf bib
Proceedings of the Fourth Arabic Natural Language Processing Workshop
Wassim El-Hajj | Lamia Hadrich Belguith | Fethi Bougares | Walid Magdy | Imed Zitouni | Nadi Tomeh | Mahmoud El-Haj | Wajdi Zaghouani

pdf bib
Incremental Domain Adaptation for Neural Machine Translation in Low-Resource Settings
Marimuthu Kalimuthu | Michael Barz | Daniel Sonntag

We study the problem of incremental domain adaptation of a generic neural machine translation model with limited resources (e.g., budget and time) for human translations or model training. In this paper, we propose a novel query strategy for selecting unlabeled samples from a new domain based on sentence embeddings for Arabic. We accelerate the fine-tuning process of the generic model to the target domain. Specifically, our approach estimates the informativeness of instances from the target domain by comparing the distance of their sentence embeddings to embeddings from the generic domain. We perform machine translation experiments (Ar-to-En direction) for comparing a random sampling baseline with our new approach, similar to active learning, using two small update sets for simulating the work of human translators. For the prescribed setting we can save more than 50 % of the annotation costs without loss in quality, demonstrating the effectiveness of our approach.

pdf bib
POS Tagging for Improving Code-Switching Identification in ArabicPOS Tagging for Improving Code-Switching Identification in Arabic
Mohammed Attia | Younes Samih | Ali Elkahky | Hamdy Mubarak | Ahmed Abdelali | Kareem Darwish

When speakers code-switch between their native language and a second language or language variant, they follow a syntactic pattern where words and phrases from the embedded language are inserted into the matrix language. This paper explores the possibility of utilizing this pattern in improving code-switching identification between Modern Standard Arabic (MSA) and Egyptian Arabic (EA). We try to answer the question of how strong is the POS signal in word-level code-switching identification. We build a deep learning model enriched with linguistic features (including POS tags) that outperforms the state-of-the-art results by 1.9 % on the development set and 1.0 % on the test set. We also show that in intra-sentential code-switching, the selection of lexical items is constrained by POS categories, where function words tend to come more often from the dialectal language while the majority of content words come from the standard language.

pdf bib
Syntax-Ignorant N-gram Embeddings for Sentiment Analysis of Arabic DialectsArabic Dialects
Hala Mulki | Hatem Haddad | Mourad Gridach | Ismail Babaoğlu

Arabic sentiment analysis models have employed compositional embedding features to represent the Arabic dialectal content. These embeddings are usually composed via ordered, syntax-aware composition functions and learned within deep neural frameworks. With the free word order and the varying syntax nature across the different Arabic dialects, a sentiment analysis system developed for one dialect might not be efficient for the others. Here we present syntax-ignorant n-gram embeddings to be used in sentiment analysis of several Arabic dialects. The proposed embeddings were composed and learned using an unordered composition function and a shallow neural model. Five datasets of different dialects were used to evaluate the produced embeddings in the sentiment analysis task. The obtained results revealed that, our syntax-ignorant embeddings could outperform word2vec model and doc2vec both variant models in addition to hand-crafted system baselines, while a competent performance was noticed towards baseline systems that adopted more complicated neural architectures.

pdf bib
Constrained Sequence-to-sequence Semitic Root Extraction for Enriching Word EmbeddingsSemitic Root Extraction for Enriching Word Embeddings
Ahmed El-Kishky | Xingyu Fu | Aseel Addawood | Nahil Sobh | Clare Voss | Jiawei Han

In this paper, we tackle the problem of root extraction from words in the Semitic language family. A challenge in applying natural language processing techniques to these languages is the data sparsity problem that arises from their rich internal morphology, where the substructure is inherently non-concatenative and morphemes are interdigitated in word formation. While previous automated methods have relied on human-curated rules or multiclass classification, they have not fully leveraged the various combinations of regular, sequential concatenative morphology within the words and the internal interleaving within templatic stems of roots and patterns. To address this, we propose a constrained sequence-to-sequence root extraction method. Experimental results show our constrained model outperforms a variety of methods at root extraction. Furthermore, by enriching word embeddings with resulting decompositions, we show improved results on word analogy, word similarity, and language modeling tasks.

pdf bib
En-Ar Bilingual Word Embeddings without Word Alignment : Factors Effects
Taghreed Alqaisi | Simon O’Keefe

This paper introduces the first attempt to investigate morphological segmentation on En-Ar bilingual word embeddings using bilingual word embeddings model without word alignment (BilBOWA). We investigate the effect of sentence length and embedding size on the learning process. Our experiment shows that using the D3 segmentation scheme improves the accuracy of learning bilingual word embeddings up to 10 percentage points compared to the ATB and D0 schemes in all different training settings.

pdf bib
Assessing Arabic Weblog Credibility via Deep Co-learningArabic Weblog Credibility via Deep Co-learning
Chadi Helwe | Shady Elbassuoni | Ayman Al Zaatari | Wassim El-Hajj

Assessing the credibility of online content has garnered a lot of attention lately. We focus on one such type of online content, namely weblogs or blogs for short. Some recent work attempted the task of automatically assessing the credibility of blogs, typically via machine learning. However, in the case of Arabic blogs, there are hardly any datasets available that can be used to train robust machine learning models for this difficult task. To overcome the lack of sufficient training data, we propose deep co-learning, a semi-supervised end-to-end deep learning approach to assess the credibility of Arabic blogs. In deep co-learning, multiple weak deep neural network classifiers are trained using a small labeled dataset, and each using a different view of the data. Each one of these classifiers is then used to classify unlabeled data, and its prediction is used to train the other classifiers in a semi-supervised fashion. We evaluate our deep co-learning approach on an Arabic blogs dataset, and we report significant improvements in performance compared to many baselines including fully-supervised deep learning models as well as ensemble models.

pdf bib
Construction and Annotation of the Jordan Comprehensive Contemporary Arabic Corpus (JCCA)Jordan Comprehensive Contemporary Arabic Corpus (JCCA)
Majdi Sawalha | Faisal Alshargi | Abdallah AlShdaifat | Sane Yagi | Mohammad A. Qudah

To compile a modern dictionary that catalogues the words in currency, and to study linguistic patterns in the contemporary language, it is necessary to have a corpus of authentic texts that reflect current usage of the language. Although there are numerous Arabic corpora, none claims to be representative of the language in terms of the combination of geographical region, genre, subject matter, mode, and medium. This paper describes a 100-million-word corpus that takes the British National Corpus (BNC) as a model. The aim of the corpus is to be balanced, annotated, comprehensive, and representative of contemporary Arabic as written and spoken in Arab countries today. It will be different from most others in not being heavily-dominated by the news or in mixing the classical with the modern. In this paper is an outline of the methodology adopted for the design, construction, and annotation of this corpus. DIWAN (Alshargi and Rambow, 2015) was used to annotate a one-million-word snapshot of the corpus. DIWAN is a dialectal word annotation tool, but we upgraded it by adding a new tag-set that is based on traditional Arabic grammar and by adding the roots and morphological patterns of nouns and verbs. Moreover, the corpus we constructed covers the major spoken varieties of Arabic.

pdf bib
Mazajak : An Online Arabic Sentiment AnalyserMazajak: An Online Arabic Sentiment Analyser
Ibrahim Abu Farha | Walid Magdy

Sentiment analysis (SA) is one of the most useful natural language processing applications. Literature is flooding with many papers and systems addressing this task, but most of the work is focused on English. In this paper, we present Mazajak, an online system for Arabic SA. The system is based on a deep learning model, which achieves state-of-the-art results on many Arabic dialect datasets including SemEval 2017 and ASTD. The availability of such system should assist various applications and research that rely on sentiment analysis as a tool.

pdf bib
The MADAR Shared Task on Arabic Fine-Grained Dialect IdentificationMADAR Shared Task on Arabic Fine-Grained Dialect Identification
Houda Bouamor | Sabit Hassan | Nizar Habash

In this paper, we present the results and findings of the MADAR Shared Task on Arabic Fine-Grained Dialect Identification. This shared task was organized as part of The Fourth Arabic Natural Language Processing Workshop, collocated with ACL 2019. The shared task includes two subtasks : the MADAR Travel Domain Dialect Identification subtask (Subtask 1) and the MADAR Twitter User Dialect Identification subtask (Subtask 2). This shared task is the first to target a large set of dialect labels at the city and country levels. The data for the shared task was created or collected under the Multi-Arabic Dialect Applications and Resources (MADAR) project. A total of 21 teams from 15 countries participated in the shared task.

pdf bib
ZCU-NLP at MADAR 2019 : Recognizing Arabic DialectsZCU-NLP at MADAR 2019: Recognizing Arabic Dialects
Pavel Přibáň | Stephen Taylor

In this paper, we present our systems for the MADAR Shared Task : Arabic Fine-Grained Dialect Identification. The shared task consists of two subtasks. The goal of Subtask 1 (S-1) is to detect an Arabic city dialect in a given text and the goal of Subtask2 (S-2) is to predict the country of origin of a Twitter user by using tweets posted by the user. In S-1, our proposed systems are based on language modelling. We use language models to extract features that are later used as an input for other machine learning algorithms. We also experiment with recurrent neural networks (RNN), but these experiments showed that simpler machine learning algorithms are more successful. Our system achieves 0.658 macro F1-score and our rank is 6th out of 19 teams in S-1 and 7th in S-2 with 0.475 macro F1-score.

pdf bib
Simple But Not Nave : Fine-Grained Arabic Dialect Identification Using Only N-GramsArabic Dialect Identification Using Only N-Grams
Sohaila Eltanbouly | May Bashendy | Tamer Elsayed

This paper presents the participation of Qatar University team in MADAR shared task, which addresses the problem of sentence-level fine-grained Arabic Dialect Identification over 25 different Arabic dialects in addition to the Modern Standard Arabic. Arabic Dialect Identification is not a trivial task since different dialects share some features, e.g., utilizing the same character set and some vocabularies. We opted to adopt a very simple approach in terms of extracted features and classification models ; we only utilize word and character n-grams as features, and Na ve Bayes models as classifiers. Surprisingly, the simple approach achieved non-na ve performance. The official results, reported on a held-out testing set, show that the dialect of a given sentence can be identified at an accuracy of 64.58 % by our best submitted run.

pdf bib
LIUM-MIRACL Participation in the MADAR Arabic Dialect Identification Shared TaskLIUM-MIRACL Participation in the MADAR Arabic Dialect Identification Shared Task
Saméh Kchaou | Fethi Bougares | Lamia Hadrich-Belguith

This paper describes the joint participation of the LIUM and MIRACL Laboratories at the Arabic dialect identification challenge of the MADAR Shared Task (Bouamor et al., 2019) conducted during the Fourth Arabic Natural Language Processing Workshop (WANLP 2019). We participated to the Travel Domain Dialect Identification subtask. We built several systems and explored different techniques including conventional machine learning methods and deep learning algorithms. Deep learning approaches did not perform well on this task. We experimented several classification systems and we were able to identify the dialect of an input sentence with an F1-score of 65.41 % on the official test set using only the training data supplied by the shared task organizers.

pdf bib
MICHAEL : Mining Character-level Patterns for Arabic Dialect Identification (MADAR Challenge)MICHAEL: Mining Character-level Patterns for Arabic Dialect Identification (MADAR Challenge)
Dhaou Ghoul | Gaël Lejeune

We present MICHAEL, a simple lightweight method for automatic Arabic Dialect Identification on the MADAR travel domain Dialect Identification (DID). MICHAEL uses simple character-level features in order to perform a pre-processing free classification. More precisely, Character N-grams extracted from the original sentences are used to train a Multinomial Naive Bayes classifier. This system achieved an official score (accuracy) of 53.25 % with 1 = N=3 but showed a much better result with character 4-grams (62.17 % accuracy).

pdf bib
Arabic Dialect Identification for Travel and Twitter TextArabic Dialect Identification for Travel and Twitter Text
Pruthwik Mishra | Vandan Mujadia

This paper presents the results of the experiments done as a part of MADAR Shared Task in WANLP 2019 on Arabic Fine-Grained Dialect Identification. Dialect Identification is one of the prominent tasks in the field of Natural language processing where the subsequent language modules can be improved based on it. We explored the use of different features like char, word n-gram, language model probabilities, etc on different classifiers. Results show that these features help to improve dialect classification accuracy. Results also show that traditional machine learning classifier tends to perform better when compared to neural network models on this task in a low resource setting.

pdf bib
Mawdoo3 AI at MADAR Shared Task : Arabic Tweet Dialect IdentificationAI at MADAR Shared Task: Arabic Tweet Dialect Identification
Bashar Talafha | Wael Farhan | Ahmed Altakrouri | Hussein Al-Natsheh

Arabic dialect identification is an inherently complex problem, as Arabic dialect taxonomy is convoluted and aims to dissect a continuous space rather than a discrete one. In this work, we present machine and deep learning approaches to predict 21 fine-grained dialects form a set of given tweets per user. We adopted numerous feature extraction methods most of which showed improvement in the final model, such as word embedding, Tf-idf, and other tweet features. Our results show that a simple LinearSVC can outperform any complex deep learning model given a set of curated features. With a relatively complex user voting mechanism, we were able to achieve a Macro-Averaged F1-score of 71.84 % on MADAR shared subtask-2. Our best submitted model ranked second out of all participating teams.

pdf bib
The SMarT Classifier for Arabic Fine-Grained Dialect IdentificationSMarT Classifier for Arabic Fine-Grained Dialect Identification
Karima Meftouh | Karima Abidi | Salima Harrat | Kamel Smaili

This paper describes the approach adopted by the SMarT research group to build a dialect identification system in the framework of the Madar shared task on Arabic fine-grained dialect identification. We experimented several approaches, but we finally decided to use a Multinomial Naive Bayes classifier based on word and character ngrams in addition to the language model probabilities. We achieved a score of 67.73 % in terms of Macro accuracy and a macro-averaged F1-score of 67.31 %

pdf bib
JHU System Description for the MADAR Arabic Dialect Identification Shared TaskJHU System Description for the MADAR Arabic Dialect Identification Shared Task
Tom Lippincott | Pamela Shapiro | Kevin Duh | Paul McNamee

Our submission to the MADAR shared task on Arabic dialect identification employed a language modeling technique called Prediction by Partial Matching, an ensemble of neural architectures, and sources of additional data for training word embeddings and auxiliary language models. We found several of these techniques provided small boosts in performance, though a simple character-level language model was a strong baseline, and a lower-order LM achieved best performance on Subtask 2. Interestingly, word embeddings provided no consistent benefit, and ensembling struggled to outperform the best component submodel. This suggests the variety of architectures are learning redundant information, and future work may focus on encouraging decorrelated learning.

pdf bib
A Character Level Convolutional BiLSTM for Arabic Dialect IdentificationBiLSTM for Arabic Dialect Identification
Mohamed Elaraby | Ahmed Zahran

In this paper, we describe CU-RAISA teamcontribution to the 2019Madar shared task2, which focused on Twitter User fine-grained dialect identification. Among par-ticipating teams, our system ranked the4th(with 61.54 %) F1-Macro measure. Our sys-tem is trained using a character level convo-lutional bidirectional long-short-term memorynetwork trained on 2k users’ data. We showthat training on concatenated user tweets asinput is further superior to training on usertweets separately and assign user’s label on themode of user’s tweets’ predictions.

pdf bib
Team JUST at the MADAR Shared Task on Arabic Fine-Grained Dialect IdentificationJUST at the MADAR Shared Task on Arabic Fine-Grained Dialect Identification
Bashar Talafha | Ali Fadel | Mahmoud Al-Ayyoub | Yaser Jararweh | Mohammad AL-Smadi | Patrick Juola

In this paper, we describe our team’s effort on the MADAR Shared Task on Arabic Fine-Grained Dialect Identification. The task requires building a system capable of differentiating between 25 different Arabic dialects in addition to MSA. Our approach is simple. After preprocessing the data, we use Data Augmentation (DA) to enlarge the training data six times. We then build a language model and extract n-gram word-level and character-level TF-IDF features and feed them into an MNB classifier. Despite its simplicity, the resulting model performs really well producing the 4th highest F-measure and region-level accuracy and the 5th highest precision, recall, city-level accuracy and country-level accuracy among the participating teams.

pdf bib
QC-GO Submission for MADAR Shared Task : Arabic Fine-Grained Dialect IdentificationQC-GO Submission for MADAR Shared Task: Arabic Fine-Grained Dialect Identification
Younes Samih | Hamdy Mubarak | Ahmed Abdelali | Mohammed Attia | Mohamed Eldesouki | Kareem Darwish

This paper describes the QC-GO team submission to the MADAR Shared Task Subtask 1 (travel domain dialect identification) and Subtask 2 (Twitter user location identification). In our participation in both subtasks, we explored a number of approaches and system combinations to obtain the best performance for both tasks. These include deep neural nets and heuristics. Since individual approaches suffer from various shortcomings, the combination of different approaches was able to fill some of these gaps. Our system achieves F1-Scores of 66.1 % and 67.0 % on the development sets for Subtasks 1 and 2 respectively.

up

pdf (full)
bib (full)
Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change

pdf bib
Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change
Nina Tahmasebi | Lars Borin | Adam Jatowt | Yang Xu

pdf bib
Evaluation of Semantic Change of Harm-Related Concepts in Psychology
Ekaterina Vylomova | Sean Murphy | Nicholas Haslam

The paper focuses on diachronic evaluation of semantic changes of harm-related concepts in psychology. More specifically, we investigate a hypothesis that certain concepts such as addiction, bullying, harassment, prejudice, and trauma became broader during the last four decades. We evaluate semantic changes using two models : an LSA-based model from Sagi et al. (2009) and a diachronic adaptation of word2vec from Hamilton et al. (2016), that are trained on a large corpus of journal abstracts covering the period of 1980 2019. Several concepts showed evidence of broadening. Addiction moved from physiological dependency on a substance to include psychological dependency on gaming and the Internet. Similarly, harassment and trauma shifted towards more psychological meanings. On the other hand, bullying has transformed into a more victim-related concept and expanded to new areas such as workplaces.

pdf bib
GASC : Genre-Aware Semantic Change for Ancient GreekGASC: Genre-Aware Semantic Change for Ancient Greek
Valerio Perrone | Marco Palma | Simon Hengchen | Alessandro Vatri | Jim Q. Smith | Barbara McGillivray

Word meaning changes over time, depending on linguistic and extra-linguistic factors. Associating a word’s correct meaning in its historical context is a central challenge in diachronic research, and is relevant to a range of NLP tasks, including information retrieval and semantic search in historical texts. Bayesian models for semantic change have emerged as a powerful tool to address this challenge, providing explicit and interpretable representations of semantic change phenomena. However, while corpora typically come with rich metadata, existing models are limited by their inability to exploit contextual information (such as text genre) beyond the document time-stamp. This is particularly critical in the case of ancient languages, where lack of data and long diachronic span make it harder to draw a clear distinction between polysemy (the fact that a word has several senses) and semantic change (the process of acquiring, losing, or changing senses), and current systems perform poorly on these languages. We develop GASC, a dynamic semantic change model that leverages categorical metadata about the texts’ genre to boost inference and uncover the evolution of meanings in Ancient Greek corpora. In a new evaluation framework, our model achieves improved predictive performance compared to the state of the art.

pdf bib
A Method to Automatically Identify Diachronic Variation in Collocations.
Marcos Garcia | Marcos García Salido

This paper introduces a novel method to track collocational variations in diachronic corpora that can identify several changes undergone by these phraseological combinations and to propose alternative solutions found in later periods. The strategy consists of extracting syntactically-related candidates of collocations and ranking them using statistical association measures. Then, starting from the first period of the corpus, the system tracks each combination over time, verifying different types of historical variation such as the loss of one or both lemmas, the disappearance of the collocation, or its diachronic frequency trend. Using a distributional semantics strategy, it also proposes other linguistic structures which convey similar meanings to those extinct collocations. A case study on historical corpora of Portuguese and Spanish shows that the system speeds up and facilitates the finding of some diachronic changes and phraseological shifts that are harder to identify without using automated methods.

pdf bib
Written on Leaves or in Stones? : Computational Evidence for the Era of Authorship of Old Thai ProseThai Prose
Attapol Rutherford | Santhawat Thanyawong

We aim to provide computational evidence for the era of authorship of two important old Thai texts : Traiphumikatha and Pumratchatham. The era of authorship of these two books is still an ongoing debate among Thai literature scholars. Analysis of old Thai texts present a challenge for standard natural language processing techniques, due to the lack of corpora necessary for building old Thai word and syllable segmentation. We propose an accurate and interpretable model to classify each segment as one of the three eras of authorship (Sukhothai, Ayuddhya, or Rattanakosin) without sophisticated linguistic preprocessing. Contrary to previous hypotheses, our model suggests that both books were written during the Sukhothai era. Moreover, the second half of the Pumratchtham is uncharacteristic of the Sukhothai era, which may have confounded literary scholars in the past. Further, our model reveals that the most indicative linguistic changes stem from unidirectional grammaticalized words and polyfunctional words, which show up as most dominant features in the model.

pdf bib
Identifying Temporal Trends Based on Perplexity and Clustering : Are We Looking at Language Change?
Sidsel Boldsen | Manex Agirrezabal | Patrizia Paggio

In this work we propose a data-driven methodology for identifying temporal trends in a corpus of medieval charters. We have used perplexities derived from RNNs as a distance measure between documents and then, performed clustering on those distances. We argue that perplexities calculated by such language models are representative of temporal trends. The clusters produced using the K-Means algorithm give an insight of the differences in language in different time periods at least partly due to language change. We suggest that the temporal distribution of the individual clusters might provide a more nuanced picture of temporal trends compared to discrete bins, thus providing better results when used in a classification task.

pdf bib
Predicting Historical Phonetic Features using Deep Neural Networks : A Case Study of the Phonetic System of Proto-Indo-EuropeanProto-Indo-European
Frederik Hartmann

Traditional historical linguistics lacks the possibility to empirically assess its assumptions regarding the phonetic systems of past languages and language stages since most current methods rely on comparative tools to gain insights into phonetic features of sounds in proto- or ancestor languages. The paper at hand presents a computational method based on deep neural networks to predict phonetic features of historical sounds where the exact quality is unknown and to test the overall coherence of reconstructed historical phonetic features. The method utilizes the principles of coarticulation, local predictability and statistical phonological constraints to predict phonetic features by the features of their immediate phonetic environment. The validity of this method will be assessed using New High German phonetic data and its specific application to diachronic linguistics will be demonstrated in a case study of the phonetic system Proto-Indo-European.

pdf bib
ParHistVis : Visualization of Parallel Multilingual Historical DataParHistVis: Visualization of Parallel Multilingual Historical Data
Aikaterini-Lida Kalouli | Rebecca Kehlbeck | Rita Sevastjanova | Katharina Kaiser | Georg A. Kaiser | Miriam Butt

The study of language change through parallel corpora can be advantageous for the analysis of complex interactions between time, text domain and language. Often, those advantages can not be fully exploited due to the sparse but high-dimensional nature of such historical data. To tackle this challenge, we introduce ParHistVis : a novel, free, easy-to-use, interactive visualization tool for parallel, multilingual, diachronic and synchronic linguistic data. We illustrate the suitability of the components of the tool based on a use case of word order change in Romance wh-interrogatives.

pdf bib
DiaHClust : an Iterative Hierarchical Clustering Approach for Identifying Stages in Language ChangeDiaHClust: an Iterative Hierarchical Clustering Approach for Identifying Stages in Language Change
Christin Schätzle | Hannah Booth

Language change is often assessed against a set of pre-determined time periods in order to be able to trace its diachronic trajectory. This is problematic, since a pre-determined periodization might obscure significant developments and lead to false assumptions about the data. Moreover, these time periods can be based on factors which are either arbitrary or non-linguistic, e.g., dividing the corpus data into equidistant stages or taking into account language-external events. Addressing this problem, in this paper we present a data-driven approach to periodization : ‘DiaHClust’. DiaHClust is based on iterative hierarchical clustering and offers a multi-layered perspective on change from text-level to broader time periods. We demonstrate the usefulness of DiaHClust via a case study investigating syntactic change in Icelandic, modelling the syntactic system of the language in terms of vectors of syntactic change.

pdf bib
Times Are Changing : Investigating the Pace of Language Change in Diachronic Word Embeddings
Stephanie Brandl | David Lassner

We propose Word Embedding Networks, a novel method that is able to learn word embeddings of individual data slices while simultaneously aligning and ordering them without feeding temporal information a priori to the model. This gives us the opportunity to analyse the dynamics in word embeddings on a large scale in a purely data-driven manner. In experiments on two different newspaper corpora, the New York Times (English) and die Zeit (German), we were able to show that time actually determines the dynamics of semantic change. However, there is by no means a uniform evolution, but instead times of faster and times of slower change.

pdf bib
Studying Laws of Semantic Divergence across Languages using Cognate Sets
Ana Uban | Alina Maria Ciobanu | Liviu P. Dinu

Semantic divergence in related languages is a key concern of historical linguistics. Intra-lingual semantic shift has been previously studied in computational linguistics, but this can only provide a limited picture of the evolution of word meanings, which often develop in a multilingual environment. In this paper we investigate semantic change across languages by measuring the semantic distance of cognate words in multiple languages. By comparing current meanings of cognates in different languages, we hope to uncover information about their previous meanings, and about how they diverged in their respective languages from their common original etymon. We further study the properties of their semantic divergence, by analyzing how the features of words such as frequency and polysemy are related to the divergence in their meaning, and thus make the first steps towards formulating laws of cross-lingual semantic change.

pdf bib
Detecting Syntactic Change Using a Neural Part-of-Speech Tagger
William Merrill | Gigi Stark | Robert Frank

We train a diachronic long short-term memory (LSTM) part-of-speech tagger on a large corpus of American English from the 19th, 20th, and 21st centuries. We analyze the tagger’s ability to implicitly learn temporal structure between years, and the extent to which this knowledge can be transferred to date new sentences. The learned year embeddings show a strong linear correlation between their first principal component and time. We show that temporal information encoded in the model can be used to predict novel sentences’ years of composition relatively well. Comparisons to a feedforward baseline suggest that the temporal change learned by the LSTM is syntactic rather than purely lexical. Thus, our results suggest that our tagger is implicitly learning to model syntactic change in American English over the course of the 19th, 20th, and early 21st centuries.

pdf bib
Spatio-Temporal Prediction of Dialectal Variant Usage
Péter Jeszenszky | Panote Siriaraya | Philipp Stoeckle | Adam Jatowt

The distribution of most dialectal variants have not only spatial but also temporal patterns. Based on the ‘apparent time hypothesis’, much of dialect change is happening through younger speakers accepting innovations. Thus, synchronic diversity can be interpreted diachronically. With the assumption of the ‘contact effect’, i.e. contact possibility (contact and isolation) between speaker communities being responsible for language change, and the apparent time hypothesis, we aim to predict the usage of dialectal variants. In this paper we model the contact possibility based on two of the most important factors in sociolinguistics to be affecting language change : age and distance. The first steps of the approach involve modeling contact possibility using a logistic predictor, taking the age of respondents into account. We test the global, and the local role of age for variation where the local level means spatial subsets around each survey site, chosen based on k nearest neighbors. The prediction approach is tested on Swiss German syntactic survey data, featuring multiple respondents from different age cohorts at survey sites. The results show the relative success of the logistic prediction approach and the limitations of the method, therefore further proposals are made to develop the methodology.global, and the local role of age for variation where the local level means spatial subsets around each survey site, chosen based on k nearest neighbors. The prediction approach is tested on Swiss German syntactic survey data, featuring multiple respondents from different age cohorts at survey sites. The results show the relative success of the logistic prediction approach and the limitations of the method, therefore further proposals are made to develop the methodology.

pdf bib
One-to-X Analogical Reasoning on Word Embeddings : a Case for Diachronic Armed Conflict Prediction from News TextsX Analogical Reasoning on Word Embeddings: a Case for Diachronic Armed Conflict Prediction from News Texts
Andrey Kutuzov | Erik Velldal | Lilja Øvrelid

We extend the well-known word analogy task to a one-to-X formulation, including one-to-none cases, when no correct answer exists. The task is cast as a relation discovery problem and applied to historical armed conflicts datasets, attempting to predict new relations of type ‘location : armed-group’ based on data about past events. As the source of semantic information, we use diachronic word embedding models trained on English news texts. A simple technique to improve diachronic performance in such task is demonstrated, using a threshold based on a function of cosine distance to decrease the number of false positives ; this approach is shown to be beneficial on two different corpora. Finally, we publish a ready-to-use test set for one-to-X analogy evaluation on historical armed conflicts data.

pdf bib
Semantic Change in the Language of UK Parliamentary DebatesUK Parliamentary Debates
Gavin Abercrombie | Riza Batista-Navarro

We investigate changes in the meanings of words used in the UK Parliament across two different epochs. We use word embeddings to explore changes in the distribution of words of interest and uncover words that appear to have undergone semantic transformation in the intervening period, and explore different ways of obtaining target words for this purpose. We find that semantic changes are generally in line with those found in other corpora, and little evidence that parliamentary language is more static than general English. It also seems that words with senses that have been recorded in the dictionary as having fallen into disuse do not undergo semantic changes in this domain.

pdf bib
Semantic Change and Emerging Tropes In a Large Corpus of New High German PoetryNew High German Poetry
Thomas Haider | Steffen Eger

Due to its semantic succinctness and novelty of expression, poetry is a great test-bed for semantic change analysis. However, so far there is a scarcity of large diachronic corpora. Here, we provide a large corpus of German poetry which consists of about 75k poems with more than 11 million tokens, with poems ranging from the 16th to early 20th century. We then track semantic change in this corpus by investigating the rise of tropes (‘love is magic’) over time and detecting change points of meaning, which we find to occur particularly within the German Romantic period. Additionally, through self-similarity, we reconstruct literary periods and find evidence that the law of linear semantic change also applies to poetry.

pdf bib
Conceptual Change and Distributional Semantic Models : an Exploratory Study on Pitfalls and Possibilities
Pia Sommerauer | Antske Fokkens

Studying conceptual change using embedding models has become increasingly popular in the Digital Humanities community while critical observations about them have received less attention. This paper investigates what the impact of known pitfalls can be on the conclusions drawn in a digital humanities study through the use case of Racism. In addition, we suggest an approach for modeling a complex concept in terms of words and relations representative of the conceptual system. Our results show that different models created from the same data yield different results, but also indicate that using different model architectures, comparing different corpora and comparing to control words and relations can help to identify which results are solid and which may be due to artefact. We propose guidelines to conduct similar studies, but also note that more work is needed to fully understand how we can distinguish artefacts from actual conceptual changes.

pdf bib
Towards Automatic Variant Analysis of Ancient Devotional Texts
Amir Hazem | Béatrice Daille | Dominique Stutzmann | Jacob Currie | Christine Jacquin

We address in this paper the issue of text reuse in liturgical manuscripts of the middle ages. More specifically, we study variant readings of the Obsecro Te prayer, part of the devotional Books of Hours often used by Christians as guidance for their daily prayers. We aim at automatically extracting and categorising pairs of words and expressions that exhibit variant relations. For this purpose, we adopt a linguistic classification that allows to better characterize the variants than edit operations. Then, we study the evolution of Obsecro Te texts from a temporal and geographical axis. Finally, we contrast several unsupervised state-of-the-art approaches for the automatic extraction of Obsecro Te variants. Based on the manual observation of 772 Obsecro Te copies which show more than 21,000 variants, we show that the proposed methodology is helpful for an automatic study of variants and may serve as basis to analyze and to depict useful information from devotional texts.

pdf bib
Understanding the Evolution of Circular Economy through Language Change
Sampriti Mahanty | Frank Boons | Julia Handl | Riza Theresa Batista-Navarro

In this study, we propose to focus on understanding the evolution of a specific scientific conceptthat of Circular Economy (CE)by analysing how the language used in academic discussions has changed semantically. It is worth noting that the meaning and central theme of this concept has remained the same ; however, we hypothesise that it has undergone semantic change by way of additional layers being added to the concept. We have shown that semantic change in language is a reflection of shifts in scientific ideas, which in turn help explain the evolution of a concept. Focusing on the CE concept, our analysis demonstrated that the change over time in the language used in academic discussions of CE is indicative of the way in which the concept evolved and expanded.