Computational Linguistics Journal (2018)


bib (full) Computational Linguistics, Volume 44, Issue 1 - April 2018

Computational Linguistics, Volume 44, Issue 1 - April 2018

pdf bib
A Notion of Semantic Coherence for Underspecified Semantic Representation
Mehdi Manshadi | Daniel Gildea | James F. Allen

The general problem of finding satisfying solutions to constraint-based underspecified representations of quantifier scope is NP-complete. Existing frameworks, including Dominance Graphs, Minimal Recursion Semantics, and Hole Semantics, have struggled to balance expressivity and tractability in order to cover real natural language sentences with efficient algorithms. We address this trade-off with a general principle of coherence, which requires that every variable introduced in the domain of discourse must contribute to the overall semantics of the sentence. We show that every underspecified representation meeting this criterion can be efficiently processed, and that our set of representations subsumes all previously identified tractable sets.

pdf bib
Cache Transition Systems for Graph Parsing
Daniel Gildea | Giorgio Satta | Xiaochang Peng

Motivated by the task of semantic parsing, we describe a transition system that generalizes standard transition-based dependency parsing techniques to generate a graph rather than a tree. Our system includes a cache with fixed size m, and we characterize the relationship between the parameter m and the class of graphs that can be produced through the graph-theoretic concept of tree decomposition. We find empirically that small cache sizes cover a high percentage of sentences in existing semantic corpora.

pdf bib
Weighted DAG Automata for Semantic GraphsDAG Automata for Semantic Graphs
David Chiang | Frank Drewes | Daniel Gildea | Adam Lopez | Giorgio Satta

Graphs have a variety of uses in natural language processing, particularly as representations of linguistic meaning. A deficit in this area of research is a formal framework for creating, combining, and using models involving graphs that parallels the frameworks of finite automata for strings and finite tree automata for trees. A possible starting point for such a framework is the formalism of directed acyclic graph (DAG) automata, defined by Kamimura and Slutzki and extended by Quernheim and Knight. In this article, we study the latter in depth, demonstrating several new results, including a practical recognition algorithm that can be used for inference and learning with models defined on DAG automata. We also propose an extension to graphs with unbounded node degree and show that our results carry over to the extended formalism.


bib (full) Computational Linguistics, Volume 44, Issue 2 - June 2018

Computational Linguistics, Volume 44, Issue 2 - June 2018

pdf bib
A Dependency Perspective on RST Discourse Parsing and EvaluationRST Discourse Parsing and Evaluation
Mathieu Morey | Philippe Muller | Nicholas Asher

Computational text-level discourse analysis mostly happens within Rhetorical Structure Theory (RST), whose structures have classically been presented as constituency trees, and relies on data from the RST Discourse Treebank (RST-DT) ; as a result, the RST discourse parsing community has largely borrowed from the syntactic constituency parsing community. The standard evaluation procedure for RST discourse parsers is thus a simplified variant of PARSEVAL, and most RST discourse parsers use techniques that originated in syntactic constituency parsing. In this article, we isolate a number of conceptual and computational problems with the constituency hypothesis. We then examine the consequences, for the implementation and evaluation of RST discourse parsers, of adopting a dependency perspective on RST structures, a view advocated so far only by a few approaches to discourse parsing. While doing that, we show the importance of the notion of headedness of RST structures. We analyze RST discourse parsing as dependency parsing by adapting to RST a recent proposal in syntactic parsing that relies on head-ordered dependency trees, a representation isomorphic to headed constituency trees. We show how to convert the original trees from the RST corpus, RST-DT, and their binarized versions used by all existing RST parsers to head-ordered dependency trees. We also propose a way to convert existing simple dependency parser output to constituent trees. This allows us to evaluate and to compare approaches from both constituent-based and dependency-based perspectives in a unified framework, using constituency and dependency metrics. We thus propose an evaluation framework to compare extant approaches easily and uniformly, something the RST parsing community has lacked up to now. We can also compare parsers’ predictions to each other across frameworks. This allows us to characterize families of parsing strategies across the different frameworks, in particular with respect to the notion of headedness. Our experiments provide evidence for the conceptual similarities between dependency parsers and shift-reduce constituency parsers, and confirm that dependency parsing constitutes a viable approach to RST discourse parsing.

pdf bib
Unrestricted Bridging Resolution
Yufang Hou | Katja Markert | Michael Strube

In contrast to identity anaphors, which indicate coreference between a noun phrase and its antecedent, bridging anaphors link to their antecedent(s) via lexico-semantic, frame, or encyclopedic relations. Bridging resolution involves recognizing bridging anaphors and finding links to antecedents. In contrast to most prior work, we tackle both problems. Our work also follows a more wide-ranging definition of bridging than most previous work and does not impose any restrictions on the type of bridging anaphora or relations between anaphor and antecedent. We create a corpus (ISNotes) annotated for information status (IS), bridging being one of the IS subcategories. The annotations reach high reliability for all categories and marginal reliability for the bridging subcategory. We use a two-stage statistical global inference method for bridging resolution. Given all mentions in a document, the first stage, bridging anaphora recognition, recognizes bridging anaphors as a subtask of learning fine-grained IS. We use a cascading collective classification method where (i) collective classification allows us to investigate relations among several mentions and autocorrelation among IS classes and (ii) cascaded classification allows us to tackle class imbalance, important for minority classes such as bridging. We show that our method outperforms current methods both for IS recognition overall as well as for bridging, specifically. The second stage, bridging antecedent selection, finds the antecedents for all predicted bridging anaphors. We investigate the phenomenon of semantically or syntactically related bridging anaphors that share the same antecedent, a phenomenon we call sibling anaphors.

pdf bib
Spurious Ambiguity and Focalization
Glyn Morrill | Oriol Valentín

Spurious ambiguity is the phenomenon whereby distinct derivations in grammar may assign the same structural reading, resulting in redundancy in the parse search space and inefficiency in parsing. Understanding the problem depends on identifying the essential mathematical structure of derivations. This is trivial in the case of context free grammar, where the parse structures are ordered trees ; in the case of type logical categorial grammar, the parse structures are proof nets. However, with respect to multiplicatives, intrinsic proof nets have not yet been given for displacement calculus, and proof nets for additives, which have applications to polymorphism, are not easy to characterize. In this context we approach here multiplicative-additive spurious ambiguity by means of the proof-theoretic technique of focalization.

pdf bib
The Influence of Context on the Learning of Metrical Stress Systems Using Finite-State Machines
Cesko Voeten | Menno van Zaanen

Languages vary in the way stress is assigned to syllables within words. This article investigates the learnability of stress systems in a wide range of languages. The stress systems can be described using finite-state automata with symbols indicating levels of stress (primary, secondary, or no stress). Finite-state automata have been the focus of research in the area of grammatical inference for some time now. It has been shown that finite-state machines are learnable from examples using state-merging. One such approach, which aims to learn k-testable languages, has been applied to stress systems with some success. The family of k-testable languages has been shown to be efficiently learnable (in polynomial time). Here, we extend this approach to k, l-local languages by taking not only left context, but also right context, into account. We consider empirical results testing the performance of our learner using various amounts of context (corresponding to varying definitions of phonological locality). Our results show that our approach of learning stress patterns using state-merging is more reliant on left context than on right context. Additionally, some stress systems fail to be learned by our learner using either the left-context k-testable or the left-and-right-context k, l-local learning system. A more complex merging strategy, and hence grammar representation, is required for these stress systems.


bib (full) Computational Linguistics, Volume 44, Issue 3 - September 2018

Computational Linguistics, Volume 44, Issue 3 - September 2018

pdf bib
A Structured Review of the Validity of BLEUBLEU
Ehud Reiter

The BLEU metric has been widely used in NLP for over 15 years to evaluate NLP systems, especially in machine translation and natural language generation. I present a structured review of the evidence on whether BLEU is a valid evaluation techniquein other words, whether BLEU scores correlate with real-world utility and user-satisfaction of NLP systems ; this review covers 284 correlations reported in 34 papers. Overall, the evidence supports using BLEU for diagnostic evaluation of MT systems (which is what it was originally proposed for), but does not support using BLEU outside of MT, for evaluation of individual texts, or for scientific hypothesis testing.

pdf bib
Native Language Identification With Classifier Stacking and Ensembles
Shervin Malmasi | Mark Dras

Ensemble methods using multiple classifiers have proven to be among the most successful approaches for the task of Native Language Identification (NLI), achieving the current state of the art. However, a systematic examination of ensemble methods for NLI has yet to be conducted. Additionally, deeper ensemble architectures such as classifier stacking have not been closely evaluated. We present a set of experiments using three ensemble-based models, testing each with multiple configurations and algorithms. This includes a rigorous application of meta-classification models for NLI, achieving state-of-the-art results on several large data sets, evaluated in both intra-corpus and cross-corpus modes.

pdf bib
Using Semantics for Granularities of Tokenization
Martin Riedl | Chris Biemann

Depending on downstream applications, it is advisable to extend the notion of tokenization from low-level character-based token boundary detection to identification of meaningful and useful language units. This entails both identifying units composed of several single words that form a several single words that form a, as well as splitting single-word compounds into their meaningful parts. In this article, we introduce unsupervised and knowledge-free methods for these two tasks. The main novelty of our research is based on the fact that methods are primarily based on distributional similarity, of which we use two flavors : a sparse count-based and a dense neural-based distributional semantic model. First, we introduce DRUID, which is a method for detecting MWEs. The evaluation on MWE-annotated data sets in two languages and newly extracted evaluation data sets for 32 languages shows that DRUID compares favorably over previous methods not utilizing distributional information. Second, we present SECOS, an algorithm for decompounding close compounds. In an evaluation of four dedicated decompounding data sets across four languages and on data sets extracted from Wiktionary for 14 languages, we demonstrate the superiority of our approach over unsupervised baselines, sometimes even matching the performance of previous language-specific and supervised methods. In a final experiment, we show how both decompounding and MWE information can be used in information retrieval. Here, we obtain the best results when combining word information with MWEs and the compound parts in a bag-of-words retrieval set-up.

pdf bib
Feature-Based Decipherment for Machine Translation
Iftekhar Naim | Parker Riley | Daniel Gildea

Orthographic similarities across languages provide a strong signal for unsupervised probabilistic transduction (decipherment) for closely related language pairs. The existing decipherment models, however, are not well suited for exploiting these orthographic similarities. We propose a log-linear model with latent variables that incorporates orthographic similarity features. Maximum likelihood training is computationally expensive for the proposed log-linear model. To address this challenge, we perform approximate inference via Markov chain Monte Carlo sampling and contrastive divergence. Our results show that the proposed log-linear model with contrastive divergence outperforms the existing generative decipherment models by exploiting the orthographic features. The model both scales to large vocabularies and preserves accuracy in low- and no-resource contexts.

pdf bib
Survey : Anaphora With Non-nominal Antecedents in Computational Linguistics : a SurveySurvey: Anaphora With Non-nominal Antecedents in Computational Linguistics: a Survey
Varada Kolhatkar | Adam Roussel | Stefanie Dipper | Heike Zinsmeister

This article provides an extensive overview of the literature related to the phenomenon of non-nominal-antecedent anaphora (also known as abstract anaphora or discourse deixis), a type of anaphora in which an anaphor like that refers to an antecedent (marked in boldface) that is syntactically non-nominal, such as the first sentence in It’s way too hot here. That’s why I’m moving to Alaska. Annotating and automatically resolving these cases of anaphora is interesting in its own right because of the complexities involved in identifying non-nominal antecedents, which typically represent abstract objects such as events, facts, and propositions. There is also practical value in the resolution of non-nominal-antecedent anaphora, as this would help computational systems in machine translation, summarization, and question answering, as well as, conceivably, any other task dependent on some measure of text understanding. Most of the existing approaches to anaphora annotation and resolution focus on nominal-antecedent anaphora, classifying many of the cases where the antecedents are syntactically non-nominal as non-anaphoric. There has been some work done on this topic, but it remains scattered and difficult to collect and assess. With this article, we hope to bring together and synthesize work done in disparate contexts up to now in order to identify fundamental problems and draw conclusions from an overarching perspective. Having a good picture of the current state of the art in this field can help researchers direct their efforts to where they are most necessary. Because of the great variety of theoretical approaches that have been brought to bear on the problem, there is an equally diverse array of terminologies that are used to describe it, so we will provide an overview and discussion of these terminologies. We also describe the linguistic properties of non-nominal-antecedent anaphora, examine previous annotation efforts that have addressed this topic, and present the computational approaches that aim at resolving non-nominal-antecedent anaphora automatically. We close with a review of the remaining open questions in this area and some of our recommendations for future research.


bib (full) Computational Linguistics, Volume 44, Issue 4 - December 2018

Computational Linguistics, Volume 44, Issue 4 - December 2018

pdf bib
Squib : The Language Resource SwitchboardSquib: The Language Resource Switchboard
Claus Zinn

The CLARIN research infrastructure gives users access to an increasingly rich and diverse set of language-related resources and tools. Whereas there is ample support for searching resources using metadata-based search, or full-text search, or for aggregating resources into virtual collections, there is little support for users to help them process resources in one way or another. In spite of the large number of tools that process texts in many different languages, there is no single point of access where users can find tools to fit their needs and the resources they have. In this squib, we present the Language Resource Switchboard (LRS), which helps users to discover tools that can process their resources. For this, the LRS identifies all applicable tools for a given resource, lists the tasks the tools can achieve, and invokes the selected tool in such a way so that processing can start immediately with little or no prior tool parameterization.

pdf bib
Squib : Reproducibility in Computational Linguistics : Are We Willing to Share?Squib: Reproducibility in Computational Linguistics: Are We Willing to Share?
Martijn Wieling | Josine Rawee | Gertjan van Noord

This study focuses on an essential precondition for reproducibility in computational linguistics : the willingness of authors to share relevant source code and data. Ten years after Ted Pedersen’s influential Last Words contribution in Computational Linguistics, we investigate to what extent researchers in computational linguistics are willing and able to share their data and code. We surveyed all 395 full papers presented at the 2011 and 2016 ACL Annual Meetings, and identified whether links to data and code were provided. If working links were not provided, authors were requested to provide this information. Although data were often available, code was shared less often. When working links to code or data were not provided in the paper, authors provided the code in about one third of cases. For a selection of ten papers, we attempted to reproduce the results using the provided data and code. We were able to reproduce the results approximately for six papers. For only a single paper did we obtain the exact same results. Our findings show that even though the situation appears to have improved comparing 2016 to 2011, empiricism in computational linguistics still largely remains a matter of faith. Nevertheless, we are somewhat optimistic about the future. Ensuring reproducibility is not only important for the field as a whole, but also seems worthwhile for individual researchers : The median citation count for studies with working links to the source code is higher.

pdf bib
Interactional Stancetaking in Online Forums
Scott F. Kiesling | Umashanthi Pavalanathan | Jim Fitzpatrick | Xiaochuang Han | Jacob Eisenstein

Language is shaped by the relationships between the speaker / writer and the audience, the object of discussion, and the talk itself. In turn, language is used to reshape these relationships over the course of an interaction. Computational researchers have succeeded in operationalizing sentiment, formality, and politeness, but each of these constructs captures only some aspects of social and relational meaning. Theories of interactional stancetaking have been put forward as holistic accounts, but until now, these theories have been applied only through detailed qualitative analysis of (portions of) a few individual conversations. In this article, we propose a new computational operationalization of interpersonal stancetaking. We begin with annotations of three linked stance dimensionsaffect, investment, and alignmenton 68 conversation threads from the online platform Reddit. Using these annotations, we investigate thread structure and linguistic properties of stancetaking in online conversations. We identify lexical features that characterize the extremes along each stancetaking dimension, and show that these stancetaking properties can be predicted with moderate accuracy from bag-of-words features, even with a relatively small labeled training set. These quantitative analyses are supplemented by extensive qualitative analysis, highlighting the compatibility of computational and qualitative methods in synthesizing evidence about the creation of interactional meaning.

pdf bib
A Joint Model of Conversational Discourse Latent Topics on Microblogs
Jing Li | Yan Song | Zhongyu Wei | Kam-Fai Wong

Conventional topic models are ineffective for topic extraction from microblog messages, because the data sparseness exhibited in short messages lacking structure and contexts results in poor message-level word co-occurrence patterns. To address this issue, we organize microblog messages as conversation trees based on their reposting and replying relations, and propose an unsupervised model that jointly learns word distributions to represent : (1) different roles of conversational discourse, and (2) various latent topics in reflecting content information. By explicitly distinguishing the probabilities of messages with varying discourse roles in containing topical words, our model is able to discover clusters of discourse words that are indicative of topical content. In an automatic evaluation on large-scale microblog corpora, our joint model yields topics with better coherence scores than competitive topic models from previous studies. Qualitative analysis on model outputs indicates that our model induces meaningful representations for both discourse and topics. We further present an empirical study on microblog summarization based on the outputs of our joint model. The results show that the jointly modeled discourse and topic representations can effectively indicate summary-worthy content in microblog conversations.

pdf bib
Sarcasm Analysis Using Conversation Context
Debanjan Ghosh | Alexander R. Fabbri | Smaranda Muresan

Computational models for sarcasm detection have often relied on the content of utterances in isolation. However, the speaker’s sarcastic intent is not always apparent without additional context. Focusing on social media discussions, we investigate three issues : (1) does modeling conversation context help in sarcasm detection? (2) can we identify what part of conversation context triggered the sarcastic reply? and (3) given a sarcastic post that contains multiple sentences, can we identify the specific sentence that is sarcastic? To address the first issue, we investigate several types of Long Short-Term Memory (LSTM) networks that can model both the conversation context and the current turn. We show that LSTM networks with sentence-level attention on context and current turn, as well as the conditional LSTM network, outperform the LSTM model that reads only the current turn. As conversation context, we consider the prior turn, the succeeding turn, or both. Our computational models are tested on two types of social media platforms : Twitter and discussion forums. We discuss several differences between these data sets, ranging from their size to the nature of the gold-label annotations. To address the latter two issues, we present a qualitative analysis of the attention weights produced by the LSTM models (with attention) and discuss the results compared with human performance on the two tasks.

pdf bib
We Usually Do n’t Like Going to the Dentist : Using Common Sense to Detect Irony on TwitterTwitter
Cynthia Van Hee | Els Lefever | Véronique Hoste

Although common sense and connotative knowledge come naturally to most people, computers still struggle to perform well on tasks for which such extratextual information is required. Automatic approaches to sentiment analysis and irony detection have revealed that the lack of such world knowledge undermines classification performance. In this article, we therefore address the challenge of modeling implicit or prototypical sentiment in the framework of automatic irony detection. Starting from manually annotated connoted situation phrases (e.g., flight delays, sitting the whole day at the doctor’s office), we defined the implicit sentiment held towards such situations automatically by using both a lexico-semantic knowledge base and a data-driven method. We further investigate how such implicit sentiment information affects irony detection by assessing a state-of-the-art irony classifier before and after it is informed with implicit sentiment information.

pdf bib
Combining Deep Learning and Argumentative Reasoning for the Analysis of Social Media Textual Content Using Small Data Sets
Oana Cocarascu | Francesca Toni

The use of social media has become a regular habit for many and has changed the way people interact with each other. In this article, we focus on analyzing whether news headlines support tweets and whether reviews are deceptive by analyzing the interaction or the influence that these texts have on the others, thus exploiting contextual information. Concretely, we define a deep learning method for relationbased argument mining to extract argumentative relations of attack and support. We then use this method for determining whether news articles support tweets, a useful task in fact-checking settings, where determining agreement toward a statement is a useful step toward determining its truthfulness. Furthermore, we use our method for extracting bipolar argumentation frameworks from reviews to help detect whether they are deceptive. We show experimentally that our method performs well in both settings. In particular, in the case of deception detection, our method contributes a novel argumentative feature that, when used in combination with other features in standard supervised classifiers, outperforms the latter even on small data sets.