Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

Marilyn Walker, Heng Ji, Amanda Stent (Editors)

Anthology ID:
New Orleans, Louisiana
Association for Computational Linguistics
Bib Export formats:

pdf bib
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)
Marilyn Walker | Heng Ji | Amanda Stent

pdf bib
Gender Bias in Coreference Resolution : Evaluation and Debiasing Methods
Jieyu Zhao | Tianlu Wang | Mark Yatskar | Vicente Ordonez | Kai-Wei Chang

In this paper, we introduce a new benchmark for co-reference resolution focused on gender bias, WinoBias. Our corpus contains Winograd-schema style sentences with entities corresponding to people referred by their occupation (e.g. the nurse, the doctor, the carpenter). We demonstrate that a rule-based, a feature-rich, and a neural coreference system all link gendered pronouns to pro-stereotypical entities with higher accuracy than anti-stereotypical entities, by an average difference of 21.1 in F1 score. Finally, we demonstrate a data-augmentation approach that, in combination with existing word-embedding debiasing techniques, removes the bias demonstrated by these systems in WinoBias without significantly affecting their performance on existing datasets.

pdf bib
Is Something Better than Nothing? Automatically Predicting Stance-based Arguments Using Deep Learning and Small Labelled Dataset
Pavithra Rajendran | Danushka Bollegala | Simon Parsons

Online reviews have become a popular portal among customers making decisions about purchasing products. A number of corpora of reviews have been widely investigated in NLP in general, and, in particular, in argument mining. This is a subset of NLP that deals with extracting arguments and the relations among them from user-based content. A major problem faced by argument mining research is the lack of human-annotated data. In this paper, we investigate the use of weakly supervised and semi-supervised methods for automatically annotating data, and thus providing large annotated datasets. We do this by building on previous work that explores the classification of opinions present in reviews based whether the stance is expressed explicitly or implicitly. In the work described here, we automatically annotate stance as implicit or explicit and our results show that the datasets we generate, although noisy, can be used to learn better models for implicit / explicit opinion classification.

pdf bib
Multi-Task Learning for Argumentation Mining in Low-Resource Settings
Claudia Schulz | Steffen Eger | Johannes Daxenberger | Tobias Kahse | Iryna Gurevych

We investigate whether and where multi-task learning (MTL) can improve performance on NLP problems related to argumentation mining (AM), in particular argument component identification. Our results show that MTL performs particularly well (and better than single-task learning) when little training data is available for the main task, a common scenario in AM. Our findings challenge previous assumptions that conceptualizations across AM datasets are divergent and that MTL is difficult for semantic or higher-level tasks.

pdf bib
Neural Models for Reasoning over Multiple Mentions Using Coreference
Bhuwan Dhingra | Qiao Jin | Zhilin Yang | William Cohen | Ruslan Salakhutdinov

Many problems in NLP require aggregating information from multiple mentions of the same entity which may be far apart in the text. Existing Recurrent Neural Network (RNN) layers are biased towards short-term dependencies and hence not suited to such tasks. We present a recurrent layer which is instead biased towards coreferent dependencies. The layer uses coreference annotations extracted from an external system to connect entity mentions belonging to the same cluster. Incorporating this layer into a state-of-the-art reading comprehension model improves performance on three datasets Wikihop, LAMBADA and the bAbi AI tasks with large gains when training data is scarce.

pdf bib
Natural Language Generation by Hierarchical Decoding with Linguistic Patterns
Shang-Yu Su | Kai-Ling Lo | Yi-Ting Yeh | Yun-Nung Chen

Natural language generation (NLG) is a critical component in spoken dialogue systems. Classic NLG can be divided into two phases : (1) sentence planning : deciding on the overall sentence structure, (2) surface realization : determining specific word forms and flattening the sentence structure into a string. Many simple NLG models are based on recurrent neural networks (RNN) and sequence-to-sequence (seq2seq) model, which basically contains a encoder-decoder structure ; these NLG models generate sentences from scratch by jointly optimizing sentence planning and surface realization using a simple cross entropy loss training criterion. However, the simple encoder-decoder architecture usually suffers from generating complex and long sentences, because the decoder has to learn all grammar and diction knowledge. This paper introduces a hierarchical decoding NLG model based on linguistic patterns in different levels, and shows that the proposed method outperforms the traditional one with a smaller model size. Furthermore, the design of the hierarchical decoding is flexible and easily-extendible in various NLG systems.

pdf bib
RankME : Reliable Human Ratings for Natural Language GenerationRankME: Reliable Human Ratings for Natural Language Generation
Jekaterina Novikova | Ondřej Dušek | Verena Rieser

Human evaluation for natural language generation (NLG) often suffers from inconsistent user ratings. While previous research tends to attribute this problem to individual user preferences, we show that the quality of human judgements can also be improved by experimental design. We present a novel rank-based magnitude estimation method (RankME), which combines the use of continuous scales and relative assessments. We show that RankME significantly improves the reliability and consistency of human ratings compared to traditional evaluation methods. In addition, we show that it is possible to evaluate NLG systems according to multiple, distinct criteria, which is important for error analysis. Finally, we demonstrate that RankME, in combination with Bayesian estimation of system quality, is a cost-effective alternative for ranking multiple NLG systems.

pdf bib
Sentence Simplification with Memory-Augmented Neural Networks
Tu Vu | Baotian Hu | Tsendsuren Munkhdalai | Hong Yu

Sentence simplification aims to simplify the content and structure of complex sentences, and thus make them easier to interpret for human readers, and easier to process for downstream NLP applications. Recent advances in neural machine translation have paved the way for novel approaches to the task. In this paper, we adapt an architecture with augmented memory capacities called Neural Semantic Encoders (Munkhdalai and Yu, 2017) for sentence simplification. Our experiments demonstrate the effectiveness of our approach on different simplification datasets, both in terms of automatic evaluation measures and human judgments.

pdf bib
A Corpus of Non-Native Written English Annotated for MetaphorEnglish Annotated for Metaphor
Beata Beigman Klebanov | Chee Wee (Ben) Leong | Michael Flor

We present a corpus of 240 argumentative essays written by non-native speakers of English annotated for metaphor. The corpus is made publicly available. We provide benchmark performance of state-of-the-art systems on this new corpus, and explore the relationship between writing proficiency and metaphor use.

pdf bib
An Annotated Corpus for Machine Reading of Instructions in Wet Lab Protocols
Chaitanya Kulkarni | Wei Xu | Alan Ritter | Raghu Machiraju

We describe an effort to annotate a corpus of natural language instructions consisting of 622 wet lab protocols to facilitate automatic or semi-automatic conversion of protocols into a machine-readable format and benefit biological research. Experimental results demonstrate the utility of our corpus for developing machine learning approaches to shallow semantic parsing of instructional texts. We make our annotated Wet Lab Protocol Corpus available to the research community.

pdf bib
Annotation Artifacts in Natural Language Inference Data
Suchin Gururangan | Swabha Swayamdipta | Omer Levy | Roy Schwartz | Samuel Bowman | Noah A. Smith

Large-scale datasets for natural language inference are created by presenting crowd workers with a sentence (premise), and asking them to generate three new sentences (hypotheses) that it entails, contradicts, or is logically neutral with respect to. We show that, in a significant portion of such data, this protocol leaves clues that make it possible to identify the label by looking only at the hypothesis, without observing the premise. Specifically, we show that a simple text categorization model can correctly classify the hypothesis alone in about 67 % of SNLI (Bowman et. al, 2015) and 53 % of MultiNLI (Williams et. al, 2017). Our analysis reveals that specific linguistic phenomena such as negation and vagueness are highly correlated with certain inference classes. Our findings suggest that the success of natural language inference models to date has been overestimated, and that the task remains a hard open problem.

pdf bib
Humor Recognition Using Deep Learning
Peng-Yu Chen | Von-Wun Soo

Humor is an essential but most fascinating element in personal communication. How to build computational models to discover the structures of humor, recognize humor and even generate humor remains a challenge and there have been yet few attempts on it. In this paper, we construct and collect four datasets with distinct joke types in both English and Chinese and conduct learning experiments on humor recognition. We implement a Convolutional Neural Network (CNN) with extensive filter size, number and Highway Networks to increase the depth of networks. Results show that our model outperforms in recognition of different types of humor with benchmarks collected in both English and Chinese languages on accuracy, precision, and recall in comparison to previous works.

pdf bib
Reference-less Measure of Faithfulness for Grammatical Error Correction
Leshem Choshen | Omri Abend

We propose USim, a semantic measure for Grammatical Error Correction (that measures the semantic faithfulness of the output to the source, thereby complementing existing reference-less measures (RLMs) for measuring the output’s grammaticality. USim operates by comparing the semantic symbolic structure of the source and the correction, without relying on manually-curated references. Our experiments establish the validity of USim, by showing that the semantic structures can be consistently applied to ungrammatical text, that valid corrections obtain a high USim similarity score to the source, and that invalid corrections obtain a lower score.

pdf bib
Analogies in Complex Verb Meaning Shifts : the Effect of Affect in Semantic Similarity Models
Maximilian Köper | Sabine Schulte im Walde

We present a computational model to detect and distinguish analogies in meaning shifts between German base and complex verbs. In contrast to corpus-based studies, a novel dataset demonstrates that regular shifts represent the smallest class. Classification experiments relying on a standard similarity model successfully distinguish between four types of shifts, with verb classes boosting the performance, and affective features for abstractness, emotion and sentiment representing the most salient indicators.

pdf bib
Character-Based Neural Networks for Sentence Pair Modeling
Wuwei Lan | Wei Xu

Sentence pair modeling is critical for many NLP tasks, such as paraphrase identification, semantic textual similarity, and natural language inference. Most state-of-the-art neural models for these tasks rely on pretrained word embedding and compose sentence-level semantics in varied ways ; however, few works have attempted to verify whether we really need pretrained embeddings in these tasks. In this paper, we study how effective subword-level (character and character n-gram) representations are in sentence pair modeling. Though it is well-known that subword models are effective in tasks with single sentence input, including language modeling and machine translation, they have not been systematically studied in sentence pair modeling tasks where the semantic and string similarities between texts matter. Our experiments show that subword models without any pretrained word embedding can achieve new state-of-the-art results on two social media datasets and competitive results on news data for paraphrase identification.

pdf bib
Diachronic Usage Relatedness (DURel): A Framework for the Annotation of Lexical Semantic ChangeDURel): A Framework for the Annotation of Lexical Semantic Change
Dominik Schlechtweg | Sabine Schulte im Walde | Stefanie Eckmann

We propose a framework that extends synchronic polysemy annotation to diachronic changes in lexical meaning, to counteract the lack of resources for evaluating computational models of lexical semantic change. Our framework exploits an intuitive notion of semantic relatedness, and distinguishes between innovative and reductive meaning changes with high inter-annotator agreement. The resulting test set for German comprises ratings from five annotators for the relatedness of 1,320 use pairs across 22 target words.

pdf bib
Discriminating between Lexico-Semantic Relations with the Specialization Tensor Model
Goran Glavaš | Ivan Vulić

We present a simple and effective feed-forward neural architecture for discriminating between lexico-semantic relations (synonymy, antonymy, hypernymy, and meronymy). Our Specialization Tensor Model (STM) simultaneously produces multiple different specializations of input distributional word vectors, tailored for predicting lexico-semantic relations for word pairs. STM outperforms more complex state-of-the-art architectures on two benchmark datasets and exhibits stable performance across languages. We also show that, if coupled with a bilingual distributional space, the proposed model can transfer the prediction of lexico-semantic relations to a resource-lean target language without any training data.

pdf bib
Evaluating bilingual word embeddings on the long tail
Fabienne Braune | Viktor Hangya | Tobias Eder | Alexander Fraser

Bilingual word embeddings are useful for bilingual lexicon induction, the task of mining translations of given words. Many studies have shown that bilingual word embeddings perform well for bilingual lexicon induction but they focused on frequent words in general domains. For many applications, bilingual lexicon induction of rare and domain-specific words is of critical importance. Therefore, we design a new task to evaluate bilingual word embeddings on rare words in different domains. We show that state-of-the-art approaches fail on this task and present simple new techniques to improve bilingual word embeddings for mining rare words. We release new gold standard datasets and code to stimulate research on this task.

pdf bib
Frustratingly Easy Meta-Embedding Computing Meta-Embeddings by Averaging Source Word Embeddings
Joshua Coates | Danushka Bollegala

Creating accurate meta-embeddings from pre-trained source embeddings has received attention lately. Methods based on global and locally-linear transformation and concatenation have shown to produce accurate meta-embeddings. In this paper, we show that the arithmetic mean of two distinct word embedding sets yields a performant meta-embedding that is comparable or better than more complex meta-embedding learning methods. The result seems counter-intuitive given that vector spaces in different source embeddings are not comparable and can not be simply averaged. We give insight into why averaging can still produce accurate meta-embedding despite the incomparability of the source vector spaces.

pdf bib
Introducing Two Vietnamese Datasets for Evaluating Semantic Models of (Dis-)Similarity and RelatednessVietnamese Datasets for Evaluating Semantic Models of (Dis-)Similarity and Relatedness
Kim Anh Nguyen | Sabine Schulte im Walde | Ngoc Thang Vu

We present two novel datasets for the low-resource language Vietnamese to assess models of semantic similarity : ViCon comprises pairs of synonyms and antonyms across word classes, thus offering data to distinguish between similarity and dissimilarity. ViSim-400 provides degrees of similarity across five semantic relations, as rated by human judges. The two datasets are verified through standard co-occurrence and neural network models, showing results comparable to the respective English datasets.

pdf bib
Similarity Measures for the Detection of Clinical Conditions with Verbal Fluency Tasks
Felipe Paula | Rodrigo Wilkens | Marco Idiart | Aline Villavicencio

Semantic Verbal Fluency tests have been used in the detection of certain clinical conditions, like Dementia. In particular, given a sequence of semantically related words, a large number of switches from one semantic class to another has been linked to clinical conditions. In this work, we investigate three similarity measures for automatically identifying switches in semantic chains : semantic similarity from a manually constructed resource, and word association strength and semantic relatedness, both calculated from corpora. This information is used for building classifiers to distinguish healthy controls from clinical cases with early stages of Alzheimer’s Disease and Mild Cognitive Deficits. The overall results indicate that for clinical conditions the classifiers that use these similarity measures outperform those that use a gold standard taxonomy.

pdf bib
The Word Analogy Testing Caveat
Natalie Schluter

There are some important problems in the evaluation of word embeddings using standard word analogy tests. In particular, in virtue of the assumptions made by systems generating the embeddings, these remain tests over randomness. We show that even supposing there were such word analogy regularities that should be detected in the word embeddings obtained via unsupervised means, standard word analogy test implementation practices provide distorted or contrived results. We raise concerns regarding the use of Principal Component Analysis to 2 or 3 dimensions as a provision of visual evidence for the existence of word analogy relations in embeddings. Finally, we propose some solutions to these problems.

pdf bib
Transition-Based Chinese AMR ParsingChinese AMR Parsing
Chuan Wang | Bin Li | Nianwen Xue

This paper presents the first AMR parser built on the Chinese AMR bank. By applying a transition-based AMR parsing framework to Chinese, we first investigate how well the transitions first designed for English AMR parsing generalize to Chinese and provide a comparative analysis between the transitions for English and Chinese. We then perform a detailed error analysis to identify the major challenges in Chinese AMR parsing that we hope will inform future research in this area.

pdf bib
Knowledge-Enriched Two-Layered Attention Network for Sentiment Analysis
Abhishek Kumar | Daisuke Kawahara | Sadao Kurohashi

We propose a novel two-layered attention network based on Bidirectional Long Short-Term Memory for sentiment analysis. The novel two-layered attention network takes advantage of the external knowledge bases to improve the sentiment prediction. It uses the Knowledge Graph Embedding generated using the WordNet. We build our model by combining the two-layered attention network with the supervised model based on Support Vector Regression using a Multilayer Perceptron network for sentiment analysis. We evaluate our model on the benchmark dataset of SemEval 2017 Task 5. Experimental results show that the proposed model surpasses the top system of SemEval 2017 Task 5. The model performs significantly better by improving the state-of-the-art system at SemEval 2017 Task 5 by 1.7 and 3.7 points for sub-tracks 1 and 2 respectively.

pdf bib
Modeling Inter-Aspect Dependencies for Aspect-Based Sentiment Analysis
Devamanyu Hazarika | Soujanya Poria | Prateek Vij | Gangeshwar Krishnamurthy | Erik Cambria | Roger Zimmermann

Aspect-based Sentiment Analysis is a fine-grained task of sentiment classification for multiple aspects in a sentence. Present neural-based models exploit aspect and its contextual information in the sentence but largely ignore the inter-aspect dependencies. In this paper, we incorporate this pattern by simultaneous classification of all aspects in a sentence along with temporal dependency processing of their corresponding sentence representations using recurrent networks. Results on the benchmark SemEval 2014 dataset suggest the effectiveness of our proposed approach.

pdf bib
Recurrent Entity Networks with Delayed Memory Update for Targeted Aspect-Based Sentiment Analysis
Fei Liu | Trevor Cohn | Timothy Baldwin

While neural networks have been shown to achieve impressive results for sentence-level sentiment analysis, targeted aspect-based sentiment analysis (TABSA) extraction of fine-grained opinion polarity w.r.t. a pre-defined set of aspects remains a difficult task. Motivated by recent advances in memory-augmented models for machine reading, we propose a novel architecture, utilising external memory chains with a delayed memory update mechanism to track entities. On a TABSA task, the proposed model demonstrates substantial improvements over state-of-the-art approaches, including those using external knowledge bases.

pdf bib
Modeling Semantic Plausibility by Injecting World Knowledge
Su Wang | Greg Durrett | Katrin Erk

Distributional data tells us that a man can swallow candy, but not that a man can swallow a paintball, since this is never attested. However both are physically plausible events. This paper introduces the task of semantic plausibility : recognizing plausible but possibly novel events. We present a new crowdsourced dataset of semantic plausibility judgments of single events such as man swallow paintball. Simple models based on distributional representations perform poorly on this task, despite doing well on selection preference, but injecting manually elicited knowledge about entity properties provides a substantial performance boost. Our error analysis shows that our new dataset is a great testbed for semantic plausibility models : more sophisticated knowledge representation and propagation could address many of the remaining errors.

pdf bib
A Bi-Model Based RNN Semantic Frame Parsing Model for Intent Detection and Slot FillingRNN Semantic Frame Parsing Model for Intent Detection and Slot Filling
Yu Wang | Yilin Shen | Hongxia Jin

Intent detection and slot filling are two main tasks for building a spoken language understanding(SLU) system. Multiple deep learning based models have demonstrated good results on these tasks. The most effective algorithms are based on the structures of sequence to sequence models (or encoder-decoder models), and generate the intents and semantic tags either using separate models. Most of the previous studies, however, either treat the intent detection and slot filling as two separate parallel tasks, or use a sequence to sequence model to generate both semantic tags and intent. None of the approaches consider the cross-impact between the intent detection task and the slot filling task. In this paper, new Bi-model based RNN semantic frame parsing network structures are designed to perform the intent detection and slot filling tasks jointly, by considering their cross-impact to each other using two correlated bidirectional LSTMs (BLSTM). Our Bi-model structure with a decoder achieves state-of-art result on the benchmark ATIS data, with about 0.5 % intent accuracy improvement and 0.9 % slot filling improvement.

pdf bib
A Laypeople Study on Terminology Identification across Domains and Task Definitions
Anna Hätty | Sabine Schulte im Walde

This paper introduces a new dataset of term annotation. Given that even experts vary significantly in their understanding of termhood, and that term identification is mostly performed as a binary task, we offer a novel perspective to explore the common, natural understanding of what constitutes a term : Laypeople annotate single-word and multi-word terms, across four domains and across four task definitions. Analyses based on inter-annotator agreement offer insights into differences in term specificity, term granularity and subtermhood.

pdf bib
A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network
Dai Quoc Nguyen | Tu Dinh Nguyen | Dat Quoc Nguyen | Dinh Phung

In this paper, we propose a novel embedding model, named ConvKB, for knowledge base completion. Our model ConvKB advances state-of-the-art models by employing a convolutional neural network, so that it can capture global relationships and transitional characteristics between entities and relations in knowledge bases. In ConvKB, each triple (head entity, relation, tail entity) is represented as a 3-column matrix where each column vector represents a triple element. This 3-column matrix is then fed to a convolution layer where multiple filters are operated on the matrix to generate different feature maps. These feature maps are then concatenated into a single feature vector representing the input triple. The feature vector is multiplied with a weight vector via a dot product to return a score. This score is then used to predict whether the triple is valid or not. Experiments show that ConvKB achieves better link prediction performance than previous state-of-the-art embedding models on two benchmark datasets WN18RR and FB15k-237.

pdf bib
Cross-language Article Linking Using Cross-Encyclopedia Entity Embedding
Chun-Kai Wu | Richard Tzong-Han Tsai

Cross-language article linking (CLAL) is the task of finding corresponding article pairs of different languages across encyclopedias. This task is a difficult disambiguation problem in which one article must be selected among several candidate articles with similar titles and contents. Existing works focus on engineering text-based or link-based features for this task, which is a time-consuming job, and some of these features are only applicable within the same encyclopedia. In this paper, we address these problems by proposing cross-encyclopedia entity embedding. Unlike other works, our proposed method does not rely on known cross-language pairs. We apply our method to CLAL between English Wikipedia and Chinese Baidu Baike. Our features improve performance relative to the baseline by 29.62 %. Tested 30 times, our system achieved an average improvement of 2.76 % over the current best system (26.86 % over baseline), a statistically significant result.

pdf bib
Identifying the Most Dominant Event in a News Article by Mining Event Coreference Relations
Prafulla Kumar Choubey | Kaushik Raju | Ruihong Huang

Identifying the most dominant and central event of a document, which governs and connects other foreground and background events in the document, is useful for many applications, such as text summarization, storyline generation and text segmentation. We observed that the central event of a document usually has many coreferential event mentions that are scattered throughout the document for enabling a smooth transition of subtopics. Our empirical experiments, using gold event coreference relations, have shown that the central event of a document can be well identified by mining properties of event coreference chains. But the performance drops when switching to system predicted event coreference relations. In addition, we found that the central event can be more accurately identified by further considering the number of sub-events as well as the realis status of an event.

pdf bib
Keep Your Bearings : Lightly-Supervised Information Extraction with Ladder Networks That Avoids Semantic Drift
Ajay Nagesh | Mihai Surdeanu

We propose a novel approach to semi-supervised learning for information extraction that uses ladder networks (Rasmus et al., 2015). In particular, we focus on the task of named entity classification, defined as identifying the correct label (e.g., person or organization name) of an entity mention in a given context. Our approach is simple, efficient and has the benefit of being robust to semantic drift, a dominant problem in most semi-supervised learning systems. We empirically demonstrate the superior performance of our system compared to the state-of-the-art on two standard datasets for named entity classification. We obtain between 62 % and 200 % improvement over the state-of-art baseline on these two datasets.

pdf bib
Semi-Supervised Event Extraction with Paraphrase Clusters
James Ferguson | Colin Lockard | Daniel Weld | Hannaneh Hajishirzi

Supervised event extraction systems are limited in their accuracy due to the lack of available training data. We present a method for self-training event extraction systems by bootstrapping additional training data. This is done by taking advantage of the occurrence of multiple mentions of the same event instances across newswire articles from multiple sources. If our system can make a high-confidence extraction of some mentions in such a cluster, it can then acquire diverse training examples by adding the other mentions as well. Our experiments show significant performance improvements on multiple event extractors over ACE 2005 and TAC-KBP 2015 datasets.

pdf bib
Structure Regularized Neural Network for Entity Relation Classification for Chinese Literature TextChinese Literature Text
Ji Wen | Xu Sun | Xuancheng Ren | Qi Su

Relation classification is an important semantic processing task in the field of natural language processing. In this paper, we propose the task of relation classification for Chinese literature text. A new dataset of Chinese literature text is constructed to facilitate the study in this task. We present a novel model, named Structure Regularized Bidirectional Recurrent Convolutional Neural Network (SR-BRCNN), to identify the relation between entities. The proposed model learns relation representations along the shortest dependency path (SDP) extracted from the structure regularized dependency tree, which has the benefits of reducing the complexity of the whole model. Experimental results show that the proposed method significantly improves the F1 score by 10.3, and outperforms the state-of-the-art approaches on Chinese literature text.

pdf bib
Syntactically Aware Neural Architectures for Definition Extraction
Luis Espinosa-Anke | Steven Schockaert

Automatically identifying definitional knowledge in text corpora (Definition Extraction or DE) is an important task with direct applications in, among others, Automatic Glossary Generation, Taxonomy Learning, Question Answering and Semantic Search. It is generally cast as a binary classification problem between definitional and non-definitional sentences. In this paper we present a set of neural architectures combining Convolutional and Recurrent Neural Networks, which are further enriched by incorporating linguistic information via syntactic dependencies. Our experimental results in the task of sentence classification, on two benchmarking DE datasets (one generic, one domain-specific), show that these models obtain consistent state of the art results. Furthermore, we demonstrate that models trained on clean Wikipedia-like definitions can successfully be applied to more noisy domain-specific corpora.

pdf bib
Automatically Selecting the Best Dependency Annotation Design with Dynamic Oracles
Guillaume Wisniewski | Ophélie Lacroix | François Yvon

This work introduces a new strategy to compare the numerous conventions that have been proposed over the years for expressing dependency structures and discover the one for which a parser will achieve the highest parsing performance. Instead of associating each sentence in the training set with a single gold reference we propose to consider a set of references encoding alternative syntactic representations. Training a parser with a dynamic oracle will then automatically select among all alternatives the reference that will be predicted with the highest accuracy. Experiments on the UD corpora show the validity of this approach.

pdf bib
Consistent CCG Parsing over Multiple Sentences for Improved Logical ReasoningCCG Parsing over Multiple Sentences for Improved Logical Reasoning
Masashi Yoshikawa | Koji Mineshima | Hiroshi Noji | Daisuke Bekki

In formal logic-based approaches to Recognizing Textual Entailment (RTE), a Combinatory Categorial Grammar (CCG) parser is used to parse input premises and hypotheses to obtain their logical formulas. Here, it is important that the parser processes the sentences consistently ; failing to recognize the similar syntactic structure results in inconsistent predicate argument structures among them, in which case the succeeding theorem proving is doomed to failure. In this work, we present a simple method to extend an existing CCG parser to parse a set of sentences consistently, which is achieved with an inter-sentence modeling with Markov Random Fields (MRF). When combined with existing logic-based systems, our method always shows improvement in the RTE experiments on English and Japanese languages.

pdf bib
Exploiting Dynamic Oracles to Train Projective Dependency Parsers on Non-Projective Trees
Lauriane Aufrant | Guillaume Wisniewski | François Yvon

Because the most common transition systems are projective, training a transition-based dependency parser often implies to either ignore or rewrite the non-projective training examples, which has an adverse impact on accuracy. In this work, we propose a simple modification of dynamic oracles, which enables the use of non-projective data when training projective parsers. Evaluation on 73 treebanks shows that our method achieves significant gains (+2 to +7 UAS for the most non-projective languages) and consistently outperforms traditional projectivization and pseudo-projectivization approaches.

pdf bib
Improving Coverage and Runtime Complexity for Exact Inference in Non-Projective Transition-Based Dependency Parsers
Tianze Shi | Carlos Gómez-Rodríguez | Lillian Lee

We generalize Cohen, Gmez-Rodrguez, and Satta’s (2011) parser to a family of non-projective transition-based dependency parsers allowing polynomial-time exact inference. This includes novel parsers with better coverage than Cohen et al. (2011), and even a variant that reduces time complexity to O(n^6), improving over the known bounds in exact inference for non-projective transition-based parsing. We hope that this piece of theoretical work inspires design of novel transition systems with better coverage and better run-time guarantees.O(n^6), improving over the known bounds in exact inference for non-projective transition-based parsing. We hope that this piece of theoretical work inspires design of novel transition systems with better coverage and better run-time guarantees.

pdf bib
Towards a Variability Measure for Multiword Expressions
Caroline Pasquer | Agata Savary | Jean-Yves Antoine | Carlos Ramisch

One of the most outstanding properties of multiword expressions (MWEs), especially verbal ones (VMWEs), important both in theoretical models and applications, is their idiosyncratic variability. Some MWEs are always continuous, while some others admit certain types of insertions. Components of some MWEs are rarely or never modified, while some others admit either specific or unrestricted modification. This unpredictable variability profile of MWEs hinders modeling and processing them as words-with-spaces on the one hand, and as regular syntactic structures on the other hand. Since variability of MWEs is a matter of scale rather than a binary property, we propose a 2-dimensional language-independent measure of variability dedicated to verbal MWEs based on syntactic and discontinuity-related clues. We assess its relevance with respect to a linguistic benchmark and its utility for the tasks of VMWE classification and variant identification on a French corpus.

pdf bib
Contextual Augmentation : Data Augmentation by Words with Paradigmatic Relations
Sosuke Kobayashi

We propose a novel data augmentation for labeled sentences called contextual augmentation. We assume an invariance that sentences are natural even if the words in the sentences are replaced with other words with paradigmatic relations. We stochastically replace words with other words that are predicted by a bi-directional language model at the word positions. Words predicted according to a context are numerous but appropriate for the augmentation of the original words. Furthermore, we retrofit a language model with a label-conditional architecture, which allows the model to augment sentences without breaking the label-compatibility. Through the experiments for six various different text classification tasks, we demonstrate that the proposed method improves classifiers based on the convolutional or recurrent neural networks.

pdf bib
Self-Attention with Relative Position Representations
Peter Shaw | Jakob Uszkoreit | Ashish Vaswani

Relying entirely on an attention mechanism, the Transformer introduced by Vaswani et al. (2017) achieves state-of-the-art results for machine translation. In contrast to recurrent and convolutional neural networks, it does not explicitly model relative or absolute position information in its structure. Instead, it requires adding representations of absolute positions to its inputs. In this work we present an alternative approach, extending the self-attention mechanism to efficiently consider representations of the relative positions, or distances between sequence elements. On the WMT 2014 English-to-German and English-to-French translation tasks, this approach yields improvements of 1.3 BLEU and 0.3 BLEU over absolute position representations, respectively. Notably, we observe that combining relative and absolute position representations yields no further improvement in translation quality. We describe an efficient implementation of our method and cast it as an instance of relation-aware self-attention mechanisms that can generalize to arbitrary graph-labeled inputs.

pdf bib
Automated Paraphrase Lattice Creation for HyTER Machine Translation EvaluationHyTER Machine Translation Evaluation
Marianna Apidianaki | Guillaume Wisniewski | Anne Cocos | Chris Callison-Burch

We propose a variant of a well-known machine translation (MT) evaluation metric, HyTER (Dreyer and Marcu, 2012), which exploits reference translations enriched with meaning equivalent expressions. The original HyTER metric relied on hand-crafted paraphrase networks which restricted its applicability to new data. We test, for the first time, HyTER with automatically built paraphrase lattices. We show that although the metric obtains good results on small and carefully curated data with both manually and automatically selected substitutes, it achieves medium performance on much larger and noisier datasets, demonstrating the limits of the metric for tuning and evaluation of current MT systems.

pdf bib
Exploiting Semantics in Neural Machine Translation with Graph Convolutional Networks
Diego Marcheggiani | Jasmijn Bastings | Ivan Titov

Semantic representations have long been argued as potentially useful for enforcing meaning preservation and improving generalization performance of machine translation methods. In this work, we are the first to incorporate information about predicate-argument structure of source sentences (namely, semantic-role representations) into neural machine translation. We use Graph Convolutional Networks (GCNs) to inject a semantic bias into sentence encoders and achieve improvements in BLEU scores over the linguistic-agnostic and syntax-aware versions on the EnglishGerman language pair.

pdf bib
Incremental Decoding and Training Methods for Simultaneous Translation in Neural Machine Translation
Fahim Dalvi | Nadir Durrani | Hassan Sajjad | Stephan Vogel

We address the problem of simultaneous translation by modifying the Neural MT decoder to operate with dynamically built encoder and attention. We propose a tunable agent which decides the best segmentation strategy for a user-defined BLEU loss and Average Proportion (AP) constraint. Our agent outperforms previously proposed Wait-if-diff and Wait-if-worse agents (Cho and Esipova, 2016) on BLEU with a lower latency. Secondly we proposed data-driven changes to Neural MT training to better match the incremental decoding framework.

pdf bib
Learning Hidden Unit Contribution for Adapting Neural Machine Translation Models
David Vilar

In this paper we explore the use of Learning Hidden Unit Contribution for the task of neural machine translation. The method was initially proposed in the context of speech recognition for adapting a general system to the specific acoustic characteristics of each speaker. Similar in spirit, in a machine translation framework we want to adapt a general system to a specific domain. We show that the proposed method achieves improvements of up to 2.6 BLEU points over a general system, and up to 6 BLEU points if the initial system has only been trained on out-of-domain data, a situation which may easily happen in practice. The good performance together with its short training time and small memory footprint make it a very attractive solution for domain adaptation.

pdf bib
Neural Machine Translation Decoding with Terminology Constraints
Eva Hasler | Adrià de Gispert | Gonzalo Iglesias | Bill Byrne

Despite the impressive quality improvements yielded by neural machine translation (NMT) systems, controlling their translation output to adhere to user-provided terminology constraints remains an open problem. We describe our approach to constrained neural decoding based on finite-state machines and multi-stack decoding which supports target-side constraints as well as constraints with corresponding aligned input text spans. We demonstrate the performance of our framework on multiple translation tasks and motivate the need for constrained decoding with attentions as a means of reducing misplacement and duplication when translating user constraints.

pdf bib
On the Evaluation of Semantic Phenomena in Neural Machine Translation Using Natural Language Inference
Adam Poliak | Yonatan Belinkov | James Glass | Benjamin Van Durme

We propose a process for investigating the extent to which sentence representations arising from neural machine translation (NMT) systems encode distinct semantic phenomena. We use these representations as features to train a natural language inference (NLI) classifier based on datasets recast from existing semantic annotations. In applying this process to a representative NMT system, we find its encoder appears most suited to supporting inferences at the syntax-semantics interface, as compared to anaphora resolution requiring world knowledge. We conclude with a discussion on the merits and potential deficiencies of the existing process, and how it may be improved and extended as a broader framework for evaluating semantic coverage

pdf bib
Using Word Vectors to Improve Word Alignments for Low Resource Machine Translation
Nima Pourdamghani | Marjan Ghazvininejad | Kevin Knight

We present a method for improving word alignments using word similarities. This method is based on encouraging common alignment links between semantically similar words. We use word vectors trained on monolingual data to estimate similarity. Our experiments on translating fifteen languages into English show consistent BLEU score improvements across the languages.

pdf bib
When and Why Are Pre-Trained Word Embeddings Useful for Neural Machine Translation?
Ye Qi | Devendra Sachan | Matthieu Felix | Sarguna Padmanabhan | Graham Neubig

The performance of Neural Machine Translation (NMT) systems often suffers in low-resource scenarios where sufficiently large-scale parallel corpora can not be obtained. Pre-trained word embeddings have proven to be invaluable for improving performance in natural language analysis tasks, which often suffer from paucity of data. However, their utility for NMT has not been extensively explored. In this work, we perform five sets of experiments that analyze when we can expect pre-trained word embeddings to help in NMT tasks. We show that such embeddings can be surprisingly effective in some cases providing gains of up to 20 BLEU points in the most favorable setting.

pdf bib
The Computational Complexity of Distinctive Feature Minimization in Phonology
Hubie Chen | Mans Hulden

We analyze the complexity of the problem of determining whether a set of phonemes forms a natural class and, if so, that of finding the minimal feature specification for the class. A standard assumption in phonology is that finding a minimal feature specification is an automatic part of acquisition and generalization. We find that the natural class decision problem is tractable (i.e. is in P), while the minimization problem is not ; the decision version of the problem which determines whether a natural class can be defined with k features or less is NP-complete. We also show that, empirically, a greedy algorithm for finding minimal feature specifications will sometimes fail, and thus can not be assumed to be the basis for human performance in solving the problem.k features or less is NP-complete. We also show that, empirically, a greedy algorithm for finding minimal feature specifications will sometimes fail, and thus cannot be assumed to be the basis for human performance in solving the problem.

pdf bib
Crowdsourcing Question-Answer Meaning Representations
Julian Michael | Gabriel Stanovsky | Luheng He | Ido Dagan | Luke Zettlemoyer

We introduce Question-Answer Meaning Representations (QAMRs), which represent the predicate-argument structure of a sentence as a set of question-answer pairs. We develop a crowdsourcing scheme to show that QAMRs can be labeled with very little training, and gather a dataset with over 5,000 sentences and 100,000 questions. A qualitative analysis demonstrates that the crowd-generated question-answer pairs cover the vast majority of predicate-argument relationships in existing datasets (including PropBank, NomBank, and QA-SRL) along with many previously under-resourced ones, including implicit arguments and relations. We also report baseline models for question generation and answering, and summarize a recent approach for using QAMR labels to improve an Open IE system. These results suggest the freely available QAMR data and annotation scheme should support significant future work.

pdf bib
Robust Machine Comprehension Models via Adversarial Training
Yicheng Wang | Mohit Bansal

It is shown that many published models for the Stanford Question Answering Dataset (Rajpurkar et al., 2016) lack robustness, suffering an over 50 % decrease in F1 score during adversarial evaluation based on the AddSent (Jia and Liang, 2017) algorithm. It has also been shown that retraining models on data generated by AddSent has limited effect on their robustness. We propose a novel alternative adversary-generation algorithm, AddSentDiverse, that significantly increases the variance within the adversarial training data by providing effective examples that punish the model for making certain superficial assumptions. Further, in order to improve robustness to AddSent’s semantic perturbations (e.g., antonyms), we jointly improve the model’s semantic-relationship learning capabilities in addition to our AddSentDiverse-based adversarial training data augmentation. With these additions, we show that we can make a state-of-the-art model significantly more robust, achieving a 36.5 % increase in F1 score under many different types of adversarial evaluation while maintaining performance on the regular SQuAD task.

pdf bib
Simple and Effective Semi-Supervised Question Answering
Bhuwan Dhingra | Danish Danish | Dheeraj Rajagopal

Recent success of deep learning models for the task of extractive Question Answering (QA) is hinged on the availability of large annotated corpora. However, large domain specific annotated corpora are limited and expensive to construct. In this work, we envision a system where the end user specifies a set of base documents and only a few labelled examples. Our system exploits the document structure to create cloze-style questions from these base documents ; pre-trains a powerful neural network on the cloze style questions ; and further fine-tunes the model on the labeled examples. We evaluate our proposed system across three diverse datasets from different domains, and find it to be highly effective with very little labeled data. We attain more than 50 % F1 score on SQuAD and TriviaQA with less than a thousand labelled examples. We are also releasing a set of 3.2 M cloze-style questions for practitioners to use while building QA systems.

pdf bib
Community Member Retrieval on Social Media Using Textual Information
Aaron Jaech | Shobhit Hathi | Mari Ostendorf

This paper addresses the problem of community membership detection using only text features in a scenario where a small number of positive labeled examples defines the community. The solution introduces an unsupervised proxy task for learning user embeddings : user re-identification. Experiments with 16 different communities show that the resulting embeddings are more effective for community membership identification than common unsupervised representations.

pdf bib
Predicting Foreign Language Usage from English-Only Social Media PostsEnglish-Only Social Media Posts
Svitlana Volkova | Stephen Ranshous | Lawrence Phillips

Social media is known for its multi-cultural and multilingual interactions, a natural product of which is code-mixing. Multilingual speakers mix languages they tweet to address a different audience, express certain feelings, or attract attention. This paper presents a large-scale analysis of 6 million tweets produced by 27 thousand multilingual users speaking 12 other languages besides English. We rely on this corpus to build predictive models to infer non-English languages that users speak exclusively from their English tweets. Unlike native language identification task, we rely on large amounts of informal social media communications rather than ESL essays. We contrast the predictive power of the state-of-the-art machine learning models trained on lexical, syntactic, and stylistic signals with neural network models learned from word, character and byte representations extracted from English only tweets. We report that content, style and syntax are the most predictive of non-English languages that users speak on Twitter. Neural network models learned from byte representations of user content combined with transfer learning yield the best performance. Finally, by analyzing cross-lingual transfer the influence of non-English languages on various levels of linguistic performance in English, we present novel findings on stylistic and syntactic variations across speakers of 12 languages in social media.

pdf bib
A Mixed Hierarchical Attention Based Encoder-Decoder Approach for Standard Table Summarization
Parag Jain | Anirban Laha | Karthik Sankaranarayanan | Preksha Nema | Mitesh M. Khapra | Shreyas Shetty

Structured data summarization involves generation of natural language summaries from structured input data. In this work, we consider summarizing structured data occurring in the form of tables as they are prevalent across a wide variety of domains. We formulate the standard table summarization problem, which deals with tables conforming to a single predefined schema. To this end, we propose a mixed hierarchical attention based encoder-decoder model which is able to leverage the structure in addition to the content of the tables. Our experiments on the publicly available weathergov dataset show around 18 BLEU (around 30 %) improvement over the current state-of-the-art.

pdf bib
Effective Crowdsourcing for a New Type of Summarization Task
Youxuan Jiang | Catherine Finegan-Dollak | Jonathan K. Kummerfeld | Walter Lasecki

Most summarization research focuses on summarizing the entire given text, but in practice readers are often interested in only one aspect of the document or conversation. We propose targeted summarization as an umbrella category for summarization tasks that intentionally consider only parts of the input data. This covers query-based summarization, update summarization, and a new task we propose where the goal is to summarize a particular aspect of a document. However, collecting data for this new task is hard because directly asking annotators (e.g., crowd workers) to write summaries leads to data with low accuracy when there are a large number of facts to include. We introduce a novel crowdsourcing workflow, Pin-Refine, that allows us to collect high-quality summaries for our task, a necessary step for the development of automatic systems.

pdf bib
Key2Vec : Automatic Ranked Keyphrase Extraction from Scientific Articles using Phrase EmbeddingsKey2Vec: Automatic Ranked Keyphrase Extraction from Scientific Articles using Phrase Embeddings
Debanjan Mahata | John Kuriakose | Rajiv Ratn Shah | Roger Zimmermann

Keyphrase extraction is a fundamental task in natural language processing that facilitates mapping of documents to a set of representative phrases. In this paper, we present an unsupervised technique (Key2Vec) that leverages phrase embeddings for ranking keyphrases extracted from scientific articles. Specifically, we propose an effective way of processing text documents for training multi-word phrase embeddings that are used for thematic representation of scientific articles and ranking of keyphrases extracted from them using theme-weighted PageRank. Evaluations are performed on benchmark datasets producing state-of-the-art results.

pdf bib
Learning to Generate Wikipedia Summaries for Underserved Languages from WikidataWikipedia Summaries for Underserved Languages from Wikidata
Lucie-Aimée Kaffee | Hady Elsahar | Pavlos Vougiouklis | Christophe Gravier | Frédérique Laforest | Jonathon Hare | Elena Simperl

While Wikipedia exists in 287 languages, its content is unevenly distributed among them. In this work, we investigate the generation of open domain Wikipedia summaries in underserved languages using structured data from Wikidata. To this end, we propose a neural network architecture equipped with copy actions that learns to generate single-sentence and comprehensible textual summaries from Wikidata triples. We demonstrate the effectiveness of the proposed approach by evaluating it against a set of baselines on two languages of different natures : Arabic, a morphological rich language with a larger vocabulary than English, and Esperanto, a constructed language known for its easy acquisition.

pdf bib
Multi-Reward Reinforced Summarization with Saliency and Entailment
Ramakanth Pasunuru | Mohit Bansal

Abstractive text summarization is the task of compressing and rewriting a long document into a short summary while maintaining saliency, directed logical entailment, and non-redundancy. In this work, we address these three important aspects of a good summary via a reinforcement learning approach with two novel reward functions : ROUGESal and Entail, on top of a coverage-based baseline. The ROUGESal reward modifies the ROUGE metric by up-weighting the salient phrases / words detected via a keyphrase classifier. The Entail reward gives high (length-normalized) scores to logically-entailed summaries using an entailment classifier. Further, we show superior performance improvement when these rewards are combined with traditional metric (ROUGE) based rewards, via our novel and effective multi-reward approach of optimizing multiple rewards simultaneously in alternate mini-batches. Our method achieves the new state-of-the-art results on CNN / Daily Mail dataset as well as strong improvements in a test-only transfer setup on DUC-2002.

pdf bib
Objective Function Learning to Match Human Judgements for Optimization-Based Summarization
Maxime Peyrard | Iryna Gurevych

Supervised summarization systems usually rely on supervision at the sentence or n-gram level provided by automatic metrics like ROUGE, which act as noisy proxies for human judgments. In this work, we learn a summary-level scoring function including human judgments as supervision and automatically generated data as regularization. We extract summaries with a genetic algorithm using as a fitness function. We observe strong and promising performances across datasets in both automatic and manual evaluation.\\theta including human judgments as supervision and automatically generated data as regularization. We extract summaries with a genetic algorithm using \\theta as a fitness function. We observe strong and promising performances across datasets in both automatic and manual evaluation.

pdf bib
Unsupervised Keyphrase Extraction with Multipartite Graphs
Florian Boudin

We propose an unsupervised keyphrase extraction model that encodes topical information within a multipartite graph structure. Our model represents keyphrase candidates and topics in a single graph and exploits their mutually reinforcing relationship to improve candidate ranking. We further introduce a novel mechanism to incorporate keyphrase selection preferences into the model. Experiments conducted on three widely used datasets show significant improvements over state-of-the-art graph-based models.

pdf bib
Where Have I Heard This Story Before? Identifying Narrative Similarity in Movie RemakesI Heard This Story Before? Identifying Narrative Similarity in Movie Remakes
Snigdha Chaturvedi | Shashank Srivastava | Dan Roth

People can identify correspondences between narratives in everyday life. For example, an analogy with the Cinderella story may be made in describing the unexpected success of an underdog in seemingly different stories. We present a new task and dataset for story understanding : identifying instances of similar narratives from a collection of narrative texts. We present an initial approach for this problem, which finds correspondences between narratives in terms of plot events, and resemblances between characters and their social relationships. Our approach yields an 8 % absolute improvement in performance over a competitive information-retrieval baseline on a novel dataset of plot summaries of 577 movie remakes from Wikipedia.

pdf bib
Multimodal Emoji Prediction
Francesco Barbieri | Miguel Ballesteros | Francesco Ronzano | Horacio Saggion

Emojis are small images that are commonly included in social media text messages. The combination of visual and textual content in the same message builds up a modern way of communication, that automatic systems are not used to deal with. In this paper we extend recent advances in emoji prediction by putting forward a multimodal approach that is able to predict emojis in Instagram posts. Instagram posts are composed of pictures together with texts which sometimes include emojis. We show that these emojis can be predicted by using the text, but also using the picture. Our main finding is that incorporating the two synergistic modalities, in a combined model, improves accuracy in an emoji prediction task. This result demonstrates that these two modalities (text and images) encode different information on the use of emojis and therefore can complement each other.

pdf bib
Higher-Order Coreference Resolution with Coarse-to-Fine Inference
Kenton Lee | Luheng He | Luke Zettlemoyer

We introduce a fully-differentiable approximation to higher-order inference for coreference resolution. Our approach uses the antecedent distribution from a span-ranking architecture as an attention mechanism to iteratively refine span representations. This enables the model to softly consider multiple hops in the predicted clusters. To alleviate the computational cost of this iterative process, we introduce a coarse-to-fine approach that incorporates a less accurate but more efficient bilinear factor, enabling more aggressive pruning without hurting accuracy. Compared to the existing state-of-the-art span-ranking approach, our model significantly improves accuracy on the English OntoNotes benchmark, while being far more computationally efficient.

pdf bib
Non-Projective Dependency Parsing with Non-Local Transitions
Daniel Fernández-González | Carlos Gómez-Rodríguez

We present a novel transition system, based on the Covington non-projective parser, introducing non-local transitions that can directly create arcs involving nodes to the left of the current focus positions. This avoids the need for long sequences of No-Arcs transitions to create long-distance arcs, thus alleviating error propagation. The resulting parser outperforms the original version and achieves the best accuracy on the Stanford Dependencies conversion of the Penn Treebank among greedy transition-based parsers.

pdf bib
Detecting Linguistic Characteristics of Alzheimer’s Dementia by Interpreting Neural ModelsAlzheimer’s Dementia by Interpreting Neural Models
Sweta Karlekar | Tong Niu | Mohit Bansal

Alzheimer’s disease (AD) is an irreversible and progressive brain disease that can be stopped or slowed down with medical treatment. Language changes serve as a sign that a patient’s cognitive functions have been impacted, potentially leading to early diagnosis. In this work, we use NLP techniques to classify and analyze the linguistic characteristics of AD patients using the DementiaBank dataset. We apply three neural models based on CNNs, LSTM-RNNs, and their combination, to distinguish between language samples from AD and control patients. We achieve a new independent benchmark accuracy for the AD classification task. More importantly, we next interpret what these neural models have learned about the linguistic characteristics of AD patients, via analysis based on activation clustering and first-derivative saliency techniques. We then perform novel automatic pattern discovery inside activation clusters, and consolidate AD patients’ distinctive grammar patterns. Additionally, we show that first derivative saliency can not only rediscover previous language patterns of AD patients, but also shed light on the limitations of neural models. Lastly, we also include analysis of gender-separated AD data.

pdf bib
Feudal Reinforcement Learning for Dialogue Management in Large Domains
Iñigo Casanueva | Paweł Budzianowski | Pei-Hao Su | Stefan Ultes | Lina M. Rojas-Barahona | Bo-Hsiang Tseng | Milica Gašić

Reinforcement learning (RL) is a promising approach to solve dialogue policy optimisation. Traditional RL algorithms, however, fail to scale to large domains due to the curse of dimensionality. We propose a novel Dialogue Management architecture, based on Feudal RL, which decomposes the decision into two steps ; a first step where a master policy selects a subset of primitive actions, and a second step where a primitive action is chosen from the selected subset. The structural information included in the domain ontology is used to abstract the dialogue state space, taking the decisions at each step using different parts of the abstracted state. This, combined with an information sharing mechanism between slots, increases the scalability to large domains. We show that an implementation of this approach, based on Deep-Q Networks, significantly outperforms previous state of the art in several dialogue domains and environments, without the need of any additional reward signal.

pdf bib
Evaluating Historical Text Normalization Systems : How Well Do They Generalize?
Alexander Robertson | Sharon Goldwater

We highlight several issues in the evaluation of historical text normalization systems that make it hard to tell how well these systems would actually work in practicei.e., for new datasets or languages ; in comparison to more nave systems ; or as a preprocessing step for downstream NLP tools. We illustrate these issues and exemplify our proposed evaluation practices by comparing two neural models against a nave baseline system. We show that the neural models generalize well to unseen words in tests on five languages ; nevertheless, they provide no clear benefit over the nave baseline for downstream POS tagging of an English historical collection. We conclude that future work should include more rigorous evaluation, including both intrinsic and extrinsic measures where possible.

pdf bib
Natural Language to Structured Query Generation via Meta-Learning
Po-Sen Huang | Chenglong Wang | Rishabh Singh | Wen-tau Yih | Xiaodong He

In conventional supervised training, a model is trained to fit all the training examples. However, having a monolithic model may not always be the best strategy, as examples could vary widely. In this work, we explore a different learning protocol that treats each example as a unique pseudo-task, by reducing the original learning problem to a few-shot meta-learning scenario with the help of a domain-dependent relevance function. When evaluated on the WikiSQL dataset, our approach leads to faster convergence and achieves 1.1%5.4 % absolute accuracy gains over the non-meta-learning counterparts.

pdf bib
Slot-Gated Modeling for Joint Slot Filling and Intent Prediction
Chih-Wen Goo | Guang Gao | Yun-Kai Hsu | Chih-Li Huo | Tsung-Chieh Chen | Keng-Wei Hsu | Yun-Nung Chen

Attention-based recurrent neural network models for joint intent detection and slot filling have achieved the state-of-the-art performance, while they have independent attention weights. Considering that slot and intent have the strong relationship, this paper proposes a slot gate that focuses on learning the relationship between intent and slot attention vectors in order to obtain better semantic frame results by the global optimization. The experiments show that our proposed model significantly improves sentence-level semantic frame accuracy with 4.2 % and 1.9 % relative improvement compared to the attentional model on benchmark ATIS and Snips datasets respectively

pdf bib
An Evaluation of Image-Based Verb Prediction Models against Human Eye-Tracking Data
Spandana Gella | Frank Keller

Recent research in language and vision has developed models for predicting and disambiguating verbs from images. Here, we ask whether the predictions made by such models correspond to human intuitions about visual verbs. We show that the image regions a verb prediction model identifies as salient for a given verb correlate with the regions fixated by human observers performing a verb classification task.

pdf bib
Learning to Color from Language
Varun Manjunatha | Mohit Iyyer | Jordan Boyd-Graber | Larry Davis

Automatic colorization is the process of adding color to greyscale images. We condition this process on language, allowing end users to manipulate a colorized image by feeding in different captions. We present two different architectures for language-conditioned colorization, both of which produce more accurate and plausible colorizations than a language-agnostic version. Furthermore, we demonstrate through crowdsourced experiments that we can dramatically alter colorizations simply by manipulating descriptive color words in captions.

pdf bib
Watch, Listen, and Describe : Globally and Locally Aligned Cross-Modal Attentions for Video Captioning
Xin Wang | Yuan-Fang Wang | William Yang Wang

A major challenge for video captioning is to combine audio and visual cues. Existing multi-modal fusion methods have shown encouraging results in video understanding. However, the temporal structures of multiple modalities at different granularities are rarely explored, and how to selectively fuse the multi-modal representations at different levels of details remains uncharted. In this paper, we propose a novel hierarchically aligned cross-modal attention (HACA) framework to learn and selectively fuse both global and local temporal dynamics of different modalities. Furthermore, for the first time, we validate the superior performance of the deep audio features on the video captioning task. Finally, our HACA model significantly outperforms the previous best systems and achieves new state-of-the-art results on the widely used MSR-VTT dataset.