Transactions of the Association for Computational Linguistics (2019)


Transactions of the Association for Computational Linguistics, Volume 7
Lillian Lee | Mark Johnson | Brian Roark | Ani Nenkova

Semantic Neural Machine Translation Using AMR
Linfeng Song | Daniel Gildea | Yue Zhang | Zhiguo Wang | Jinsong Su

It is intuitive that semantic representations can be useful for machine translation, mainly because they can help enforce meaning preservation and mitigate the data sparsity (many sentences correspond to one meaning) that machine translation models face. On the other hand, little work has been done on leveraging semantics for neural machine translation (NMT). In this work, we study the usefulness of AMR (abstract meaning representation) for NMT. Experiments on a standard English-to-German dataset show that incorporating AMR as additional knowledge can significantly improve a strong attention-based sequence-to-sequence neural translation model.
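
To make concrete how a graph-structured semantic representation can enter a sequence-to-sequence model, here is a minimal numpy sketch in which the decoder attends over encoded AMR nodes alongside the usual attention over source tokens. The shapes, the fusion by concatenation, and all names (dual_attention_step, W_out) are illustrative assumptions, not the authors' architecture.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, keys):
    """Dot-product attention: weight each key row by similarity to the query."""
    weights = softmax(keys @ query)   # (n,)
    return weights @ keys             # (d,) context vector

def dual_attention_step(dec_state, src_enc, amr_enc, W_out):
    """One decoder step with two attentions: over source-token encodings
    (n_src, d) and over AMR graph-node encodings (n_nodes, d). Fusing the
    two contexts by concatenation is an illustrative choice."""
    c_src = attend(dec_state, src_enc)
    c_amr = attend(dec_state, amr_enc)
    fused = np.concatenate([dec_state, c_src, c_amr])
    return W_out @ fused              # logits over the target vocabulary

d, n_src, n_nodes, vocab = 8, 5, 4, 10
rng = np.random.default_rng(0)
logits = dual_attention_step(rng.normal(size=d),
                             rng.normal(size=(n_src, d)),
                             rng.normal(size=(n_nodes, d)),
                             rng.normal(size=(vocab, 3 * d)))
print(logits.shape)  # (10,)
```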

Joint Transition-Based Models for Morpho-Syntactic Parsing: Parsing Strategies for MRLs and a Case Study from Modern Hebrew
Amir More | Amit Seker | Victoria Basmova | Reut Tsarfaty

In standard NLP pipelines, morphological analysis and disambiguation (MA&D) precedes syntactic and semantic downstream tasks. However, for languages with complex and ambiguous word-internal structure, known as morphologically rich languages (MRLs), it has been hypothesized that syntactic context may be crucial for accurate MA&D, and vice versa. In this work we empirically confirm this hypothesis for Modern Hebrew, an MRL with complex morphology and severe word-level ambiguity, in a novel transition-based framework. Specifically, we propose a joint morphosyntactic transition-based framework which formally unifies two distinct transition systems, morphological and syntactic, into a single transition-based system with joint training and joint inference. We empirically show that MA&D results obtained in the joint settings outperform MA&D results obtained by the respective standalone components, and that end-to-end parsing results obtained by our joint system present a new state of the art for Hebrew dependency parsing.
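
A toy sketch of what unifying two transition systems into one can look like: a morphological transition and the usual arc-standard syntactic transitions share a single (stack, buffer, arcs) configuration, so segmentation decisions and the parse are built by one action sequence. The transition inventory, the MD action, and the Hebrew example are simplified assumptions, not the paper's actual system.

```python
def step(config, action, arg=None):
    """Apply one transition to a shared (stack, buffer, arcs) configuration.
    MD is a morphological transition; SHIFT/LEFT_ARC/RIGHT_ARC are the
    standard arc-standard syntactic transitions over the resulting morphemes."""
    stack, buffer, arcs = config
    if action == "MD":                      # choose a segmentation of the
        token, *rest = buffer               # next surface token
        return (stack, list(arg) + rest, arcs)
    if action == "SHIFT":
        return (stack + [buffer[0]], buffer[1:], arcs)
    if action == "LEFT_ARC":                # head = top, dependent = below it
        return (stack[:-2] + [stack[-1]], buffer, arcs + [(stack[-1], stack[-2])])
    if action == "RIGHT_ARC":               # head = below top, dependent = top
        return (stack[:-1], buffer, arcs + [(stack[-2], stack[-1])])
    raise ValueError(action)

# 'bbyt' ("in the house") is first segmented, then parsed, in one sequence.
config = ([], ["bbyt"], [])
config = step(config, "MD", ["b", "h", "byt"])
for a in ["SHIFT", "SHIFT", "SHIFT", "RIGHT_ARC", "RIGHT_ARC"]:
    config = step(config, a)
print(config[2])  # [('h', 'byt'), ('b', 'h')]
```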

Synchronous Bidirectional Neural Machine Translation
Long Zhou | Jiajun Zhang | Chengqing Zong

Existing approaches to neural machine translation (NMT) generate the target language sequence token by token, from left to right. However, this unidirectional decoding framework cannot make full use of target-side future contexts, which can be produced in a right-to-left decoding direction, and thus suffers from the issue of unbalanced outputs. In this paper, we introduce a synchronous bidirectional neural machine translation (SB-NMT) model that predicts its outputs using left-to-right and right-to-left decoding simultaneously and interactively, in order to leverage both history and future information at the same time. Specifically, we first propose a new algorithm that enables synchronous bidirectional decoding in a single model. Then, we present an interactive decoding model in which left-to-right (right-to-left) generation depends not only on its previously generated outputs, but also on future contexts predicted by right-to-left (left-to-right) decoding. We extensively evaluate the proposed SB-NMT model on large-scale NIST Chinese-English, WMT14 English-German, and WMT18 Russian-English translation tasks. Experimental results demonstrate that our model achieves significant improvements over the strong Transformer model by 3.92, 1.49, and 1.04 BLEU points, respectively, and obtains state-of-the-art performance on the Chinese-English and English-German translation tasks.
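
The interaction between the two directions can be sketched as follows: at each synchronous step, the left-to-right state attends over what the right-to-left pass has produced so far (its "future" context) and vice versa. This is a numpy sketch with assumed shapes and a shared output projection, not the paper's Transformer-based parametrization.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sync_step(h_l2r, h_r2l, hist_l2r, hist_r2l, W):
    """One synchronous decoding step. Each direction attends to the OTHER
    direction's partial output, so left-to-right generation can condition
    on the 'future' that right-to-left decoding has already produced, and
    vice versa."""
    def attend(q, hist):
        if not hist:
            return np.zeros_like(q)
        keys = np.stack(hist)
        return softmax(keys @ q) @ keys
    c_future = attend(h_l2r, hist_r2l)   # L2R peeks at R2L's outputs
    c_past = attend(h_r2l, hist_l2r)     # R2L peeks at L2R's outputs
    return (W @ np.concatenate([h_l2r, c_future]),
            W @ np.concatenate([h_r2l, c_past]))

d, vocab = 8, 10
rng = np.random.default_rng(1)
W = rng.normal(size=(vocab, 2 * d))
l, r = sync_step(rng.normal(size=d), rng.normal(size=d),
                 [rng.normal(size=d)], [rng.normal(size=d)], W)
print(l.shape, r.shape)  # (10,) (10,) — one logit vector per direction
```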

GILE: A Generalized Input-Label Embedding for Text Classification
Nikolaos Pappas | James Henderson

Neural text classification models typically treat output labels as categorical variables that lack description and semantics. This forces their parametrization to depend on the label set size, and, hence, they are unable to scale to large label sets or generalize to unseen ones. Existing joint input-label text models overcome these issues by exploiting label descriptions, but they are unable to capture complex label relationships, have rigid parametrization, and their gains on unseen labels often come at the expense of weak performance on the labels seen during training. In this paper, we propose a new input-label model that generalizes over previous such models, addresses their limitations, and does not compromise performance on seen labels. The model consists of a joint nonlinear input-label embedding with controllable capacity and a joint-space-dependent classification unit that is trained with cross-entropy loss to optimize classification performance. We evaluate models on full-resource and low- or zero-resource text classification of multilingual news and biomedical text with a large label set. Our model outperforms monolingual and multilingual models that do not leverage label semantics, as well as previous joint input-label space models, in both scenarios.
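
One way to realize a joint nonlinear input-label embedding whose parameter count is independent of the label-set size is sketched below: the input encoding and each label's description embedding are projected into a shared space, combined multiplicatively, and scored. The exact parametrization (tanh projections, elementwise product) is an illustrative assumption, not necessarily GILE's.

```python
import numpy as np

def joint_label_scores(h, label_emb, U, V, w):
    """Score one input against EVERY label through a joint nonlinear space.
    Because labels enter only via their description embeddings, unseen
    labels can be scored by embedding their descriptions, and no parameter
    depends on how many labels there are."""
    p = np.tanh(U @ h)                # project the input text       (k,)
    q = np.tanh(label_emb @ V.T)      # project each label           (L, k)
    joint = np.tanh(q * p)            # joint input-label space      (L, k)
    return joint @ w                  # one score per label          (L,)

d_in, d_lab, k, L = 16, 12, 8, 5
rng = np.random.default_rng(2)
scores = joint_label_scores(rng.normal(size=d_in),
                            rng.normal(size=(L, d_lab)),
                            rng.normal(size=(k, d_in)),
                            rng.normal(size=(k, d_lab)),
                            rng.normal(size=k))
print(scores.shape)  # (5,) — softmax + cross-entropy would follow in training
```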

Autosegmental Input Strictly Local Functions
Jane Chandlee | Adam Jardine

Autosegmental representations (ARs; Goldsmith, 1976) are claimed to enable local analyses of otherwise non-local phenomena (Odden, 1994). Focusing on the domain of tone, we investigate this ability of ARs using a computationally well-defined notion of locality extended from Chandlee (2014). The result is a more nuanced understanding of the way in which ARs interact with phonological locality.
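
For intuition, here is a toy input strictly local (ISL) function over plain tone strings in the sense of Chandlee (2014): every output symbol depends only on a bounded window of the input. Bounded one-step High-tone spread is string-ISL, whereas unbounded spread is not, which is exactly the kind of non-locality that ARs are claimed to re-localize. The alphabet and the rule are illustrative.

```python
# A toy ISL function with window size 2: a toneless syllable (0) surfaces
# with H if the PRECEDING INPUT syllable is H. Because the function reads
# the input, not its own output, spreading is bounded to one syllable —
# that bound is what makes the map strictly local on strings.

def isl2_h_spread(tones):
    out = []
    for i, t in enumerate(tones):
        prev = tones[i - 1] if i > 0 else "#"   # left word-edge marker
        out.append("H" if t == "0" and prev == "H" else t)
    return out

print(isl2_h_spread(list("H000L0")))  # ['H', 'H', '0', '0', 'L', '0']
```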

SECTOR: A Neural Model for Coherent Topic Segmentation and Classification
Sebastian Arnold | Rudolf Schneider | Philippe Cudré-Mauroux | Felix A. Gers | Alexander Löser

When searching for information, a human reader first glances over a document, spots relevant sections, and then focuses on a few sentences for resolving her intention. However, the high variance of document structure complicates the identification of the salient topic of a given section at a glance. To tackle this challenge, we present SECTOR, a model to support machine reading systems by segmenting documents into coherent sections and assigning topic labels to each section. Our deep neural network architecture learns a latent topic embedding over the course of a document. This can be leveraged to classify local topics from plain text and segment a document at topic shifts. In addition, we contribute WikiSection, a publicly available data set with 242k labeled sections in English and German from two distinct domains: diseases and cities. From our extensive evaluation of 20 architectures, we report a best score of 71.6% F1 for the segmentation and classification of 30 topics from the English city domain, achieved by our SECTOR long short-term memory model with Bloom filter embeddings and bidirectional segmentation. This is a significant improvement of 29.5 points F1 over state-of-the-art CNN classifiers with baseline segmentation.
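
The segmentation side of the idea can be sketched in a few lines: given one latent topic embedding per sentence, place a boundary wherever the embedding changes sharply between neighbors. The cosine-similarity threshold below is a simplified stand-in for the paper's bidirectional segmentation over LSTM states.

```python
import numpy as np

def segment_at_topic_shifts(topic_emb, threshold=0.7):
    """Given one topic embedding per sentence (as a SECTOR-style LSTM would
    produce over the course of a document), return indices where a new
    segment starts because adjacent embeddings diverge."""
    sims = [e1 @ e2 / (np.linalg.norm(e1) * np.linalg.norm(e2))
            for e1, e2 in zip(topic_emb, topic_emb[1:])]
    return [i + 1 for i, s in enumerate(sims) if s < threshold]

# toy document: three sentences on one topic, then a shift to another
rng = np.random.default_rng(3)
t1, t2 = rng.normal(size=8), rng.normal(size=8)
doc = [t1 + 0.1 * rng.normal(size=8) for _ in range(3)] + \
      [t2 + 0.1 * rng.normal(size=8) for _ in range(2)]
print(segment_at_topic_shifts(doc))  # expected: [3], the topic shift
```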

Complex Program Induction for Querying Knowledge Bases in the Absence of Gold Programs
Amrita Saha | Ghulam Ahmed Ansari | Abhishek Laddha | Karthik Sankaranarayanan | Soumen Chakrabarti

Recent years have seen increasingly complex question answering on knowledge bases (KBQA) involving logical, quantitative, and comparative reasoning over KB subgraphs. Neural Program Induction (NPI) is a pragmatic approach toward modularizing the reasoning process by translating a complex natural language query into a multi-step executable program. While NPI has commonly been trained with the "gold" program or its sketch, for realistic KBQA applications such gold programs are expensive to obtain; in practice, only natural language queries and the corresponding answers can be provided for training. The resulting combinatorial explosion in program space, along with extremely sparse rewards, makes NPI for KBQA ambitious and challenging. We present Complex Imperative Program Induction from Terminal Rewards (CIPITR), an advanced neural programmer that mitigates reward sparsity with auxiliary rewards and restricts the program space to semantically correct programs using high-level constraints, the KB schema, and the inferred answer type. CIPITR solves complex KBQA considerably more accurately than key-value memory networks and neural symbolic machines (NSM). For moderately complex queries requiring 2- to 5-step programs, CIPITR scores at least 3× higher F1 than the competing systems. On one of the hardest classes of programs (comparative reasoning), with 5–10 steps, CIPITR outperforms NSM by a factor of 8–9 and memory networks by 9 times.
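
Two ingredients the abstract highlights can be sketched as follows: (1) pruning the program space with type constraints from the KB schema and the inferred answer type, and (2) softening terminal-only rewards with an auxiliary reward. The operator inventory, type system, and reward values below are invented for illustration and are not CIPITR's.

```python
OPERATORS = {                      # name: (argument types, result type)
    "select":  (("entity", "relation"), "set"),
    "count":   (("set",), "int"),
    "union":   (("set", "set"), "set"),
    "greater": (("int", "int"), "bool"),
}

def valid_next_ops(available_types, answer_type, steps_left):
    """Allow only operators whose arguments are currently derivable; on the
    final step, the result type must match the inferred answer type. This
    shrinks the combinatorial program space before any learning happens."""
    ok = []
    for name, (args, result) in OPERATORS.items():
        if all(a in available_types for a in args):
            if steps_left > 1 or result == answer_type:
                ok.append(name)
    return ok

def reward(answer, gold, result_type, answer_type):
    """Terminal reward for the right answer, plus a small auxiliary reward
    for at least producing a result of the right type (reward shaping)."""
    if answer == gold:
        return 1.0
    return 0.2 if result_type == answer_type else 0.0

print(valid_next_ops({"entity", "relation", "set"}, "int", steps_left=1))
# ['count'] — 'select'/'union' are pruned because they cannot yield an int
```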

DREAM: A Challenge Data Set and Models for Dialogue-Based Reading Comprehension
Kai Sun | Dian Yu | Jianshu Chen | Dong Yu | Yejin Choi | Claire Cardie

We present DREAM, the first dialogue-based multiple-choice reading comprehension data set. Collected from English as a Foreign Language examinations designed by human experts to evaluate the comprehension level of Chinese learners of English, our data set contains 10,197 multiple-choice questions for 6,444 dialogues. In contrast to existing reading comprehension data sets, DREAM is the first to focus on in-depth multi-turn multi-party dialogue understanding. DREAM is likely to present significant challenges for existing reading comprehension systems: 84% of answers are non-extractive, 85% of questions require reasoning beyond a single sentence, and 34% of questions also involve commonsense knowledge. We apply several popular neural reading comprehension models that primarily exploit surface information within the text and find them to, at best, just barely outperform a rule-based approach. We next investigate the effects of incorporating dialogue structure and different kinds of general world knowledge into both rule-based and (neural and non-neural) machine learning-based reading comprehension models. Experimental results on the DREAM data set show the effectiveness of dialogue structure and general world knowledge. DREAM is available at https://dataset.org/dream/.
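
For context on how strong a purely surface-level method can be on such data, here is a minimal lexical-overlap baseline for multiple-choice dialogue comprehension. It is in the spirit of, but not identical to, the rule-based approach the abstract refers to, and the example dialogue is invented.

```python
import re

def tokens(s):
    """Lowercased word types of a string."""
    return set(re.findall(r"[a-z']+", s.lower()))

def overlap_baseline(dialogue_turns, question, choices):
    """Pick the answer choice sharing the most word types with the dialogue
    plus the question; ties go to the first choice."""
    context = tokens(" ".join(dialogue_turns)) | tokens(question)
    scores = [len(context & tokens(c)) for c in choices]
    return max(range(len(choices)), key=scores.__getitem__)

dialogue = ["M: Shall we meet at the library at noon?",
            "W: Sure, I'll bring the statistics notes."]
question = "Where will the speakers meet?"
print(overlap_baseline(dialogue, question,
                       ["At the library.", "At the cinema.", "At home."]))  # 0
```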

Syntax-aware Semantic Role Labeling without Parsing
Rui Cai | Mirella Lapata

In this paper we focus on learning dependency-aware representations for semantic role labeling without recourse to an external parser. The backbone of our model is an LSTM-based semantic role labeler jointly trained with two auxiliary tasks: predicting the dependency label of a word and whether there exists an arc linking it to the predicate. The auxiliary tasks provide syntactic information that is specific to semantic role labeling and are learned from training data (dependency annotations) without relying on existing dependency parsers, which can be noisy (e.g., on out-of-domain data or infrequent constructions). Experimental results on the CoNLL-2009 benchmark dataset show that our model outperforms the state of the art in English and consistently improves performance in other languages, including Chinese, German, and Spanish.
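
The multi-task setup can be sketched as three prediction heads over a shared word encoding: the SRL role (main task), the word's dependency label, and a binary score for whether an arc links the word to the predicate. The linear heads and shapes below are illustrative stand-ins for the paper's LSTM-based model.

```python
import numpy as np

def multitask_heads(h_word, h_pred, W_srl, W_rel, W_arc):
    """Three predictions from one shared word encoding h_word (the paper
    uses an LSTM; weights here are random stand-ins): the SRL role, the
    word's dependency label, and whether an arc links it to the predicate
    h_pred. The two auxiliary losses inject syntax without a parser."""
    srl_logits = W_srl @ np.concatenate([h_word, h_pred])  # main task
    rel_logits = W_rel @ h_word                            # aux: dep label
    arc_logit = W_arc @ np.concatenate([h_word, h_pred])   # aux: arc score
    return srl_logits, rel_logits, arc_logit

d, n_roles, n_rels = 8, 6, 4
rng = np.random.default_rng(4)
srl, rel, arc = multitask_heads(rng.normal(size=d), rng.normal(size=d),
                                rng.normal(size=(n_roles, 2 * d)),
                                rng.normal(size=(n_rels, d)),
                                rng.normal(size=2 * d))
print(srl.shape, rel.shape, arc.shape)  # (6,) (4,) () — one arc logit;
# training would sum cross-entropy over all three heads
```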

No Word is an Island—A Transformation Weighting Model for Semantic Composition
Corina Dima | Daniël de Kok | Neele Witte | Erhard Hinrichs

Composition models of distributional semantics are used to construct phrase representations from the representations of their words. Composition models typically sit at one of two ends of a spectrum: they either have a small number of parameters but compose all phrases in the same way, or they perform word-specific compositions at the cost of a far larger number of parameters. In this paper we propose transformation weighting (TransWeight), a composition model that consistently outperforms existing models on nominal compounds, adjective-noun phrases, and adverb-adjective phrases in English, German, and Dutch. TransWeight drastically reduces the number of parameters needed compared with the best model in the literature by composing similar words in the same way.
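
A sketch of the transformation-weighting idea: a bank of t transformations shared by all word pairs produces t candidate compositions of [u; v], and a pair-specific weighting mixes them into one phrase vector. Sharing the bank across words is what keeps the parameter count small; the softmax weighting used here is a simplified assumption about the combination step.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def transweight(u, v, T, b, V):
    """Compose word vectors u and v. T (t, d, 2d) holds t transformations
    shared by ALL pairs; V (t, 2d) produces pair-specific mixture weights
    over the t transformed candidates."""
    x = np.concatenate([u, v])        # (2d,) concatenated word pair
    H = np.tanh(T @ x + b)            # (t, d) candidate compositions
    alpha = softmax(V @ x)            # (t,) pair-specific weights
    return alpha @ H                  # (d,) composed phrase vector

d, t = 8, 5
rng = np.random.default_rng(5)
phrase = transweight(rng.normal(size=d), rng.normal(size=d),
                     rng.normal(size=(t, d, 2 * d)), rng.normal(size=(t, d)),
                     rng.normal(size=(t, 2 * d)))
print(phrase.shape)  # (8,)
```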