Dan Roth


2022

pdf bib
Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics
Daniel Deutsch | Dan Roth
Findings of the Association for Computational Linguistics: ACL 2022

Question answering-based summarization evaluation metrics must automatically determine whether the QA model’s prediction is correct or not, a task known as answer verification. In this work, we benchmark the lexical answer verification methods which have been used by current QA-based metrics as well as two more sophisticated text comparison methods, BERTScore and LERC. We find that LERC out-performs the other methods in some settings while remaining statistically indistinguishable from lexical overlap in others. However, our experiments reveal that improved verification performance does not necessarily translate to overall QA-based metric quality: In some scenarios, using a worse verification method — or using none at all — has comparable performance to using the best verification method, a result that we attribute to properties of the datasets.

2021

pdf bib
Coreference Reasoning in Machine Reading Comprehension
Mingzhu Wu | Nafise Sadat Moosavi | Dan Roth | Iryna Gurevych
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Coreference resolution is essential for natural language understanding and has been long studied in NLP. In recent years, as the format of Question Answering (QA) became a standard for machine reading comprehension (MRC), there have been data collection efforts, e.g., Dasigi et al. (2019), that attempt to evaluate the ability of MRC models to reason about coreference. However, as we show, coreference reasoning in MRC is a greater challenge than earlier thought ; MRC datasets do not reflect the natural distribution and, consequently, the challenges of coreference reasoning. Specifically, success on these datasets does not reflect a model’s proficiency in coreference reasoning. We propose a methodology for creating MRC datasets that better reflect the challenges of coreference reasoning and use it to create a sample evaluation set. The results on our dataset show that state-of-the-art models still struggle with these phenomena. Furthermore, we develop an effective way to use naturally occurring coreference phenomena from existing coreference resolution datasets when training MRC models. This allows us to show an improvement in the coreference reasoning abilities of state-of-the-art models.

pdf bib
Understanding the Extent to which Content Quality Metrics Measure the Information Quality of Summaries
Daniel Deutsch | Dan Roth
Proceedings of the 25th Conference on Computational Natural Language Learning

Reference-based metrics such as ROUGE or BERTScore evaluate the content quality of a summary by comparing the summary to a reference. Ideally, this comparison should measure the summary’s information quality by calculating how much information the summaries have in common. In this work, we analyze the token alignments used by ROUGE and BERTScore to compare summaries and argue that their scores largely can not be interpreted as measuring information overlap. Rather, they are better estimates of the extent to which the summaries discuss the same topics. Further, we provide evidence that this result holds true for many other summarization evaluation metrics. The consequence of this result is that the most frequently used summarization evaluation metrics do not align with the community’s research goal, to generate summaries with high-quality information. However, we conclude by demonstrating that a recently proposed metric, QAEval, which scores summaries using question-answering, appears to better capture information quality than current evaluations, highlighting a direction for future research.

pdf bib
Building Low-Resource NER Models Using Non-Speaker AnnotationsNER Models Using Non-Speaker Annotations
Tatiana Tsygankova | Francesca Marini | Stephen Mayhew | Dan Roth
Proceedings of the Second Workshop on Data Science with Human in the Loop: Language Advances

In low-resource natural language processing (NLP), the key problems are a lack of target language training data, and a lack of native speakers to create it. Cross-lingual methods have had notable success in addressing these concerns, but in certain common circumstances, such as insufficient pre-training corpora or languages far from the source language, their performance suffers. In this work we propose a complementary approach to building low-resource Named Entity Recognition (NER) models using non-speaker (NS) annotations, provided by annotators with no prior experience in the target language. We recruit 30 participants in a carefully controlled annotation experiment with Indonesian, Russian, and Hindi. We show that use of NS annotators produces results that are consistently on par or better than cross-lingual methods built on modern contextual representations, and have the potential to outperform with additional effort. We conclude with observations of common annotation patterns and recommended implementation practices, and motivate how NS annotations can be used in addition to prior methods for improved performance.

pdf bib
Cross-lingual Entity Alignment with Incidental Supervision
Muhao Chen | Weijia Shi | Ben Zhou | Dan Roth
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Much research effort has been put to multilingual knowledge graph (KG) embedding methods to address the entity alignment task, which seeks to match entities in different languagespecific KGs that refer to the same real-world object. Such methods are often hindered by the insufficiency of seed alignment provided between KGs. Therefore, we propose a new model, JEANS, which jointly represents multilingual KGs and text corpora in a shared embedding scheme, and seeks to improve entity alignment with incidental supervision signals from text. JEANS first deploys an entity grounding process to combine each KG with the monolingual text corpus. Then, two learning processes are conducted : (i) an embedding learning process to encode the KG and text of each language in one embedding space, and (ii) a self-learning based alignment learning process to iteratively induce the correspondence of entities and that of lexemes between embeddings. Experiments on benchmark datasets show that JEANS leads to promising improvement on entity alignment with incidental supervision, and significantly outperforms state-of-the-art methods that solely rely on internal information of KGs.

pdf bib
Paired Examples as Indirect Supervision in Latent Decision Models
Nitish Gupta | Sameer Singh | Matt Gardner | Dan Roth
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Compositional, structured models are appealing because they explicitly decompose problems and provide interpretable intermediate outputs that give confidence that the model is not simply latching onto data artifacts. Learning these models is challenging, however, because end-task supervision only provides a weak indirect signal on what values the latent decisions should take. This often results in the model failing to learn to perform the intermediate tasks correctly. In this work, we introduce a way to leverage paired examples that provide stronger cues for learning latent decisions. When two related training examples share internal substructure, we add an additional training objective to encourage consistency between their latent decisions. Such an objective does not require external supervision for the values of the latent output, or even the end task, yet provides an additional training signal to that provided by individual training examples themselves. We apply our method to improve compositional question answering using neural module networks on the DROP dataset. We explore three ways to acquire paired questions in DROP : (a) discovering naturally occurring paired examples within the dataset, (b) constructing paired examples using templates, and (c) generating paired examples using a question generation model. We empirically demonstrate that our proposed approach improves both in- and out-of-distribution generalization and leads to correct latent decision predictions.

pdf bib
Generalization in Instruction Following Systems
Soham Dan | Michael Zhou | Dan Roth
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Understanding and executing natural language instructions in a grounded domain is one of the hallmarks of artificial intelligence. In this paper, we focus on instruction understanding in the blocks world domain and investigate the language understanding abilities of two top-performing systems for the task. We aim to understand if the test performance of these models indicates an understanding of the spatial domain and of the natural language instructions relative to it, or whether they merely over-fit spurious signals in the dataset. We formulate a set of expectations one might have from an instruction following model and concretely characterize the different dimensions of robustness such a model should possess. Despite decent test performance, we find that state-of-the-art models fall short of these expectations and are extremely brittle. We then propose a learning strategy that involves data augmentation and show through extensive experiments that the proposed learning strategy yields models that are competitive on the original test set while satisfying our expectations much better.

pdf bib
RESIN : A Dockerized Schema-Guided Cross-document Cross-lingual Cross-media Information Extraction and Event Tracking SystemRESIN: A Dockerized Schema-Guided Cross-document Cross-lingual Cross-media Information Extraction and Event Tracking System
Haoyang Wen | Ying Lin | Tuan Lai | Xiaoman Pan | Sha Li | Xudong Lin | Ben Zhou | Manling Li | Haoyu Wang | Hongming Zhang | Xiaodong Yu | Alexander Dong | Zhenhailong Wang | Yi Fung | Piyush Mishra | Qing Lyu | Dídac Surís | Brian Chen | Susan Windisch Brown | Martha Palmer | Chris Callison-Burch | Carl Vondrick | Jiawei Han | Dan Roth | Shih-Fu Chang | Heng Ji
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations

We present a new information extraction system that can automatically construct temporal event graphs from a collection of news documents from multiple sources, multiple languages (English and Spanish for our experiment), and multiple data modalities (speech, text, image and video). The system advances state-of-the-art from two aspects : (1) extending from sentence-level event extraction to cross-document cross-lingual cross-media event extraction, coreference resolution and temporal event tracking ; (2) using human curated event schema library to match and enhance the extraction output. We have made the dockerlized system publicly available for research purpose at GitHub, with a demo video.

2020

pdf bib
Joint Constrained Learning for Event-Event Relation Extraction
Haoyu Wang | Muhao Chen | Hongming Zhang | Dan Roth
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Understanding natural language involves recognizing how multiple event mentions structurally and temporally interact with each other. In this process, one can induce event complexes that organize multi-granular events with temporal order and membership relations interweaving among them. Due to the lack of jointly labeled data for these relational phenomena and the restriction on the structures they articulate, we propose a joint constrained learning framework for modeling event-event relations. Specifically, the framework enforces logical constraints within and across multiple temporal and subevent relations of events by converting these constraints into differentiable learning objectives. We show that our joint constrained learning approach effectively compensates for the lack of jointly labeled data, and outperforms SOTA methods on benchmarks for both temporal relation extraction and event hierarchy construction, replacing a commonly used but more expensive global inference process. We also present a promising case study to show the effectiveness of our approach to inducing event complexes on an external corpus.

pdf bib
Analogous Process Structure Induction for Sub-event Sequence Prediction
Hongming Zhang | Muhao Chen | Haoyu Wang | Yangqiu Song | Dan Roth
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Computational and cognitive studies of event understanding suggest that identifying, comprehending, and predicting events depend on having structured representations of a sequence of events and on conceptualizing (abstracting) its components into (soft) event categories. Thus, knowledge about a known process such as buying a car can be used in the context of a new but analogous process such as buying a house. Nevertheless, most event understanding work in NLP is still at the ground level and does not consider abstraction. In this paper, we propose an Analogous Process Structure Induction (APSI) framework, which leverages analogies among processes and conceptualization of sub-event instances to predict the whole sub-event sequence of previously unseen open-domain processes. As our experiments and analysis indicate, APSI supports the generation of meaningful sub-event sequences for unseen processes and can help predict missing events.

pdf bib
Design Challenges in Low-resource Cross-lingual Entity Linking
Xingyu Fu | Weijia Shi | Xiaodong Yu | Zian Zhao | Dan Roth
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Cross-lingual Entity Linking (XEL), the problem of grounding mentions of entities in a foreign language text into an English knowledge base such as Wikipedia, has seen a lot of research in recent years, with a range of promising techniques. However, current techniques do not rise to the challenges introduced by text in low-resource languages (LRL) and, surprisingly, fail to generalize to text not taken from Wikipedia, on which they are usually trained. This paper provides a thorough analysis of low-resource XEL techniques, focusing on the key step of identifying candidate English Wikipedia titles that correspond to a given foreign language mention. Our analysis indicates that current methods are limited by their reliance on Wikipedia’s interlanguage links and thus suffer when the foreign language’s Wikipedia is small. We conclude that the LRL setting requires the use of outside-Wikipedia cross-lingual resources and present a simple yet effective zero-shot XEL system, QuEL, that utilizes search engines query logs. With experiments on 25 languages, QuEL shows an average increase of 25 % in gold candidate recall and of 13 % in end-to-end linking accuracy over state-of-the-art baselines.

pdf bib
SacreROUGE : An Open-Source Library for Using and Developing Summarization Evaluation MetricsSacreROUGE: An Open-Source Library for Using and Developing Summarization Evaluation Metrics
Daniel Deutsch | Dan Roth
Proceedings of Second Workshop for NLP Open Source Software (NLP-OSS)

We present SacreROUGE, an open-source library for using and developing summarization evaluation metrics. SacreROUGE removes many obstacles that researchers face when using or developing metrics : (1) The library provides Python wrappers around the official implementations of existing evaluation metrics so they share a common, easy-to-use interface ; (2) it provides functionality to evaluate how well any metric implemented in the library correlates to human-annotated judgments, so no additional code needs to be written for a new evaluation metric ; and (3) it includes scripts for loading datasets that contain human judgments so they can easily be used for evaluation. This work describes the design of the library, including the core Metric interface, the command-line API for evaluating summarization models and metrics, and the scripts to load and reformat publicly available datasets. The development of SacreROUGE is ongoing and open to contributions from the community.

pdf bib
Pruning Redundant Mappings in Transformer Models via Spectral-Normalized Identity Prior
Zi Lin | Jeremiah Liu | Zi Yang | Nan Hua | Dan Roth
Findings of the Association for Computational Linguistics: EMNLP 2020

Traditional (unstructured) pruning methods for a Transformer model focus on regularizing the individual weights by penalizing them toward zero. In this work, we explore spectral-normalized identity priors (SNIP), a structured pruning approach which penalizes an entire residual module in a Transformer model toward an identity mapping. Our method identifies and discards unimportant non-linear mappings in the residual connections by applying a thresholding operator on the function norm, and is applicable to any structured module including a single attention head, an entire attention blocks, or a feed-forward subnetwork. Furthermore, we introduce spectral normalization to stabilize the distribution of the post-activation values of the Transformer layers, further improving the pruning effectiveness of the proposed methodology. We conduct experiments with BERT on 5 GLUE benchmark tasks to demonstrate that SNIP achieves effective pruning results while maintaining comparable performance. Specifically, we improve the performance over the state-of-the-art by 0.5 to 1.0 % on average at 50 % compression ratio.

pdf bib
Is Killed More Significant than Fled? A Contextual Model for Salient Event Detection
Disha Jindal | Daniel Deutsch | Dan Roth
Proceedings of the 28th International Conference on Computational Linguistics

Identifying the key events in a document is critical to holistically understanding its important information. Although measuring the salience of events is highly contextual, most previous work has used a limited representation of events that omits essential information. In this work, we propose a highly contextual model of event salience that uses a rich representation of events, incorporates document-level information and allows for interactions between latent event encodings. Our experimental results on an event salience dataset demonstrate that our model improves over previous work by an absolute 2-4 % on standard metrics, establishing a new state-of-the-art performance for the task. We also propose a new evaluation metric that addresses flaws in previous evaluation methodologies. Finally, we discuss the importance of salient event detection for the downstream task of summarization.

2019

pdf bib
Going on a vacation takes longer than Going for a walk : A Study of Temporal Commonsense Understanding
Ben Zhou | Daniel Khashabi | Qiang Ning | Dan Roth
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Understanding time is crucial for understanding events expressed in natural language. Because people rarely say the obvious, it is often necessary to have commonsense knowledge about various temporal aspects of events, such as duration, frequency, and temporal order. However, this important problem has so far received limited attention. This paper systematically studies this temporal commonsense problem. Specifically, we define five classes of temporal commonsense, and use crowdsourcing to develop a new dataset, MCTACO, that serves as a test set for this task. We find that the best current methods used on MCTACO are still far behind human performance, by about 20 %, and discuss several directions for improvement. We hope that the new dataset and our study here can foster more future research on this topic.

pdf bib
Summary Cloze : A New Task for Content Selection in Topic-Focused Summarization
Daniel Deutsch | Dan Roth
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

A key challenge in topic-focused summarization is determining what information should be included in the summary, a problem known as content selection. In this work, we propose a new method for studying content selection in topic-focused summarization called the summary cloze task. The goal of the summary cloze task is to generate the next sentence of a summary conditioned on the beginning of the summary, a topic, and a reference document(s). The main challenge is deciding what information in the references is relevant to the topic and partial summary and should be included in the summary. Although the cloze task does not address all aspects of the traditional summarization problem, the more narrow scope of the task allows us to collect a large-scale datset of nearly 500k summary cloze instances from Wikipedia. We report experimental results on this new dataset using various extractive models and a two-step abstractive model that first extractively selects a small number of sentences and then abstractively summarizes them. Our results show that the topic and partial summary help the models identify relevant content, but the task remains a significant challenge.

pdf bib
Benchmarking Zero-shot Text Classification : Datasets, Evaluation and Entailment Approach
Wenpeng Yin | Jamaal Hay | Dan Roth
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Zero-shot text classification (0Shot-TC) is a challenging NLU problem to which little attention has been paid by the research community. 0Shot-TC aims to associate an appropriate label with a piece of text, irrespective of the text domain and the aspect (e.g., topic, emotion, event, etc.) described by the label. And there are only a few articles studying 0Shot-TC, all focusing only on topical categorization which, we argue, is just the tip of the iceberg in 0Shot-TC. In addition, the chaotic experiments in literature make no uniform comparison, which blurs the progress. This work benchmarks the 0Shot-TC problem by providing unified datasets, standardized evaluations, and state-of-the-art baselines. Our contributions include : i) The datasets we provide facilitate studying 0Shot-TC relative to conceptually different and diverse aspects : the topic aspect includes sports and politics as labels ; the emotion aspect includes joy and anger ; the situation aspect includes medical assistance and water shortage. ii) We extend the existing evaluation setup (label-partially-unseen) given a dataset, train on some labels, test on all labels to include a more challenging yet realistic evaluation label-fully-unseen 0Shot-TC (Chang et al., 2008), aiming at classifying text snippets without seeing task specific training data at all. iii) We unify the 0Shot-TC of diverse aspects within a textual entailment formulation and study it this way.

pdf bib
An Improved Neural Baseline for Temporal Relation Extraction
Qiang Ning | Sanjay Subramanian | Dan Roth
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Determining temporal relations (e.g., before or after) between events has been a challenging natural language understanding task, partly due to the difficulty to generate large amounts of high-quality training data. Consequently, neural approaches have not been widely used on it, or showed only moderate improvements. This paper proposes a new neural system that achieves about 10 % absolute improvement in accuracy over the previous best system (25 % error reduction) on two benchmark datasets. The proposed system is trained on the state-of-the-art MATRES dataset and applies contextualized word embeddings, a Siamese encoder of a temporal common sense knowledge base, and global inference via integer linear programming (ILP). We suggest that the new approach could serve as a strong baseline for future research in this area.

pdf bib
ner and pos when nothing is capitalizedner and pos when nothing is capitalized
Stephen Mayhew | Tatiana Tsygankova | Dan Roth
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

For those languages which use it, capitalization is an important signal for the fundamental NLP tasks of Named Entity Recognition (NER) and Part of Speech (POS) tagging. In fact, it is such a strong signal that model performance on these tasks drops sharply in common lowercased scenarios, such as noisy web text or machine translation outputs. In this work, we perform a systematic analysis of solutions to this problem, modifying only the casing of the train or test data using lowercasing and truecasing methods. While prior work and first impressions might suggest training a caseless model, or using a truecaser at test time, we show that the most effective strategy is a concatenation of cased and lowercased training data, producing a single model with high performance on both cased and uncased text. As shown in our experiments, this result holds across tasks and input representations. Finally, we show that our proposed solution gives an 8 % F1 improvement in mention detection on noisy out-of-domain Twitter data.

pdf bib
Improving Generalization in Coreference Resolution via Adversarial Training
Sanjay Subramanian | Dan Roth
Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)

In order for coreference resolution systems to be useful in practice, they must be able to generalize to new text. In this work, we demonstrate that the performance of the state-of-the-art system decreases when the names of PER and GPE named entities in the CoNLL dataset are changed to names that do not occur in the training set. We use the technique of adversarial gradient-based training to retrain the state-of-the-art system and demonstrate that the retrained system achieves higher performance on the CoNLL dataset (both with and without the change of named entities) and the GAP dataset.

pdf bib
Seeing Things from a Different Angle : Discovering Diverse Perspectives about Claims
Sihao Chen | Daniel Khashabi | Wenpeng Yin | Chris Callison-Burch | Dan Roth
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

One key consequence of the information revolution is a significant increase and a contamination of our information supply. The practice of fact checking wo n’t suffice to eliminate the biases in text data we observe, as the degree of factuality alone does not determine whether biases exist in the spectrum of opinions visible to us. To better understand controversial issues, one needs to view them from a diverse yet comprehensive set of perspectives. For example, there are many ways to respond to a claim such as animals should have lawful rights, and these responses form a spectrum of perspectives, each with a stance relative to this claim and, ideally, with evidence supporting it. Inherently, this is a natural language understanding task, and we propose to address it as such. Specifically, we propose the task of substantiated perspective discovery where, given a claim, a system is expected to discover a diverse set of well-corroborated perspectives that take a stance with respect to the claim. Each perspective should be substantiated by evidence paragraphs which summarize pertinent results and facts. We construct PERSPECTRUM, a dataset of claims, perspectives and evidence, making use of online debate websites to create the initial data collection, and augmenting it using search engines in order to expand and diversify our dataset. We use crowd-sourcing to filter out noise and ensure high-quality data. Our dataset contains 1k claims, accompanied with pools of 10k and 8k perspective sentences and evidence paragraphs, respectively.

pdf bib
Partial Or Complete, That’s The Question
Qiang Ning | Hangfeng He | Chuchu Fan | Dan Roth
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

For many structured learning tasks, the data annotation process is complex and costly. Existing annotation schemes usually aim at acquiring completely annotated structures, under the common perception that partial structures are of low quality and could hurt the learning process. This paper questions this common perception, motivated by the fact that structures consist of interdependent sets of variables. Thus, given a fixed budget, partly annotating each structure may provide the same level of supervision, while allowing for more structures to be annotated. We provide an information theoretic formulation for this perspective and use it, in the context of three diverse structured learning tasks, to show that learning from partial structures can sometimes outperform learning from complete ones. Our findings may provide important insights into structured data annotation schemes and could support progress in learning protocols for structured tasks.

pdf bib
PerspectroScope : A Window to the World of Diverse PerspectivesPerspectroScope: A Window to the World of Diverse Perspectives
Sihao Chen | Daniel Khashabi | Chris Callison-Burch | Dan Roth
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

This work presents PerspectroScope, a web-based system which lets users query a discussion-worthy natural language claim, and extract and visualize various perspectives in support or against the claim, along with evidence supporting each perspective. The system thus lets users explore various perspectives that could touch upon aspects of the issue at hand. The system is built as a combination of retrieval engines and learned textual-entailment-like classifiers built using a few recent developments in natural language understanding. To make the system more adaptive, expand its coverage, and improve its decisions over time, our platform employs various mechanisms to get corrections from the users. PerspectroScope is available at github.com/CogComp/perspectroscope Web demo link : http://orwell.seas.upenn.edu:4002/ Link to demo video : https://www.youtube.com/watch?v=MXBTR1Sp3Bs

pdf bib
KnowSemLM : A Knowledge Infused Semantic Language ModelKnowSemLM: A Knowledge Infused Semantic Language Model
Haoruo Peng | Qiang Ning | Dan Roth
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Story understanding requires developing expectations of what events come next in text. Prior knowledge both statistical and declarative is essential in guiding such expectations. While existing semantic language models (SemLM) capture event co-occurrence information by modeling event sequences as semantic frames, entities, and other semantic units, this paper aims at augmenting them with causal knowledge (i.e., one event is likely to lead to another). Such knowledge is modeled at the frame and entity level, and can be obtained either statistically from text or stated declaratively. The proposed method, KnowSemLM, infuses this knowledge into a semantic LM by joint training and inference, and is shown to be effective on both the event cloze test and story / referent prediction tasks.

pdf bib
Named Entity Recognition with Partially Annotated Training Data
Stephen Mayhew | Snigdha Chaturvedi | Chen-Tse Tsai | Dan Roth
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Supervised machine learning assumes the availability of fully-labeled data, but in many cases, such as low-resource languages, the only data available is partially annotated. We study the problem of Named Entity Recognition (NER) with partially annotated training data in which a fraction of the named entities are labeled, and all other tokens, entities or otherwise, are labeled as non-entity by default. In order to train on this noisy dataset, we need to distinguish between the true and false negatives. To this end, we introduce a constraint-driven iterative algorithm that learns to detect false negatives in the noisy set and downweigh them, resulting in a weighted training set. With this set, we train a weighted NER model. We evaluate our algorithm with weighted variants of neural and non-neural NER models on data in 8 languages from several language and script families, showing strong ability to learn from partial data. Finally, to show real-world efficacy, we evaluate on a Bengali NER corpus annotated by non-speakers, outperforming the prior state-of-the-art by over 5 points F1.

2018

pdf bib
Bootstrapping Transliteration with Constrained Discovery for Low-Resource Languages
Shyam Upadhyay | Jordan Kodner | Dan Roth
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Generating the English transliteration of a name written in a foreign script is an important and challenging step in multilingual knowledge acquisition and information extraction. Existing approaches to transliteration generation require a large (5000) number of training examples. This difficulty contrasts with transliteration discovery, a somewhat easier task that involves picking a plausible transliteration from a given list. In this work, we present a bootstrapping algorithm that uses constrained discovery to improve generation, and can be used with as few as 500 training examples, which we show can be sourced from annotators in a matter of hours. This opens the task to languages for which large number of training examples are unavailable. We evaluate transliteration generation performance itself, as well the improvement it brings to cross-lingual candidate generation for entity linking, a typical downstream task. We present a comprehensive evaluation of our approach on nine languages, each written in a unique script.

pdf bib
CogCompTime : A Tool for Understanding Time in Natural LanguageCogCompTime: A Tool for Understanding Time in Natural Language
Qiang Ning | Ben Zhou | Zhili Feng | Haoruo Peng | Dan Roth
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Automatic extraction of temporal information is important for natural language understanding. It involves two basic tasks : (1) Understanding time expressions that are mentioned explicitly in text (e.g., February 27, 1998 or tomorrow), and (2) Understanding temporal information that is conveyed implicitly via relations. This paper introduces CogCompTime, a system that has these two important functionalities. It incorporates the most recent progress, achieves state-of-the-art performance, and is publicly available at.http://cogcomp.org/page/publication_view/844.

pdf bib
Mapping to Declarative Knowledge for Word Problem Solving
Subhro Roy | Dan Roth
Transactions of the Association for Computational Linguistics, Volume 6

Math word problems form a natural abstraction to a range of quantitative reasoning problems, such as understanding financial news, sports results, and casualties of war. Solving such problems requires the understanding of several mathematical concepts such as dimensional analysis, subset relationships, etc. In this paper, we develop declarative rules which govern the translation of natural language description of these concepts to math expressions. We then present a framework for incorporating such declarative knowledge into word problem solving. Our method learns to map arithmetic word problem text to math expressions, by learning to select the relevant declarative knowledge for each operation of the solution expression. This provides a way to handle multiple concepts in the same problem while, at the same time, supporting interpretability of the answer expression. Our method models the mapping to declarative knowledge as a latent variable, thus removing the need for expensive annotations. Experimental evaluation suggests that our domain knowledge based solver outperforms all other systems, and that it generalizes better in the realistic case where the training data it is exposed to is biased in a different way than the test data.

pdf bib
Term Definitions Help Hypernymy Detection
Wenpeng Yin | Dan Roth
Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics

Existing methods of hypernymy detection mainly rely on statistics over a big corpus, either mining some co-occurring patterns like animals such as cats or embedding words of interest into context-aware vectors. These approaches are therefore limited by the availability of a large enough corpus that can cover all terms of interest and provide sufficient contextual information to represent their meaning. In this work, we propose a new paradigm, HyperDef, for hypernymy detection expressing word meaning by encoding word definitions, along with context driven representation. This has two main benefits : (i) Definitional sentences express (sense-specific) corpus-independent meanings of words, hence definition-driven approaches enable strong generalization once trained, the model is expected to work well in open-domain testbeds ; (ii) Global context from a large corpus and definitions provide complementary information for words. Consequently, our model, HyperDef, once trained on task-agnostic data, gets state-of-the-art results in multiple benchmarks

pdf bib
Looking Beyond the Surface : A Challenge Set for Reading Comprehension over Multiple Sentences
Daniel Khashabi | Snigdha Chaturvedi | Michael Roth | Shyam Upadhyay | Dan Roth
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

We present a reading comprehension challenge in which questions can only be answered by taking into account information from multiple sentences. We solicit and verify questions and answers for this challenge through a 4-step crowdsourcing experiment. Our challenge dataset contains 6,500 + questions for 1000 + paragraphs across 7 different domains (elementary school science, news, travel guides, fiction stories, etc) bringing in linguistic diversity to the texts and to the questions wordings. On a subset of our dataset, we found human solvers to achieve an F1-score of 88.1 %. We analyze a range of baselines, including a recent state-of-art reading comprehension system, and demonstrate the difficulty of this challenge, despite a high human performance. The dataset is the first to study multi-sentence inference at scale, with an open-ended set of question types that requires reasoning skills.

pdf bib
Improving Temporal Relation Extraction with a Globally Acquired Statistical Resource
Qiang Ning | Hao Wu | Haoruo Peng | Dan Roth
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Extracting temporal relations (before, after, overlapping, etc.) is a key aspect of understanding events described in natural language. We argue that this task would gain from the availability of a resource that provides prior knowledge in the form of the temporal order that events usually follow. This paper develops such a resource a probabilistic knowledge base acquired in the news domain by extracting temporal relations between events from the New York Times (NYT) articles over a 20-year span (19872007). We show that existing temporal extraction systems can be improved via this resource. As a byproduct, we also show that interesting statistics can be retrieved from this resource, which can potentially benefit other time-aware tasks. The proposed system and resource are both publicly available.

pdf bib
Where Have I Heard This Story Before? Identifying Narrative Similarity in Movie RemakesI Heard This Story Before? Identifying Narrative Similarity in Movie Remakes
Snigdha Chaturvedi | Shashank Srivastava | Dan Roth
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

People can identify correspondences between narratives in everyday life. For example, an analogy with the Cinderella story may be made in describing the unexpected success of an underdog in seemingly different stories. We present a new task and dataset for story understanding : identifying instances of similar narratives from a collection of narrative texts. We present an initial approach for this problem, which finds correspondences between narratives in terms of plot events, and resemblances between characters and their social relationships. Our approach yields an 8 % absolute improvement in performance over a competitive information-retrieval baseline on a novel dataset of plot summaries of 577 movie remakes from Wikipedia.

pdf bib
A Multi-Axis Annotation Scheme for Event Temporal Relations
Qiang Ning | Hao Wu | Dan Roth
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Existing temporal relation (TempRel) annotation schemes often have low inter-annotator agreements (IAA) even between experts, suggesting that the current annotation task needs a better definition. This paper proposes a new multi-axis modeling to better capture the temporal structure of events. In addition, we identify that event end-points are a major source of confusion in annotation, so we also propose to annotate TempRels based on start-points only. A pilot expert annotation effort using the proposed scheme shows significant improvement in IAA from the conventional 60’s to 80’s (Cohen’s Kappa). This better-defined annotation scheme further enables the use of crowdsourcing to alleviate the labor intensity for each annotator. We hope that this work can foster more interesting studies towards event understanding.

pdf bib
A Distributional and Orthographic Aggregation Model for English Derivational MorphologyEnglish Derivational Morphology
Daniel Deutsch | John Hewitt | Dan Roth
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Modeling derivational morphology to generate words with particular semantics is useful in many text generation tasks, such as machine translation or abstractive question answering. In this work, we tackle the task of derived word generation. That is, we attempt to generate the word runner for someone who runs. We identify two key problems in generating derived words from root words and transformations. We contribute a novel aggregation model of derived word generation that learns derivational transformations both as orthographic functions using sequence-to-sequence models and as functions in distributional word embedding space. The model then learns to choose between the hypothesis of each system. We also present two ways of incorporating corpus information into derived word generation.

pdf bib
Joint Reasoning for Temporal and Causal Relations
Qiang Ning | Zhili Feng | Hao Wu | Dan Roth
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Understanding temporal and causal relations between events is a fundamental natural language understanding task. Because a cause must occur earlier than its effect, temporal and causal relations are closely related and one relation often dictates the value of the other. However, limited attention has been paid to studying these two relations jointly. This paper presents a joint inference framework for them using constrained conditional models (CCMs). Specifically, we formulate the joint problem as an integer linear programming (ILP) problem, enforcing constraints that are inherent in the nature of time and causality. We show that the joint inference framework results in statistically significant improvement in the extraction of both temporal and causal relations from text.

pdf bib
End-Task Oriented Textual Entailment via Deep Explorations of Inter-Sentence Interactions
Wenpeng Yin | Dan Roth | Hinrich Schütze
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

This work deals with SciTail, a natural entailment challenge derived from a multi-choice question answering problem. The premises and hypotheses in SciTail were generated with no awareness of each other, and did not specifically aim at the entailment task. This makes it more challenging than other entailment data sets and more directly useful to the end-task question answering. We propose DEISTE (deep explorations of inter-sentence interactions for textual entailment) for this entailment task. Given word-to-word interactions between the premise-hypothesis pair (P, H), DEISTE consists of : (i) a parameter-dynamic convolution to make important words in P and H play a dominant role in learnt representations ; and (ii) a position-aware attentive convolution to encode the representation and position information of the aligned word pairs. Experiments show that DEISTE gets 5 % improvement over prior state of the art and that the pretrained DEISTE on SciTail generalizes well on RTE-5.

2017

pdf bib
Learning What is Essential in Questions
Daniel Khashabi | Tushar Khot | Ashish Sabharwal | Dan Roth
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)

Question answering (QA) systems are easily distracted by irrelevant or redundant words in questions, especially when faced with long or multi-sentence questions in difficult domains. This paper introduces and studies the notion of essential question terms with the goal of improving such QA solvers. We illustrate the importance of essential question terms by showing that humans’ ability to answer questions drops significantly when essential terms are eliminated from questions. We then develop a classifier that reliably (90 % mean average precision) identifies and ranks essential terms in questions. Finally, we use the classifier to demonstrate that the notion of question term essentiality allows state-of-the-art QA solver for elementary-level science questions to make better and more informed decisions, improving performance by up to 5%.We also introduce a new dataset of over 2,200 crowd-sourced essential terms annotated science questions.

pdf bib
A Consolidated Open Knowledge Representation for Multiple Texts
Rachel Wities | Vered Shwartz | Gabriel Stanovsky | Meni Adler | Ori Shapira | Shyam Upadhyay | Dan Roth | Eugenio Martinez Camara | Iryna Gurevych | Ido Dagan
Proceedings of the 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics

We propose to move from Open Information Extraction (OIE) ahead to Open Knowledge Representation (OKR), aiming to represent information conveyed jointly in a set of texts in an open text-based manner. We do so by consolidating OIE extractions using entity and predicate coreference, while modeling information containment between coreferring elements via lexical entailment. We suggest that generating OKR structures can be a useful step in the NLP pipeline, to give semantic applications an easy handle on consolidated information across multiple texts.

pdf bib
Adapting to Learner Errors with Minimal Supervision
Alla Rozovskaya | Dan Roth | Mark Sammons
Computational Linguistics, Volume 43, Issue 4 - December 2017

This article considers the problem of correcting errors made by English as a Second Language writers from a machine learning perspective, and addresses an important issue of developing an appropriate training paradigm for the task, one that accounts for error patterns of non-native writers using minimal supervision. Existing training approaches present a trade-off between large amounts of cheap data offered by the native-trained models and additional knowledge of learner error patterns provided by the more expensive method of training on annotated learner data. We propose a novel training approach that draws on the strengths offered by the two standard training paradigmsof training either on native or on annotated learner dataand that outperforms both of these standard methods. Using the key observation that parameters relating to error regularities exhibited by non-native writers are relatively simple, we develop models that can incorporate knowledge about error regularities based on a small annotated sample but that are otherwise trained on native English data. The key contribution of this article is the introduction and analysis of two methods for adapting the learned models to error patterns of non-native writers ; one method that applies to generative classifiers and a second that applies to discriminative classifiers. Both methods demonstrated state-of-the-art performance in several text correction competitions. In particular, the Illinois system that implements these methods ranked at the top in two recent CoNLL shared tasks on error correction.1 We conduct further evaluation of the proposed approaches studying the effect of using error data from speakers of the same native language, languages that are closely related linguistically, and unrelated languages.

pdf bib
A Structured Learning Approach to Temporal Relation Extraction
Qiang Ning | Zhili Feng | Dan Roth
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Identifying temporal relations between events is an essential step towards natural language understanding. However, the temporal relation between two events in a story depends on, and is often dictated by, relations among other events. Consequently, effectively identifying temporal relations between events is a challenging problem even for human annotators. This paper suggests that it is important to take these dependencies into account while learning to identify these relations and proposes a structured learning approach to address this challenge. As a byproduct, this provides a new perspective on handling missing relations, a known issue that hurts existing methods. As we show, the proposed approach results in significant improvements on the two commonly used data sets for this problem.

pdf bib
Story Comprehension for Predicting What Happens Next
Snigdha Chaturvedi | Haoruo Peng | Dan Roth
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Automatic story comprehension is a fundamental challenge in Natural Language Understanding, and can enable computers to learn about social norms, human behavior and commonsense. In this paper, we present a story comprehension model that explores three distinct semantic aspects : (i) the sequence of events described in the story, (ii) its emotional trajectory, and (iii) its plot consistency. We judge the model’s understanding of real-world stories by inquiring if, like humans, it can develop an expectation of what will happen next in a given story. Specifically, we use it to predict the correct ending of a given short story from possible alternatives. The model uses a hidden variable to weigh the semantic aspects in the context of the story. Our experiments demonstrate the potential of our approach to characterize these semantic aspects, and the strength of the hidden variable based approach. The model outperforms the state-of-the-art approaches and achieves best results on a publicly available dataset.

pdf bib
Cheap Translation for Cross-Lingual Named Entity Recognition
Stephen Mayhew | Chen-Tse Tsai | Dan Roth
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Recent work in NLP has attempted to deal with low-resource languages but still assumed a resource level that is not present for most languages, e.g., the availability of Wikipedia in the target language. We propose a simple method for cross-lingual named entity recognition (NER) that works well in settings with very minimal resources. Our approach makes use of a lexicon to translate annotated data available in one or several high resource language(s) into the target language, and learns a standard monolingual NER model there. Further, when Wikipedia is available in the target language, our method can enhance Wikipedia based methods to yield state-of-the-art NER results ; we evaluate on 7 diverse languages, improving the state-of-the-art by an average of 5.5 % F1 points. With the minimal resources required, this is an extremely portable cross-lingual NER approach, as illustrated using a truly low-resource language, Uyghur.very minimal resources. Our approach makes use of a lexicon to “translate” annotated data available in one or several high resource language(s) into the target language, and learns a standard monolingual NER model there. Further, when Wikipedia is available in the target language, our method can enhance Wikipedia based methods to yield state-of-the-art NER results; we evaluate on 7 diverse languages, improving the state-of-the-art by an average of 5.5% F1 points. With the minimal resources required, this is an extremely portable cross-lingual NER approach, as illustrated using a truly low-resource language, Uyghur.

pdf bib
Entity Linking via Joint Encoding of Types, Descriptions, and Context
Nitish Gupta | Sameer Singh | Dan Roth
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

For accurate entity linking, we need to capture various information aspects of an entity, such as its description in a KB, contexts in which it is mentioned, and structured knowledge. Additionally, a linking system should work on texts from different domains without requiring domain-specific training data or hand-engineered features. In this work we present a neural, modular entity linking system that learns a unified dense representation for each entity using multiple sources of information, such as its description, contexts around its mentions, and its fine-grained types. We show that the resulting entity linking system is effective at combining these sources, and performs competitively, sometimes out-performing current state-of-the-art systems across datasets, without requiring any domain-specific training data or hand-engineered features. We also show that our model can effectively embed entities that are new to the KB, and is able to link its mentions accurately.

pdf bib
Integer Linear Programming formulations in Natural Language Processing
Dan Roth | Vivek Srikumar
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Tutorial Abstracts

Making decisions in natural language processing problems often involves assigning values to sets of interdependent variables where the expressive dependency structure can influence, or even dictate what assignments are possible. This setting includes a broad range of structured prediction problems such as semantic role labeling, named entity and relation recognition, co-reference resolution, dependency parsing and semantic parsing. The setting is also appropriate for cases that may require making global decisions that involve multiple components, possibly pre-designed or pre-learned, as in event recognition and analysis, summarization, paraphrasing, textual entailment and question answering. In all these cases, it is natural to formulate the decision problem as a constrained optimization problem, with an objective function that is composed of learned models, subject to domain or problem specific constraints. Over the last few years, starting with a couple of papers written by (Roth & Yih, 2004, 2005), dozens of papers have been using the Integer linear programming (ILP) formulation developed there, including several award-winning papers (e.g., (Martins, Smith, & Xing, 2009 ; Koo, Rush, Collins, Jaakkola, & Sontag., 2010 ; Berant, Dagan, & Goldberger, 2011)).This tutorial will present the key ingredients of ILP formulations of natural language processing problems, aiming at guiding readers through the key modeling steps, explaining the learning and inference paradigms and exemplifying these by providing examples from the literature. We will cover a range of topics, from the theoretical foundations of learning and inference with ILP models, to practical modeling guides, to software packages and applications. The goal of this tutorial is to introduce the computational framework to broader ACL community, motivate it as a generic framework for learning and inference in global NLP decision problems, present some of the key theoretical and practical issues involved and survey some of the existing applications of it as a way to promote further development of the framework and additional applications. We will also make connections with some of the hot topics in current NLP research and show how they can be used within the general framework proposed here. The tutorial will thus be useful for many of the senior and junior researchers that have interest in global decision problems in NLP, providing a concise overview of recent perspectives and research results.