Tushar Khot


2020

More Bang for Your Buck: Natural Perturbation for Robust Question Answering
Daniel Khashabi | Tushar Khot | Ashish Sabharwal
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Deep learning models for linguistic tasks require large training datasets, which are expensive to create. As an alternative to the traditional approach of writing each new instance from scratch, we propose first collecting a set of seed examples and then applying human-driven natural perturbations (as opposed to rule-based machine perturbations), which often change the gold label as well. Such perturbations are relatively easier (and hence cheaper) to create than completely new examples. Further, they help address the issue that even models achieving human-level scores on NLP datasets are known to be considerably sensitive to small changes in input. To evaluate the idea, we consider a recent question-answering dataset (BOOLQ) and study our approach as a function of the perturbation cost ratio, the relative cost of perturbing an existing question vs. creating a new one from scratch. We find that when natural perturbations are moderately cheaper to create (cost ratio under 60%), it is more effective to use them for training BOOLQ models: such models exhibit 9% higher robustness and 4.5% stronger generalization, while retaining performance on the original BOOLQ dataset.
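
The cost-ratio argument is easy to make concrete. Below is a minimal Python sketch of the budget arithmetic, assuming hypothetical per-question costs; only the definition of the cost ratio comes from the abstract.

    # Budget arithmetic behind the "perturbation cost ratio": the relative
    # cost of perturbing an existing question vs. writing a new one.
    # All concrete numbers below are hypothetical.

    def examples_for_budget(budget, cost_new, cost_ratio, num_seeds):
        """How many training examples a fixed annotation budget buys when we
        write `num_seeds` questions from scratch and spend the remainder on
        natural perturbations of those seeds."""
        cost_perturb = cost_ratio * cost_new       # perturbing is cheaper
        remaining = budget - num_seeds * cost_new  # budget left after seeds
        if remaining < 0:
            raise ValueError("budget too small for the requested seed set")
        return num_seeds + int(remaining // cost_perturb)

    # At a cost ratio of 0.5, the same budget buys nearly twice as much data
    # as writing every question from scratch.
    print(examples_for_budget(budget=1000.0, cost_new=1.0, cost_ratio=0.5, num_seeds=100))  # 1900
    print(int(1000.0 // 1.0))  # 1000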

IIRC: A Dataset of Incomplete Information Reading Comprehension Questions
James Ferguson | Matt Gardner | Hannaneh Hajishirzi | Tushar Khot | Pradeep Dasigi
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Humans often have to read multiple documents to address their information needs. However, most existing reading comprehension (RC) tasks only focus on questions for which the contexts provide all the information required to answer them, and thus do not evaluate a system’s ability to identify a potential lack of sufficient information and locate sources for that information. To fill this gap, we present a dataset, IIRC, with more than 13K questions over paragraphs from English Wikipedia that provide only partial information to answer them, with the missing information occurring in one or more linked documents. The questions were written by crowd workers who did not have access to any of the linked documents, leading to questions that have little lexical overlap with the contexts where the answers appear. This process also yielded many questions without answers and many that require discrete reasoning, increasing the difficulty of the task. We follow recent modeling work on various reading comprehension datasets to construct a baseline model for this dataset, finding that it achieves 31.1% F1 on this task, while estimated human performance is 88.4%. The dataset, code for the baseline system, and a leaderboard can be found at https://allennlp.org/iirc.

UNIFIEDQA: Crossing Format Boundaries with a Single QA System
Daniel Khashabi | Sewon Min | Tushar Khot | Ashish Sabharwal | Oyvind Tafjord | Peter Clark | Hannaneh Hajishirzi
Findings of the Association for Computational Linguistics: EMNLP 2020

Question answering (QA) tasks have been posed using a variety of formats, such as extractive span selection, multiple choice, etc. This has led to format-specialized models, and even to an implicit division in the QA community. We argue that such boundaries are artificial and perhaps unnecessary, given that the reasoning abilities we seek to teach are not governed by the format. As evidence, we use the latest advances in language modeling to build a single pre-trained QA model, UNIFIEDQA, that performs well across 19 QA datasets spanning 4 diverse formats. UNIFIEDQA performs on par with 8 different models, each trained on an individual dataset. Even when faced with 12 unseen datasets of observed formats, UNIFIEDQA performs surprisingly well, showing strong generalization from its out-of-format training data. Finally, simply fine-tuning this pre-trained QA model into specialized models results in a new state of the art on 10 factoid and commonsense question answering datasets, establishing UNIFIEDQA as a strong starting point for building QA systems.
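
The unification rests on serializing every QA format into plain text for a single text-to-text model. The sketch below illustrates the idea; the delimiters and preprocessing are assumptions, not necessarily UNIFIEDQA's exact scheme.

    # Serialize heterogeneous QA formats into one plain-text input so a
    # single seq2seq model can train on their union. The " \n " delimiter
    # and lowercasing are illustrative choices.

    def encode_extractive(question, context):
        return f"{question} \\n {context}".lower()

    def encode_multiple_choice(question, choices, context=""):
        lettered = " ".join(f"({chr(ord('a') + i)}) {c}" for i, c in enumerate(choices))
        parts = [question, lettered] + ([context] if context else [])
        return " \\n ".join(parts).lower()

    def encode_yes_no(question, passage):
        # yes/no questions become two-choice multiple choice
        return encode_multiple_choice(question, ["yes", "no"], passage)

    # Every format is now a plain string, and the target is free-form text.
    print(encode_multiple_choice("What is the capital of France?", ["Paris", "Rome"]))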

2019

Exploiting Explicit Paths for Multi-hop Reading Comprehension
Souvik Kundu | Tushar Khot | Ashish Sabharwal | Peter Clark
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We propose a novel, path-based reasoning approach for the multi-hop reading comprehension task, where a system needs to combine facts from multiple passages to answer a question. Although inspired by multi-hop reasoning over knowledge graphs, our proposed approach operates directly over unstructured text. It generates potential paths through passages and scores them without any direct path supervision. The proposed model, named PathNet, attempts to extract implicit relations from text through entity pair representations, and compose them to encode each path. To capture additional context, PathNet also composes the passage representations along each path to compute a passage-based representation. Unlike previous approaches, our model is then able to explain its reasoning via these explicit paths through the passages. We show that our approach outperforms prior models on the multi-hop Wikihop dataset, and also generalizes to the OpenBookQA dataset, matching state-of-the-art performance.
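
As a rough illustration of the path-extraction step, the sketch below enumerates 2-hop entity paths across passages. The linking heuristic (shared capitalized tokens) and the example passages are invented for illustration and are far cruder than PathNet's actual extraction.

    # Enumerate 2-hop paths: one passage connects the question entity to a
    # linking entity; another connects that entity to a candidate answer.
    from itertools import product

    def extract_paths(question_entity, candidate, passages):
        paths = []
        for p1, p2 in product(passages, repeat=2):
            if question_entity not in p1 or candidate not in p2:
                continue
            # naive entity linking: capitalized tokens shared by both passages
            shared = {w for w in p1.split() if w.istitle()} & set(p2.split())
            for link in shared - {question_entity, candidate}:
                paths.append((question_entity, link, candidate))
        return paths

    passages = [
        "Obama was born in Honolulu .",
        "Honolulu is the capital of Hawaii .",
    ]
    print(extract_paths("Obama", "Hawaii", passages))
    # [('Obama', 'Honolulu', 'Hawaii')]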

2018

Bridging Knowledge Gaps in Neural Entailment via Symbolic Models
Dongyeop Kang | Tushar Khot | Ashish Sabharwal | Peter Clark
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Most textual entailment models focus on lexical gaps between the premise text and the hypothesis, but rarely on knowledge gaps. We focus on filling these knowledge gaps in the Science Entailment task by leveraging an external structured knowledge base (KB) of science facts. Our new architecture combines standard neural entailment models with a knowledge lookup module. To facilitate this lookup, we propose a fact-level decomposition of the hypothesis, verifying the resulting sub-facts against both the textual premise and the structured KB. Our model, NSNet, learns to aggregate predictions from these heterogeneous data formats. On the SciTail dataset, NSNet outperforms a simpler combination of the two predictions by 3% and the base entailment model by 5%.
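
A minimal sketch of the aggregation idea, assuming each sub-fact of the hypothesis has already been scored against the textual premise (neural) and the KB (symbolic). The decomposition and this particular hand-written rule are simplified placeholders; NSNet learns the aggregation from data.

    # Combine per-sub-fact evidence from two heterogeneous sources: take the
    # stronger source for each sub-fact, then require all sub-facts to hold.

    def aggregate(subfact_scores):
        entail_prob = 1.0
        for neural_score, kb_score in subfact_scores:
            entail_prob *= max(neural_score, kb_score)
        return entail_prob

    # "Metals conduct electricity and heat" decomposed into two sub-facts:
    scores = [(0.9, 0.3),   # "metals conduct electricity": premise supports it
              (0.2, 0.8)]   # "metals conduct heat": the KB supports it
    print(aggregate(scores))  # 0.72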

AdvEntuRe: Adversarial Training for Textual Entailment with Knowledge-Guided Examples
Dongyeop Kang | Tushar Khot | Ashish Sabharwal | Eduard Hovy
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We consider the problem of learning textual entailment models with limited supervision (5K-10K training examples), and present two complementary approaches for it. First, we propose knowledge-guided adversarial example generators for incorporating large lexical resources in entailment models via only a handful of rule templates. Second, to make the entailment model (a discriminator) more robust, we propose the first GAN-style approach for training it using a natural language example generator that iteratively adjusts to the discriminator’s weaknesses. We demonstrate effectiveness using two entailment datasets, where the proposed methods increase accuracy by 4.7% on SciTail and by 2.8% on a 1% sub-sample of SNLI. Notably, even a single hand-written rule, negate, improves the accuracy of negation examples in SNLI by 6.1%.
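
For a flavor of what a rule template does, here is a toy version of a negate-style generator. The real AdvEntuRe templates are knowledge-guided and considerably more careful than this string manipulation.

    # Rule template "negate": negating the hypothesis flips
    # entailment <-> contradiction, yielding a new labeled example.

    def negate(premise, hypothesis, label):
        if " is " in hypothesis:
            new_hyp = hypothesis.replace(" is ", " is not ", 1)
        else:
            new_hyp = "It is not true that " + hypothesis[0].lower() + hypothesis[1:]
        flipped = {"entailment": "contradiction", "contradiction": "entailment"}
        return premise, new_hyp, flipped.get(label, label)

    print(negate("A man is playing a guitar.",
                 "A man is playing an instrument.",
                 "entailment"))
    # ('A man is playing a guitar.', 'A man is not playing an instrument.', 'contradiction')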

2017

Learning What is Essential in Questions
Daniel Khashabi | Tushar Khot | Ashish Sabharwal | Dan Roth
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)

Question answering (QA) systems are easily distracted by irrelevant or redundant words in questions, especially when faced with long or multi-sentence questions in difficult domains. This paper introduces and studies the notion of essential question terms with the goal of improving such QA solvers. We illustrate the importance of essential question terms by showing that humans’ ability to answer questions drops significantly when essential terms are eliminated from questions. We then develop a classifier that reliably (90% mean average precision) identifies and ranks essential terms in questions. Finally, we use the classifier to demonstrate that the notion of question term essentiality allows a state-of-the-art QA solver for elementary-level science questions to make better and more informed decisions, improving performance by up to 5%. We also introduce a new dataset of over 2,200 crowd-sourced science questions annotated with essential terms.
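
A small sketch of how per-term essentiality scores might be consumed by a downstream solver. The scores below are invented; the paper trains a classifier to produce them.

    # Keep only question terms whose predicted essentiality clears a
    # threshold, so the solver focuses on what the question is really about.

    def essential_terms(question_terms, scores, threshold=0.5):
        return [t for t in question_terms if scores.get(t, 0.0) > threshold]

    question = "Which of the following is an example of a chemical change ?".split()
    scores = {"chemical": 0.95, "change": 0.90, "example": 0.40, "following": 0.10}
    print(essential_terms(question, scores))  # ['chemical', 'change']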

Answering Complex Questions Using Open Information Extraction
Tushar Khot | Ashish Sabharwal | Peter Clark
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

While there has been substantial progress in factoid question-answering (QA), answering complex questions remains challenging, typically requiring both a large body of knowledge and inference techniques. Open Information Extraction (Open IE) provides a way to generate semi-structured knowledge for QA, but to date such knowledge has only been used to answer simple questions with retrieval-based methods. We overcome this limitation by presenting a method for reasoning with Open IE knowledge, allowing more complex questions to be handled. Using a recently proposed support graph optimization framework for QA, we develop a new inference model for Open IE, in particular one that can work effectively with multiple short facts, noise, and the relational structure of tuples. Our model significantly outperforms a state-of-the-art structured solver on complex questions of varying difficulty, while also removing the reliance on manually curated knowledge.
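
For illustration, Open IE knowledge consists of (subject, relation, object) tuples extracted from text, and a candidate answer can be scored by the tuples connecting it to the question. The toy overlap heuristic below is a hypothetical stand-in for the paper's support graph optimization.

    # Score a candidate answer by word overlap between the question and
    # any Open IE tuple that mentions the answer.

    facts = [
        ("moss", "grows on", "north side of trees"),
        ("the sun", "rises in", "the east"),
    ]

    def score_answer(question, answer, tuples):
        q_words = set(question.lower().split())
        best = 0
        for subj, rel, obj in tuples:
            fact = f"{subj} {rel} {obj}".lower()
            if answer.lower() in fact:
                best = max(best, len(q_words & set(fact.split())))
        return best

    # The first tuple both mentions "north" and overlaps the question.
    print(score_answer("Which side of a tree does moss grow on", "north", facts))  # 4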