Vered Shwartz


2021

pdf bib
Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021)
Anna Rogers | Iacer Calixto | Ivan Vulić | Naomi Saphra | Nora Kassner | Oana-Maria Camburu | Trapit Bansal | Vered Shwartz
Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021)

2020

pdf bib
You are grounded ! : Latent Name Artifacts in Pre-trained Language Models
Vered Shwartz | Rachel Rudinger | Oyvind Tafjord
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Pre-trained language models (LMs) may perpetuate biases originating in their training corpus to downstream models. We focus on artifacts associated with the representation of given names (e.g., Donald), which, depending on the corpus, may be associated with specific entities, as indicated by next token prediction (e.g., Trump). While helpful in some contexts, grounding happens also in under-specified or inappropriate contexts. For example, endings generated for ‘Donald is a’ substantially differ from those of other names, and often have more-than-average negative sentiment. We demonstrate the potential effect on downstream tasks with reading comprehension probes where name perturbation changes the model answers. As a silver lining, our experiments suggest that additional pre-training on different corpora may mitigate this bias.

pdf bib
Thinking Like a Skeptic : Defeasible Inference in Natural Language
Rachel Rudinger | Vered Shwartz | Jena D. Hwang | Chandra Bhagavatula | Maxwell Forbes | Ronan Le Bras | Noah A. Smith | Yejin Choi
Findings of the Association for Computational Linguistics: EMNLP 2020

Defeasible inference is a mode of reasoning in which an inference (X is a bird, therefore X flies) may be weakened or overturned in light of new evidence (X is a penguin). Though long recognized in classical AI and philosophy, defeasible inference has not been extensively studied in the context of contemporary data-driven research on natural language inference and commonsense reasoning. We introduce Defeasible NLI (abbreviated -NLI), a dataset for defeasible inference in natural language. Defeasible NLI contains extensions to three existing inference datasets covering diverse modes of reasoning : common sense, natural language inference, and social norms. From Defeasible NLI, we develop both a classification and generation task for defeasible inference, and demonstrate that the generation task is much more challenging. Despite lagging human performance, however, generative models trained on this data are capable of writing sentences that weaken or strengthen a specified inference up to 68 % of the time.\\delta-NLI), a dataset for defeasible inference in natural language. Defeasible NLI contains extensions to three existing inference datasets covering diverse modes of reasoning: common sense, natural language inference, and social norms. From Defeasible NLI, we develop both a classification and generation task for defeasible inference, and demonstrate that the generation task is much more challenging. Despite lagging human performance, however, generative models trained on this data are capable of writing sentences that weaken or strengthen a specified inference up to 68% of the time.

pdf bib
Proceedings of SustaiNLP: Workshop on Simple and Efficient Natural Language Processing
Nafise Sadat Moosavi | Angela Fan | Vered Shwartz | Goran Glavaš | Shafiq Joty | Alex Wang | Thomas Wolf
Proceedings of SustaiNLP: Workshop on Simple and Efficient Natural Language Processing

2019

pdf bib
Diversify Your Datasets : Analyzing Generalization via Controlled Variance in Adversarial Datasets
Ohad Rozen | Vered Shwartz | Roee Aharoni | Ido Dagan
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Phenomenon-specific adversarial datasets have been recently designed to perform targeted stress-tests for particular inference types. Recent work (Liu et al., 2019a) proposed that such datasets can be utilized for training NLI and other types of models, often allowing to learn the phenomenon in focus and improve on the challenge dataset, indicating a blind spot in the original training data. Yet, although a model can improve in such a training process, it might still be vulnerable to other challenge datasets targeting the same phenomenon but drawn from a different distribution, such as having a different syntactic complexity level. In this work, we extend this method to drive conclusions about a model’s ability to learn and generalize a target phenomenon rather than to learn a dataset, by controlling additional aspects in the adversarial datasets. We demonstrate our approach on two inference phenomena dative alternation and numerical reasoning, elaborating, and in some cases contradicting, the results of Liu et al.. Our methodology enables building better challenge datasets for creating more robust models, and may yield better model understanding and subsequent overarching improvements.

2018

pdf bib
SemEval-2018 Task 9 : Hypernym DiscoverySemEval-2018 Task 9: Hypernym Discovery
Jose Camacho-Collados | Claudio Delli Bovi | Luis Espinosa-Anke | Sergio Oramas | Tommaso Pasini | Enrico Santus | Vered Shwartz | Roberto Navigli | Horacio Saggion
Proceedings of The 12th International Workshop on Semantic Evaluation

This paper describes the SemEval 2018 Shared Task on Hypernym Discovery. We put forward this task as a complementary benchmark for modeling hypernymy, a problem which has traditionally been cast as a binary classification task, taking a pair of candidate words as input. Instead, our reformulated task is defined as follows : given an input term, retrieve (or discover) its suitable hypernyms from a target corpus. We proposed five different subtasks covering three languages (English, Spanish, and Italian), and two specific domains of knowledge in English (Medical and Music). Participants were allowed to compete in any or all of the subtasks. Overall, a total of 11 teams participated, with a total of 39 different systems submitted through all subtasks. Data, results and further information about the task can be found at.https://competitions.codalab.org/competitions/17119.\n

pdf bib
Integrating Multiplicative Features into Supervised Distributional Methods for Lexical Entailment
Tu Vu | Vered Shwartz
Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics

Supervised distributional methods are applied successfully in lexical entailment, but recent work questioned whether these methods actually learn a relation between two words. Specifically, Levy et al. (2015) claimed that linear classifiers learn only separate properties of each word. We suggest a cheap and easy way to boost the performance of these methods by integrating multiplicative features into commonly used representations. We provide an extensive evaluation with different classifiers and evaluation setups, and suggest a suitable evaluation setup for the task, eliminating biases existing in previous ones.

pdf bib
Breaking NLI Systems with Sentences that Require Simple Lexical InferencesNLI Systems with Sentences that Require Simple Lexical Inferences
Max Glockner | Vered Shwartz | Yoav Goldberg
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We create a new NLI test set that shows the deficiency of state-of-the-art models in inferences that require lexical and world knowledge. The new examples are simpler than the SNLI test set, containing sentences that differ by at most one word from sentences in the training set. Yet, the performance on the new test set is substantially worse across systems trained on SNLI, demonstrating that these systems are limited in their generalization ability, failing to capture many simple inferences.

pdf bib
Proceedings of ACL 2018, Student Research Workshop
Vered Shwartz | Jeniya Tabassum | Rob Voigt | Wanxiang Che | Marie-Catherine de Marneffe | Malvina Nissim
Proceedings of ACL 2018, Student Research Workshop

2017

pdf bib
A Consolidated Open Knowledge Representation for Multiple Texts
Rachel Wities | Vered Shwartz | Gabriel Stanovsky | Meni Adler | Ori Shapira | Shyam Upadhyay | Dan Roth | Eugenio Martinez Camara | Iryna Gurevych | Ido Dagan
Proceedings of the 2nd Workshop on Linking Models of Lexical, Sentential and Discourse-level Semantics

We propose to move from Open Information Extraction (OIE) ahead to Open Knowledge Representation (OKR), aiming to represent information conveyed jointly in a set of texts in an open text-based manner. We do so by consolidating OIE extractions using entity and predicate coreference, while modeling information containment between coreferring elements via lexical entailment. We suggest that generating OKR structures can be a useful step in the NLP pipeline, to give semantic applications an easy handle on consolidated information across multiple texts.

pdf bib
Acquiring Predicate Paraphrases from News Tweets
Vered Shwartz | Gabriel Stanovsky | Ido Dagan
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)

We present a simple method for ever-growing extraction of predicate paraphrases from news headlines in Twitter. Analysis of the output of ten weeks of collection shows that the accuracy of paraphrases with different support levels is estimated between 60-86 %. We also demonstrate that our resource is to a large extent complementary to existing resources, providing many novel paraphrases. Our resource is publicly available, continuously expanding based on daily news.

pdf bib
Hypernyms under Siege : Linguistically-motivated Artillery for Hypernymy Detection
Vered Shwartz | Enrico Santus | Dominik Schlechtweg
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

The fundamental role of hypernymy in NLP has motivated the development of many methods for the automatic identification of this relation, most of which rely on word distribution. We investigate an extensive number of such unsupervised measures, using several distributional semantic models that differ by context type and feature weighting. We analyze the performance of the different methods based on their linguistic motivation. Comparison to the state-of-the-art supervised methods shows that while supervised methods generally outperform the unsupervised ones, the former are sensitive to the distribution of training instances, hurting their reliability. Being based on general linguistic hypotheses and independent from training data, unsupervised measures are more robust, and therefore are still useful artillery for hypernymy detection.