Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference

Tom McCoy, Ellie Pavlick, Tal Linzen


Abstract
A machine learning system can score well on a given test set by relying on heuristics that are effective for frequent example types but break down in more challenging cases. We study this issue within natural language inference (NLI), the task of determining whether one sentence entails another. We hypothesize that statistical NLI models may adopt three fallible syntactic heuristics: the lexical overlap heuristic, the subsequence heuristic, and the constituent heuristic. To determine whether models have adopted these heuristics, we introduce a controlled evaluation set called HANS (Heuristic Analysis for NLI Systems), which contains many examples where the heuristics fail. We find that models trained on MNLI, including BERT, a state-of-the-art model, perform very poorly on HANS, suggesting that they have indeed adopted these heuristics. We conclude that there is substantial room for improvement in NLI systems, and that the HANS dataset can motivate and measure progress in this area.
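As an illustration of the first two heuristics named in the abstract, here is a minimal sketch in Python. It assumes a simple regex tokenizer, and the function names are hypothetical rather than taken from the HANS codebase. The constituent heuristic is left out because it needs a syntactic parse. Both example pairs are non-entailments, yet each heuristic fires and would wrongly predict entailment.

```python
import re


def tokens(sentence):
    """Lowercase word tokens with punctuation dropped (assumed tokenizer)."""
    return re.findall(r"[a-z']+", sentence.lower())


def lexical_overlap(premise, hypothesis):
    """True when every hypothesis word also occurs in the premise."""
    return set(tokens(hypothesis)) <= set(tokens(premise))


def subsequence(premise, hypothesis):
    """True when the hypothesis is a contiguous token span of the premise."""
    p, h = tokens(premise), tokens(hypothesis)
    return any(p[i:i + len(h)] == h for i in range(len(p) - len(h) + 1))


# Each pair is a non-entailment that the named heuristic would get wrong.
pairs = [
    ("lexical_overlap", "The doctor was paid by the actor.", "The doctor paid the actor."),
    ("subsequence", "The doctor near the actor danced.", "The actor danced."),
]

for name, premise, hypothesis in pairs:
    fires = {"lexical_overlap": lexical_overlap, "subsequence": subsequence}[name]
    print(name, "fires:", fires(premise, hypothesis), "| gold label: non-entailment")
```

A model that had adopted either heuristic would label these pairs as entailment, which is the kind of failure an evaluation set like HANS is built to expose.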
Anthology ID:
P19-1334
Volume:
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Month:
July
Year:
2019
Address:
Florence, Italy
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
3428–3448
URL:
https://aclanthology.org/P19-1334
DOI:
10.18653/v1/P19-1334
Cite (ACL):
Tom McCoy, Ellie Pavlick, and Tal Linzen. 2019. Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3428–3448, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference (McCoy et al., ACL 2019)
PDF:
https://aclanthology.org/P19-1334.pdf
Video:
https://vimeo.com/384776891
Code
tommccoy1/hans
Data
SNLI