Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP

Jasmijn Bastings, Yonatan Belinkov, Emmanuel Dupoux, Mario Giulianelli, Dieuwke Hupkes, Yuval Pinter, Hassan Sajjad (Editors)


Anthology ID:
2021.blackboxnlp-1
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Venues:
BlackboxNLP | EMNLP
SIG:
Publisher:
Association for Computational Linguistics
URL:
https://aclanthology.org/2021.blackboxnlp-1
DOI:

pdf bib
Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP
Jasmijn Bastings | Yonatan Belinkov | Emmanuel Dupoux | Mario Giulianelli | Dieuwke Hupkes | Yuval Pinter | Hassan Sajjad

pdf bib
Does External Knowledge Help Explainable Natural Language Inference? Automatic Evaluation vs. Human Ratings
Hendrik Schuff | Hsiu-Yu Yang | Heike Adel | Ngoc Thang Vu

Natural language inference (NLI) requires models to learn and apply commonsense knowledge. These reasoning abilities are particularly important for explainable NLI systems that generate a natural language explanation in addition to their label prediction. The integration of external knowledge has been shown to improve NLI systems; here we investigate whether it can also improve their explanation capabilities. For this, we investigate different sources of external knowledge and evaluate the performance of our models on in-domain data as well as on special transfer datasets that are designed to assess fine-grained reasoning capabilities. We find that different sources of knowledge have different effects on reasoning abilities; for example, implicit knowledge stored in language models can hinder reasoning on numbers and negations. Finally, we conduct the largest and most fine-grained explainable NLI crowdsourcing study to date. It reveals that even large differences in automatic performance scores are not reflected in human ratings of label, explanation, commonsense, or grammar correctness.

pdf bib
On the Limits of Minimal Pairs in Contrastive Evaluation
Jannis Vamvas | Rico Sennrich

Minimal sentence pairs are frequently used to analyze the behavior of language models. It is often assumed that model behavior on contrastive pairs is predictive of model behavior at large. We argue that two conditions are necessary for this assumption to hold: first, a tested hypothesis should be well-motivated, since experiments show that contrastive evaluation can lead to false positives. Second, test data should be chosen so as to minimize the distributional discrepancy between evaluation time and deployment time. For a good approximation of deployment-time decoding, we recommend that minimal pairs be created based on machine-generated text, as opposed to human-written references. We present a contrastive evaluation suite for English–German MT that implements this recommendation.
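
The following is a minimal sketch of the general minimal-pair contrastive-evaluation idea (not the paper's English–German MT setup): score both members of a pair with a language model and count the pair as passed if the correct variant receives higher log-probability. GPT-2 and the subject–verb agreement pair are illustrative choices only.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def sentence_logprob(sentence: str) -> float:
    """Total token log-probability of a sentence under the language model."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids the model returns mean cross-entropy, so we
        # multiply by the number of scored tokens to recover the total log-prob.
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.size(1) - 1)

def passes_minimal_pair(correct: str, contrast: str) -> bool:
    return sentence_logprob(correct) > sentence_logprob(contrast)

print(passes_minimal_pair("The keys to the cabinet are on the table.",
                          "The keys to the cabinet is on the table."))
```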

pdf bib
What Models Know About Their Attackers: Deriving Attacker Information From Latent Representations
Zhouhang Xie | Jonathan Brophy | Adam Noack | Wencong You | Kalyani Asthana | Carter Perkins | Sabrina Reis | Zayd Hammoudeh | Daniel Lowd | Sameer Singh

Adversarial attacks curated against NLP models are increasingly becoming practical threats. Although various methods have been developed to detect adversarial attacks, securing learning-based NLP systems in practice would require more than identifying and evading perturbed instances. To address these issues, we propose a new set of adversary identification tasks, Attacker Attribute Classification via Textual Analysis (AACTA), which attempts to obtain more detailed information about the attackers from adversarial texts. Specifically, given a piece of adversarial text, we hope to accomplish tasks such as localizing perturbed tokens, identifying the attacker’s access level to the target model, determining the evasion mechanism imposed, and specifying the perturbation type employed by the attacking algorithm. Our contributions are as follows: we formalize the task of classifying attacker attributes, and create a benchmark on various target models from sentiment classification and abuse detection domains. We show that signals from BERT models and target models can be used to train classifiers that reveal the properties of the attacking algorithms. We demonstrate that adversarial attacks leave interpretable traces in both feature spaces of pre-trained language models and target models, making AACTA a promising direction towards more trustworthy NLP systems.

pdf bib
ProSPer: Probing Human and Neural Network Language Model Understanding of Spatial Perspective
Tessa Masis | Carolyn Anderson

Understanding perspectival language is important for applications like dialogue systems and human-robot interaction. We propose a probe task that explores how well language models understand spatial perspective. We present a dataset for evaluating perspective inference in English, ProSPer, and use it to explore how humans and Transformer-based language models infer perspective. Although the best bidirectional model performs similarly to humans, the two display different strengths: humans outperform neural networks in conversational contexts, while RoBERTa excels at written genres.

pdf bib
Transferring Knowledge from Vision to Language: How to Achieve It and How to Measure It?
Tobias Norlund | Lovisa Hagström | Richard Johansson

Large language models are known to suffer from the hallucination problem in that they are prone to output statements that are false or inconsistent, indicating a lack of knowledge. A proposed solution to this is to provide the model with additional data modalities that complement the knowledge obtained through text. We investigate the use of visual data to complement the knowledge of large language models by proposing a method for evaluating visual knowledge transfer to text for uni- or multimodal language models. The method is based on two steps: 1) a novel task querying for knowledge of memory colors, i.e., the typical colors of well-known objects, and 2) filtering of model training data to clearly separate knowledge contributions. Additionally, we introduce a model architecture that involves a visual imagination step and evaluate it with our proposed method. We find that our method can successfully be used to measure visual knowledge transfer capabilities in models and that our novel model architecture shows promising results for leveraging multimodal knowledge in a unimodal setting.
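
As a rough illustration of the "memory colors" style of knowledge query described above, one can cloze-probe a masked language model for the typical color of an object. The prompt wording and the choice of BERT are assumptions for illustration, not the paper's exact setup.

```python
from transformers import pipeline

# Fill-mask pipeline over a standard masked language model.
fill = pipeline("fill-mask", model="bert-base-uncased")

for obj in ["banana", "tomato", "grass"]:
    prompt = f"The color of a {obj} is usually [MASK]."
    top = fill(prompt, top_k=3)
    print(obj, [(c["token_str"], round(c["score"], 3)) for c in top])
```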

pdf bib
A howling success or a working sea? Testing what BERT knows about metaphors
Paolo Pedinotti | Eliana Di Palma | Ludovica Cerini | Alessandro Lenci

Metaphor is a widespread linguistic and cognitive phenomenon that is ruled by mechanisms which have received attention in the literature. Transformer Language Models such as BERT have brought improvements in metaphor-related tasks. However, they have been used only in application contexts, while their knowledge of the phenomenon has not been analyzed. To test what BERT knows about metaphors, we challenge it on a new dataset that we designed to test various aspects of this phenomenon such as variations in linguistic structure, variations in conventionality, the boundaries of the plausibility of a metaphor and the interpretations that we attribute to metaphoric expressions. Results bring out some tendencies that suggest that the model can reproduce some human intuitions about metaphors.

pdf bib
How Does Length Prediction Influence the Performance of Non-Autoregressive Translation?
Minghan Wang | Guo Jiaxin | Yuxia Wang | Yimeng Chen | Su Chang | Hengchao Shang | Min Zhang | Shimin Tao | Hao Yang

Length prediction is a special task in a series of NAT models, where the target length has to be determined before generation. However, the performance of length prediction and its influence on translation quality have seldom been discussed. In this paper, we present comprehensive analyses of the length prediction task in NAT, aiming to find the factors that influence its performance, as well as how it relates to translation quality. We mainly perform experiments based on the Conditional Masked Language Model (CMLM) (Ghazvininejad et al., 2019), a representative NAT model, and evaluate it on two language pairs, En-De and En-Ro. We draw two conclusions: 1) the performance of length prediction is mainly influenced by properties of the language pair such as alignment pattern, word order, and intrinsic length ratio, and is also affected by the use of knowledge-distilled data; 2) there is a positive correlation between the performance of length prediction and the BLEU score.

pdf bib
On the Language-specificity of Multilingual BERT and the Impact of Fine-tuning
Marc Tanti | Lonneke van der Plas | Claudia Borg | Albert Gatt

Recent work has shown evidence that the knowledge acquired by multilingual BERT (mBERT) has two components: a language-specific and a language-neutral one. This paper analyses the relationship between them, in the context of fine-tuning on two tasks, POS tagging and natural language inference, which require the model to bring to bear different degrees of language-specific knowledge. Visualisations reveal that mBERT loses the ability to cluster representations by language after fine-tuning, a result that is supported by evidence from language identification experiments. However, further experiments on ‘unlearning’ language-specific representations using gradient reversal and iterative adversarial learning are shown not to add any further improvement to the language-independent component over and above the effect of fine-tuning. The results presented here suggest that the process of fine-tuning causes a reorganisation of the model’s limited representational capacity, enhancing language-independent representations at the expense of language-specific ones.
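
The gradient-reversal technique mentioned above can be sketched in a few lines of PyTorch: the forward pass is the identity, but gradients are negated, so a language-identification head trained on top pushes the encoder towards language-neutral features. This is a generic illustration of the mechanism, not the paper's training code.

```python
import torch
from torch import nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Negate (and scale) the gradient flowing back into the encoder.
        return -ctx.lambd * grad_output, None

class LanguageAdversary(nn.Module):
    """Language-ID head behind a gradient-reversal layer (hidden size assumed)."""
    def __init__(self, hidden_size: int, n_languages: int, lambd: float = 1.0):
        super().__init__()
        self.lambd = lambd
        self.classifier = nn.Linear(hidden_size, n_languages)

    def forward(self, encoder_states):
        reversed_states = GradReverse.apply(encoder_states, self.lambd)
        return self.classifier(reversed_states)
```

In use, the adversary's cross-entropy loss on language labels is added to the main task loss; minimising the total trains the classifier to identify languages while the reversed gradient discourages the encoder from encoding them.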

pdf bib
Variation and generality in encoding of syntactic anomaly information in sentence embeddings
Qinxuan Wu | Allyson Ettinger

While sentence anomalies have been applied periodically for testing in NLP, we have yet to establish a picture of the precise status of anomaly information in representations from NLP models. In this paper we aim to fill two primary gaps, focusing on the domain of syntactic anomalies. First, we explore fine-grained differences in anomaly encoding by designing probing tasks that vary the hierarchical level at which anomalies occur in a sentence. Second, we test not only models’ ability to detect a given anomaly, but also the generality of the detected anomaly signal, by examining transfer between distinct anomaly types. Results suggest that all models encode some information supporting anomaly detection, but detection performance varies between anomalies, and only representations from more recent transformer models show signs of generalized knowledge of anomalies. Follow-up analyses support the notion that these models pick up on a legitimate, general notion of sentence oddity, while coarser-grained word position information is likely also a contributor to the observed anomaly detection.

pdf bib
Enhancing Interpretable Clauses Semantically using Pretrained Word Representation
Rohan Kumar Yadav | Lei Jiao | Ole-Christoffer Granmo | Morten Goodwin

Tsetlin Machine (TM) is an interpretable pattern recognition algorithm based on propositional logic, which has demonstrated competitive performance in many Natural Language Processing (NLP) tasks, including sentiment analysis, text classification, and Word Sense Disambiguation. To obtain human-level interpretability, legacy TM employs Boolean input features such as bag-of-words (BOW). However, the BOW representation makes it difficult to use any pre-trained information, for instance, word2vec and GloVe word representations. This restriction has constrained the performance of TM compared to deep neural networks (DNNs) in NLP. To reduce the performance gap, in this paper, we propose a novel way of using pre-trained word representations for TM. The approach significantly enhances the performance and interpretability of TM. We achieve this by extracting semantically related words from pre-trained word representations as input features to the TM. Our experiments show that the accuracy of the proposed approach is significantly higher than the previous BOW-based TM, reaching the level of DNN-based models.
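
The feature-expansion idea described above can be illustrated roughly as follows: each document's Boolean bag-of-words is enriched with the nearest neighbours of its words in a pretrained embedding space, so the (unchanged) Tsetlin Machine sees semantically related words as active input bits. GloVe via gensim and k=5 neighbours are illustrative assumptions, not the authors' exact configuration.

```python
import gensim.downloader as api

# Pretrained word vectors (downloaded on first use).
glove = api.load("glove-wiki-gigaword-100")

def expand_with_neighbours(tokens, k=5):
    """Return the token set plus the k nearest embedding neighbours of each token."""
    expanded = set(tokens)
    for tok in tokens:
        if tok in glove:
            expanded.update(word for word, _ in glove.most_similar(tok, topn=k))
    return expanded

doc = ["excellent", "movie", "plot"]
features = expand_with_neighbours(doc)
# `features` would then be binarised into the Boolean input vector of the TM.
print(sorted(features))
```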

pdf bib
An in-depth look at Euclidean disk embeddings for structure preserving parsing
Federico Fancellu | Lan Xiao | Allan Jepson | Afsaneh Fazly

Preserving the structural properties of trees or graphs when embedding them into a metric space allows for a high degree of interpretability, and has been shown beneficial for downstream tasks (e.g., hypernym detection, natural language inference, multimodal retrieval). However, whereas the majority of prior work looks at using structure-preserving embeddings when encoding a structure given as input, e.g., WordNet (Fellbaum, 1998), there is little exploration of how to use such embeddings when predicting one. We address this gap for two structure generation tasks, namely dependency and semantic parsing. We test the applicability of disk embeddings (Suzuki et al., 2019), which have been proposed for embedding Directed Acyclic Graphs (DAGs) but have not been tested on tasks that generate such structures. Our experimental results show that for both tasks the original disk embedding formulation leads to much worse performance when compared to non-structure-preserving baselines. We propose enhancements to this formulation and show that they almost close the performance gap for dependency parsing. However, the gap still remains notable for semantic parsing due to the complexity of meaning representation graphs, suggesting a challenge for generating interpretable semantic parse representations.

pdf bib
Assessing the Generalization Capacity of Pre-trained Language Models through Japanese Adversarial Natural Language Inference
Hitomi Yanaka | Koji Mineshima

Despite the success of multilingual pre-trained language models, it remains unclear to what extent these models have human-like generalization capacity across languages. The aim of this study is to investigate the out-of-distribution generalization of pre-trained language models through Natural Language Inference (NLI) in Japanese, the typological properties of which are different from those of English. We introduce a synthetically generated Japanese NLI dataset, called the Japanese Adversarial NLI (JaNLI) dataset, which is inspired by the English HANS dataset and is designed to require understanding of Japanese linguistic phenomena and illuminate the vulnerabilities of models. Through a series of experiments to evaluate the generalization performance of both Japanese and multilingual BERT models, we demonstrate that there is much room to improve current models trained on Japanese NLI tasks. Furthermore, a comparison of human performance and model performance on the different types of garden-path sentences in the JaNLI dataset shows that structural phenomena that ease interpretation of garden-path sentences for human readers do not help models in the same way, highlighting a difference between human readers and the models.

pdf bib
Investigating Negation in Pre-trained Vision-and-language Models
Radina Dobreva | Frank Keller

Pre-trained vision-and-language models have achieved impressive results on a variety of tasks, including ones that require complex reasoning beyond object recognition. However, little is known about how they achieve these results or what their limitations are. In this paper, we focus on a particular linguistic capability, namely the understanding of negation. We borrow techniques from the analysis of language models to investigate the ability of pre-trained vision-and-language models to handle negation. We find that these models severely underperform in the presence of negation.

pdf bib
Learning Mathematical Properties of Integers
Maria Ryskina | Kevin Knight

Embedding words in high-dimensional vector spaces has proven valuable in many natural language applications. In this work, we investigate whether similarly-trained embeddings of integers can capture concepts that are useful for mathematical applications. We probe the integer embeddings for mathematical knowledge, apply them to a set of numerical reasoning tasks, and show that by learning the representations from mathematical sequence data, we can substantially improve over number embeddings learned from English text corpora.
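
The probing setup described above can be illustrated by fitting a simple classifier on integer embeddings to test whether a mathematical property (here, divisibility by 3) is linearly recoverable. Random vectors stand in for the learned embeddings purely so the sketch is self-contained; the paper probes embeddings trained on mathematical sequence data and on English text.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
integers = np.arange(1, 1001)
embeddings = rng.normal(size=(len(integers), 64))   # placeholder for real integer embeddings
labels = (integers % 3 == 0).astype(int)            # property to probe for

X_tr, X_te, y_tr, y_te = train_test_split(embeddings, labels,
                                          test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("probe accuracy:", probe.score(X_te, y_te))   # near chance for random embeddings
```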

pdf bib
An Investigation of Language Model Interpretability via Sentence Editing
Samuel Stevens | Yu Su

Pre-trained language models (PLMs) like BERT are being used for almost all language-related tasks, but interpreting their behavior still remains a significant challenge and many important questions remain largely unanswered. In this work, we re-purpose a sentence editing dataset, where faithful high-quality human rationales can be automatically extracted and compared with extracted model rationales, as a new testbed for interpretability. This enables us to conduct a systematic investigation on an array of questions regarding PLMs’ interpretability, including the role of pre-training procedure, comparison of rationale extraction methods, and different layers in the PLM. The investigation generates new insights, for example, contrary to the common understanding, we find that attention weights correlate well with human rationales and work better than gradient-based saliency in extracting model rationales. Both the dataset and code will be released to facilitate future interpretability research.
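
Below is a rough sketch of the two rationale-extraction strategies compared above: ranking tokens either by the attention they receive from [CLS] or by the gradient norm at the embedding layer. The model checkpoint, last-layer pooling, and head averaging are assumptions for illustration, not the paper's exact procedure.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "textattack/bert-base-uncased-SST-2"  # any fine-tuned BERT classifier works here
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).eval()

sentence = "The plot is thin but the performances are wonderful."
inputs = tokenizer(sentence, return_tensors="pt")
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

# 1) Attention-based importance: average last-layer attention from [CLS].
with torch.no_grad():
    out = model(**inputs, output_attentions=True)
attn = out.attentions[-1][0]          # (heads, seq, seq)
attn_scores = attn.mean(dim=0)[0]     # attention of [CLS] (position 0) over tokens

# 2) Gradient-based saliency: L2 norm of the gradient w.r.t. input embeddings.
embeds = model.bert.embeddings.word_embeddings(inputs["input_ids"]).detach()
embeds.requires_grad_(True)
logits = model(inputs_embeds=embeds, attention_mask=inputs["attention_mask"]).logits
logits[0, logits.argmax()].backward()
grad_scores = embeds.grad[0].norm(dim=-1)

for tok, a, g in zip(tokens, attn_scores, grad_scores):
    print(f"{tok:>12s}  attention={float(a):.3f}  gradient={float(g):.3f}")
```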

pdf bib
Controlled tasks for model analysis: Retrieving discrete information from sequences
Ionut-Teodor Sorodoc | Gemma Boleda | Marco Baroni

In recent years, the NLP community has shown increasing interest in analysing how deep learning models work. Given that large models trained on complex tasks are difficult to inspect, some of this work has focused on controlled tasks that emulate specific aspects of language. We propose a new set of such controlled tasks to explore a crucial aspect of natural language processing that has not received enough attention: the need to retrieve discrete information from sequences. We also study model behavior on the tasks with simple instantiations of Transformers and LSTMs. Our results highlight the beneficial role of decoder attention and its sometimes unexpected interaction with other components. Moreover, we show that, for most of the tasks, these simple models still show significant difficulties. We hope that the community will take up the analysis possibilities that our tasks afford, and that a clearer understanding of model behavior on the tasks will lead to better and more transparent models.

pdf bib
Do Language Models Know the Way to Rome?
Bastien Liétard | Mostafa Abdou | Anders Søgaard

The global geometry of language models is important for a range of applications, but language model probes tend to evaluate rather local relations, for which ground truths are easily obtained. In this paper we exploit the fact that in geography, ground truths are available beyond local relations. In a series of experiments, we evaluate the extent to which language model representations of city and country names are isomorphic to real-world geography, e.g., if you tell a language model where Paris and Berlin are, does it know the way to Rome? We find that language models generally encode limited geographic information, but with larger models performing the best, suggesting that geographic knowledge can be induced from higher-order co-occurrence statistics.
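
One way to picture the kind of evaluation described above is to fit a linear map from place-name representations to (latitude, longitude) and check where held-out places land. The BERT encoder, mean pooling, and the tiny coordinate list are illustrative assumptions, not the paper's experimental setup.

```python
import torch
from sklearn.linear_model import Ridge
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModel.from_pretrained("bert-base-cased").eval()

# Approximate (lat, lon) coordinates for a handful of European capitals.
cities = {"Paris": (48.9, 2.4), "Berlin": (52.5, 13.4), "Madrid": (40.4, -3.7),
          "Warsaw": (52.2, 21.0), "Lisbon": (38.7, -9.1), "Rome": (41.9, 12.5)}

def embed(name: str) -> torch.Tensor:
    """Mean-pooled contextual representation of a city name in isolation."""
    ids = tokenizer(name, return_tensors="pt")
    with torch.no_grad():
        return model(**ids).last_hidden_state.mean(dim=1).squeeze(0)

names = list(cities)
X = torch.stack([embed(n) for n in names]).numpy()
y = [cities[n] for n in names]

# Fit on every city except Rome, then ask the probe "the way to Rome".
train = [i for i, n in enumerate(names) if n != "Rome"]
reg = Ridge(alpha=1.0).fit(X[train], [y[i] for i in train])
print("predicted Rome:", reg.predict(X[names.index("Rome")][None]))
print("actual    Rome:", cities["Rome"])
```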