Proceedings of Deep Learning Inside Out (DeeLIO): The First Workshop on Knowledge Extraction and Integration for Deep Learning Architectures

Eneko Agirre, Marianna Apidianaki, Ivan Vulić (Editors)


Anthology ID:
2020.deelio-1
Month:
November
Year:
2020
Address:
Online
Venues:
DeeLIO | EMNLP
Publisher:
Association for Computational Linguistics
URL:
https://aclanthology.org/2020.deelio-1

Generalization to Mitigate Synonym Substitution Attacks
Basemah Alshemali | Jugal Kalita

Studies have shown that deep neural networks (DNNs) are vulnerable to adversarial examples, perturbed inputs that cause DNN-based models to produce incorrect results. One robust adversarial attack in the NLP domain is synonym substitution, in which the adversary substitutes words with their synonyms. Since synonym substitution perturbations aim to satisfy all lexical, grammatical, and semantic constraints, they are difficult to detect both with automatic syntax checks and by humans. In this paper, we propose a structure-free defensive method that is capable of improving the performance of DNN-based models on both clean and adversarial data. Our findings show that replacing the embeddings of the important words in the input samples with the average of their synonyms' embeddings can significantly improve the generalization of DNN-based classifiers. By doing so, we reduce model sensitivity to particular words in the input samples. Our results indicate that the proposed defense is not only capable of defending against adversarial attacks, but is also capable of improving the performance of DNN-based models when tested on benign data. On average, the proposed defense improved the classification accuracy of the CNN and Bi-LSTM models by 41.30% and 55.66%, respectively, when tested under adversarial attacks. Extended investigation shows that our defensive method can also improve the robustness of non-neural models, achieving average classification accuracy increases of 17.62% and 22.93% on the SVM and XGBoost models, respectively. The proposed defensive method also showed an average classification accuracy improvement of 26.60% when tested with the well-known BERT model.
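
The core idea in the abstract, replacing the embedding of each important word with the average of its synonyms' embeddings, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy embedding table, and the synonym dictionary are all hypothetical. The abstract does not say how important words are selected or where synonyms come from, so the sketch takes both as given inputs. It also assumes that words without known synonyms keep their original embedding.

```python
from typing import Dict, List, Sequence, Set

Vector = List[float]


def average(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of a non-empty sequence of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def synonym_averaged_embeddings(
    tokens: Sequence[str],
    embeddings: Dict[str, Vector],
    synonyms: Dict[str, List[str]],
    important: Set[str],
) -> List[Vector]:
    """Return one vector per token.

    Important tokens are mapped to the mean of their synonyms' embeddings;
    every other token keeps its own embedding (an assumption of this sketch).
    """
    result = []
    for token in tokens:
        # Only synonyms that have an embedding can contribute to the mean.
        synonym_vectors = [
            embeddings[s] for s in synonyms.get(token, []) if s in embeddings
        ]
        if token in important and synonym_vectors:
            result.append(average(synonym_vectors))
        else:
            result.append(embeddings[token])
    return result


# Hypothetical toy data, for illustration only.
toy_embeddings = {
    "good": [1.0, 0.0],
    "great": [0.0, 1.0],
    "fine": [1.0, 1.0],
    "film": [0.5, 0.5],
}
toy_synonyms = {"good": ["great", "fine"]}

defended = synonym_averaged_embeddings(
    ["good", "film"], toy_embeddings, toy_synonyms, important={"good"}
)
```

Here the vector for "good" becomes the mean of the vectors for "great" and "fine", while "film" is passed through unchanged. The resulting vectors would then be fed to the classifier in place of the original embeddings.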