Simon Dobnik


2022

Attention as Grounding: Exploring Textual and Cross-Modal Attention on Entities and Relations in Language-and-Vision Transformer
Nikolai Ilinykh | Simon Dobnik
Findings of the Association for Computational Linguistics: ACL 2022

We explore how a multi-modal transformer trained to generate longer image descriptions learns syntactic and semantic representations of entities and relations grounded in objects, at the level of masked self-attention (text generation) and cross-modal attention (information fusion). We observe that cross-attention learns the visual grounding of noun phrases into objects and high-level semantic information about spatial relations, while text-to-text attention captures low-level syntactic knowledge between words. We conclude that language models in a multi-modal task learn different semantic information about objects and relations cross-modally and uni-modally (text-only). Our code is available here: https://github.com/GU-CLASP/attention-as-grounding.

2021

Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)
Simon Dobnik | Lilja Øvrelid
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)

Proceedings of the Reasoning and Interaction Conference (ReInAct 2021)
Christine Howes | Simon Dobnik | Ellen Breitholtz | Stergios Chatzikyriakidis
Proceedings of the Reasoning and Interaction Conference (ReInAct 2021)

How Vision Affects Language: Comparing Masked Self-Attention in Uni-Modal and Multi-Modal Transformer
Nikolai Ilinykh | Simon Dobnik
Proceedings of the 1st Workshop on Multimodal Semantic Representations (MMSR)

Interpreting the knowledge learned by multi-head self-attention in transformers has been one of the central questions in NLP. However, much of this work has focused on models trained for uni-modal tasks, e.g. machine translation. In this paper, we examine masked self-attention in a multi-modal transformer trained for the task of image captioning. In particular, we test whether the multi-modality of the task objective affects the learned attention patterns. Our visualisations of masked self-attention demonstrate that (i) it can learn general linguistic knowledge of the textual input, and (ii) its attention patterns incorporate artefacts from the visual modality even though it has never accessed it directly. We compare our transformer’s attention patterns with masked attention in distilgpt-2 tested for uni-modal text generation of image captions. Based on the maps of extracted attention weights, we argue that masked self-attention in an image captioning transformer seems to be enhanced with semantic knowledge from images, exemplifying joint language-and-vision information in its attention patterns.

2019

Normalising Non-standardised Orthography in Algerian Code-switched User-generated Data
Wafia Adouane | Jean-Philippe Bernardy | Simon Dobnik
Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)

We work with Algerian, an under-resourced non-standardised Arabic variety, for which we compile a new parallel corpus consisting of user-generated textual data matched with normalised and corrected human annotations following a data-driven standard and our own linguistically motivated standard. We use an end-to-end deep neural model designed to deal with context-dependent spelling correction and normalisation. Results indicate that a model with two CNN sub-network encoders and an LSTM decoder performs best, and that word context matters. Additionally, pre-processing data token-by-token with an edit-distance-based aligner significantly improves performance. Used as a pre-processing step for downstream tasks, spelling correction and normalisation give promising results on detecting binary Semantic Textual Similarity.

Proceedings of the 13th International Conference on Computational Semantics - Long Papers
Simon Dobnik | Stergios Chatzikyriakidis | Vera Demberg
Proceedings of the 13th International Conference on Computational Semantics - Long Papers

Proceedings of the 13th International Conference on Computational Semantics - Short Papers
Simon Dobnik | Stergios Chatzikyriakidis | Vera Demberg
Proceedings of the 13th International Conference on Computational Semantics - Short Papers

Proceedings of the 13th International Conference on Computational Semantics - Student Papers
Simon Dobnik | Stergios Chatzikyriakidis | Vera Demberg | Kathrein Abu Kwaik | Vladislav Maraev
Proceedings of the 13th International Conference on Computational Semantics - Student Papers

2018

Improving Neural Network Performance by Injecting Background Knowledge: Detecting Code-switching and Borrowing in Algerian texts
Wafia Adouane | Jean-Philippe Bernardy | Simon Dobnik
Proceedings of the Third Workshop on Computational Approaches to Linguistic Code-Switching

We explore the effect of injecting background knowledge into different deep neural network (DNN) configurations in order to mitigate the scarcity of annotated data when applying these models to datasets of low-resourced languages. The background knowledge is encoded in the form of lexicons and pre-trained sub-word embeddings. The DNN models are evaluated on the task of detecting code-switching and borrowing points in non-standardised user-generated Algerian texts. Overall results show that DNNs benefit from adding background knowledge. However, the gain varies between models and categories. The proposed DNN architectures are generic and could be applied to other low-resourced languages.

2017

Identification of Languages in Algerian Arabic Multilingual Documents
Wafia Adouane | Simon Dobnik
Proceedings of the Third Arabic Natural Language Processing Workshop

This paper presents a language identification system designed to detect the language of each word, in its context, in multilingual documents as generated in social media by bilingual/multilingual communities, in our case speakers of Algerian Arabic. We frame the task as a sequence tagging problem and use supervised machine learning with standard methods like HMM and n-gram classification tagging. We also experiment with a lexicon-based method. Combining all the methods in a fall-back mechanism and introducing some linguistic rules to deal with unseen tokens and ambiguous words gives an overall accuracy of 93.14%. Finally, we introduce rules for language identification from sequences of recognised words.