Dietrich Klakow


Do we read what we hear? Modeling orthographic influences on spoken word recognition
Nicole Macher | Badr M. Abdullah | Harm Brouwer | Dietrich Klakow
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop

Theories and models of spoken word recognition aim to explain the process of accessing lexical knowledge given an acoustic realization of a word form. There is consensus that phonological and semantic information is crucial for this process. However, there is accumulating evidence that orthographic information could also have an impact on auditory word recognition. This paper presents two models of spoken word recognition that instantiate different hypotheses regarding the influence of orthography on this process. We show that these models reproduce human-like behavior in different ways and provide testable hypotheses for future research on the source of orthographic effects in spoken word recognition.

Proceedings of the Third Workshop on Beyond Vision and LANguage: inTEgrating Real-world kNowledge (LANTERN)
Marius Mosbach | Michael A. Hedderich | Sandro Pezzelle | Aditya Mogadala | Dietrich Klakow | Marie-Francine Moens | Zeynep Akata
Proceedings of the Third Workshop on Beyond Vision and LANguage: inTEgrating Real-world kNowledge (LANTERN)

To Share or not to Share: Predicting Sets of Sources for Model Transfer Learning
Lukas Lange | Jannik Strötgen | Heike Adel | Dietrich Klakow
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

In low-resource settings, model transfer can help to overcome a lack of labeled data for many tasks and domains. However, predicting useful transfer sources is a challenging problem, as even the most similar sources might lead to unexpected negative transfer results. Thus, ranking methods based on task and text similarity as suggested in prior work may not be sufficient to identify promising sources. To tackle this problem, we propose a new approach to automatically determine which and how many sources should be exploited. For this, we study the effects of model transfer on sequence labeling across various domains and tasks and show that our methods based on model similarity and support vector machines are able to predict promising sources, resulting in performance increases of up to 24 F1 points.

Discourse-based Argument Segmentation and Annotation
Ekaterina Saveleva | Volha Petukhova | Marius Mosbach | Dietrich Klakow
Proceedings of the 17th Joint ACL - ISO Workshop on Interoperable Semantic Annotation

The paper presents a discourse-based approach to the analysis of argumentative texts departing from the assumption that the coherence of a text should capture argumentation structure as well and, therefore, existing discourse analysis tools can be successfully applied for argument segmentation and annotation tasks. We tested the widely used Penn Discourse Tree Bank full parser (Lin et al., 2010) and the state-of-the-art neural network NeuralEDUSeg (Wang et al., 2018) and XLNet (Yang et al., 2019) models on the two-stage discourse segmentation and discourse relation recognition. The two-stage approach outperformed the PDTB parser by a broad margin, i.e. the best achieved F1 scores of 21.2% for the PDTB parser vs. 66.37% for the NeuralEDUSeg and XLNet models. Neural network models were fine-tuned and evaluated on the argumentative corpus showing a promising accuracy of 60.22%. The complete argument structures were reconstructed for further argumentation mining tasks. The reference Dagstuhl argumentative corpus containing 2,222 elementary discourse unit pairs annotated with the top-level and fine-grained PDTB relations will be released to the research community.


Neural Data-to-Text Generation via Jointly Learning the Segmentation and Correspondence
Xiaoyu Shen | Ernie Chang | Hui Su | Cheng Niu | Dietrich Klakow
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

The neural attention model has achieved great success in data-to-text generation tasks. Though it usually excels at producing fluent text, it suffers from missing information, repetition, and hallucination. Due to the black-box nature of the neural attention architecture, avoiding these problems in a systematic way is non-trivial. To address this concern, we propose to explicitly segment target text into fragment units and align them with their data correspondences. The segmentation and correspondence are jointly learned as latent variables without any human annotations. We further impose a soft statistical constraint to regularize the segmental granularity. The resulting architecture maintains the same expressive power as neural attention models, while being able to generate fully interpretable outputs with several times less computational cost. On both E2E and WebNLG benchmarks, we show the proposed model consistently outperforms its neural attention counterparts.

Label Propagation-Based Semi-Supervised Learning for Hate Speech Classification
Ashwin Geet D’Sa | Irina Illina | Dominique Fohr | Dietrich Klakow | Dana Ruiter
Proceedings of the First Workshop on Insights from Negative Results in NLP

Research on hate speech classification has received increased attention. In real-life scenarios, only a small amount of labeled hate speech data is available to train a reliable classifier. Semi-supervised learning takes advantage of a small amount of labeled data and a large amount of unlabeled data. In this paper, label propagation-based semi-supervised learning is explored for the task of hate speech classification. The quality of labeling the unlabeled set depends on the input representations. In this work, we show that pre-trained representations are label agnostic and, when used with label propagation, yield poor results. Neural network-based fine-tuning can be adopted to learn task-specific representations using a small amount of labeled data. We show that fully fine-tuned representations may not always be the best representations for label propagation and that intermediate representations may perform better in a semi-supervised setup.

Defining Explanation in an AI Context
Tejaswani Verma | Christoph Lingenfelder | Dietrich Klakow
Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP

With the increase in the use of AI systems, a need for explanation systems arises. Building an explanation system requires a definition of explanation. However, the natural language term explanation is difficult to define formally as it includes multiple perspectives from different domains such as psychology, philosophy, and cognitive sciences. We study multiple perspectives and aspects of explainability of recommendations or predictions made by AI systems, and provide a generic definition of explanation. The proposed definition is ambitious and challenging to apply. With the intention to bridge the gap between theory and application, we also propose a possible architecture of an automated explanation system based on our definition of explanation.

Proceedings of the Second Workshop on Beyond Vision and LANguage: inTEgrating Real-world kNowledge (LANTERN)
Aditya Mogadala | Sandro Pezzelle | Dietrich Klakow | Marie-Francine Moens | Zeynep Akata
Proceedings of the Second Workshop on Beyond Vision and LANguage: inTEgrating Real-world kNowledge (LANTERN)


Feature-Dependent Confusion Matrices for Low-Resource NER Labeling with Noisy Labels
Lukas Lange | Michael A. Hedderich | Dietrich Klakow
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

In low-resource settings, the performance of supervised labeling models can be improved with automatically annotated or distantly supervised data, which is cheap to create but often noisy. Previous works have shown that significant improvements can be reached by injecting information about the confusion between clean and noisy labels in this additional training data into the classifier training. However, for noise estimation, these approaches either do not take the input features (in our case word embeddings) into account, or they need to learn the noise modeling from scratch, which can be difficult in a low-resource setting. We propose to cluster the training data using the input features and then compute different confusion matrices for each cluster. To the best of our knowledge, our approach is the first to leverage feature-dependent noise modeling with pre-initialized confusion matrices. We evaluate on low-resource named entity recognition settings in several languages, showing that our methods improve upon other confusion-matrix-based methods by up to 9%.
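The per-cluster confusion matrices described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `cluster_confusion_matrices`, the integer label encoding, and the identity-row fallback for clean labels that never occur in a cluster are all assumptions made for illustration. Cluster assignments are taken as given (from any clustering of the input features), and each matrix row estimates p(noisy label | clean label) within one cluster.

```python
from collections import defaultdict


def cluster_confusion_matrices(cluster_ids, clean_labels, noisy_labels, num_labels):
    """Estimate one row-normalized confusion matrix per cluster.

    matrices[c][i][j] approximates p(noisy = j | clean = i) for the
    instances assigned to cluster c. Labels are integers in
    range(num_labels). All names here are hypothetical.
    """
    counts = defaultdict(lambda: [[0] * num_labels for _ in range(num_labels)])
    for cluster, clean, noisy in zip(cluster_ids, clean_labels, noisy_labels):
        counts[cluster][clean][noisy] += 1

    matrices = {}
    for cluster, count_matrix in counts.items():
        rows = []
        for i, row in enumerate(count_matrix):
            total = sum(row)
            if total == 0:
                # Assumption: with no evidence for clean label i in this
                # cluster, fall back to "no noise" (an identity row).
                rows.append([1.0 if j == i else 0.0 for j in range(num_labels)])
            else:
                rows.append([value / total for value in row])
        matrices[cluster] = rows
    return matrices


# Toy data: five instances, two clusters, two labels.
example = cluster_confusion_matrices(
    cluster_ids=[0, 0, 0, 1, 1],
    clean_labels=[0, 0, 1, 0, 1],
    noisy_labels=[0, 1, 1, 0, 0],
    num_labels=2,
)
```

In a setup like the one the abstract describes, matrices of this kind would serve as the initialization of a noise model; how they are injected into classifier training is not covered by this sketch.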

Improving Latent Alignment in Text Summarization by Generalizing the Pointer Generator
Xiaoyu Shen | Yang Zhao | Hui Su | Dietrich Klakow
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Pointer Generators have been the de facto standard for modern summarization systems. However, this architecture faces two major drawbacks: Firstly, the pointer is limited to copying the exact words while ignoring possible inflections or abstractions, which restricts its power of capturing richer latent alignment. Secondly, the copy mechanism results in a strong bias towards extractive generations, where most sentences are produced by simply copying from the source text. In this paper, we address these problems by allowing the model to edit pointed tokens instead of always hard copying them. The editing is performed by transforming the pointed word vector into a target space with a learned relation embedding. On three large-scale summarization datasets, we show the model is able to (1) capture more latent alignment relations than exact word matches, (2) improve word alignment accuracy, allowing for better model interpretation and controlling, (3) generate higher-quality summaries validated by both qualitative and quantitative evaluations and (4) bring more abstraction to the generated summaries.

Proceedings of the Beyond Vision and LANguage: inTEgrating Real-world kNowledge (LANTERN)
Aditya Mogadala | Dietrich Klakow | Sandro Pezzelle | Marie-Francine Moens
Proceedings of the Beyond Vision and LANguage: inTEgrating Real-world kNowledge (LANTERN)

Using Multi-Sense Vector Embeddings for Reverse Dictionaries
Michael A. Hedderich | Andrew Yates | Dietrich Klakow | Gerard de Melo
Proceedings of the 13th International Conference on Computational Semantics - Long Papers

Popular word embedding methods such as word2vec and GloVe assign a single vector representation to each word, even if a word has multiple distinct meanings. Multi-sense embeddings instead provide different vectors for each sense of a word. However, they typically cannot serve as a drop-in replacement for conventional single-sense embeddings, because the correct sense vector needs to be selected for each word. In this work, we study the effect of multi-sense embeddings on the task of reverse dictionaries. We propose a technique to easily integrate them into an existing neural network architecture using an attention mechanism. Our experiments demonstrate that large improvements can be obtained when employing multi-sense embeddings both in the input sequence and for the target representation. An analysis of the sense distributions and of the learned attention is provided as well.

Toolbox for Calculating Linguistic Distances and Asymmetries between Related Languages
Marius Mosbach | Irina Stenger | Tania Avgustinova | Dietrich Klakow
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

Languages may be differently distant from each other and their mutual intelligibility may be asymmetric. In this paper we introduce a toolbox for calculating linguistic distances and asymmetries between related languages. The toolbox allows linguistic experts to quickly and easily perform statistical analyses and compare those with experimental results. We demonstrate its efficacy in an intercomprehension experiment on two Slavic languages: Bulgarian and Russian. Using the toolbox, we were able to validate three methods to measure linguistic distances and asymmetries: Levenshtein distance, word adaptation surprisal, and conditional entropy as predictors of success in a reading intercomprehension experiment.
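Two of the measures named in this abstract, Levenshtein distance and conditional entropy, can be sketched in a few lines. This is an illustrative sketch only, not the toolbox's code: the function names, the length normalization, and the representation of character correspondences as pre-aligned `(source, target)` pairs are assumptions. Levenshtein distance is symmetric, whereas conditional entropy generally differs between the two directions, which is one way an asymmetry between languages can show up.

```python
import math
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Distance divided by the longer word's length (one common normalization)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def conditional_entropy(pairs) -> float:
    """H(target | source) in bits over aligned (source, target) symbol pairs."""
    joint = Counter(pairs)
    source = Counter(s for s, _ in pairs)
    n = len(pairs)
    return -sum((count / n) * math.log2(count / source[s])
                for (s, _), count in joint.items())


# Hypothetical aligned character correspondences between two related words lists.
pairs = [("a", "a"), ("a", "o"), ("b", "b"), ("b", "b")]
forward = conditional_entropy(pairs)                        # target given source
backward = conditional_entropy([(t, s) for s, t in pairs])  # source given target
```

With these toy pairs the forward direction is uncertain (source `a` maps to either `a` or `o`) while the backward direction is fully predictable, so `forward` exceeds `backward`.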


Closing Brackets with Recurrent Neural Networks
Natalia Skachkova | Thomas Trost | Dietrich Klakow
Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

Many natural and formal languages contain words or symbols that require a matching counterpart for making an expression well-formed. The combination of opening and closing brackets is a typical example of such a construction. Due to their commonness, the ability to follow such rules is important for language modeling. Currently, recurrent neural networks (RNNs) are extensively used for this task. We investigate whether they are capable of learning the rules of opening and closing brackets by applying them to synthetic Dyck languages that consist of different types of brackets. We provide an analysis of the statistical properties of these languages as a baseline and show strengths and limits of Elman-RNNs, GRUs and LSTMs in experiments on random samples of these languages. In terms of perplexity and prediction accuracy, the RNNs get close to the theoretical baseline in most cases.
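To make the notion of a Dyck language with several bracket types concrete, the following sketch checks well-formedness with a stack and draws random well-formed samples. It is an illustration under stated assumptions, not the paper's experimental setup: the bracket inventory, the `p_open` parameter, and the sampling procedure are hypothetical choices, and the distribution used in the paper is not specified in the abstract.

```python
import random

# Assumed bracket inventory; the paper's exact alphabet is not given here.
BRACKETS = {"(": ")", "[": "]", "{": "}"}


def is_dyck(word: str, pairs=BRACKETS) -> bool:
    """Return True if every bracket in word is closed in the right order."""
    closers = set(pairs.values())
    stack = []
    for ch in word:
        if ch in pairs:
            stack.append(pairs[ch])      # remember which closer is expected
        elif ch in closers:
            if not stack or stack.pop() != ch:
                return False
        else:
            return False                 # symbol outside the alphabet
    return not stack


def sample_dyck(length: int, p_open: float = 0.5, pairs=BRACKETS, rng=None) -> str:
    """Draw a random well-formed word with exactly `length` symbols."""
    if length < 0 or length % 2:
        raise ValueError("length must be a non-negative even number")
    rng = rng or random.Random()
    openers = list(pairs)
    stack, out = [], []
    while len(out) < length:
        remaining = length - len(out)
        must_close = len(stack) == remaining  # only room left to close
        if not must_close and (not stack or rng.random() < p_open):
            opener = rng.choice(openers)
            out.append(opener)
            stack.append(pairs[opener])
        else:
            out.append(stack.pop())
    return "".join(out)


sample = sample_dyck(12, rng=random.Random(0))
```

A language model evaluated on such samples can only be certain about a closing bracket's type, not about whether the next symbol opens or closes, which is why a statistical baseline of the kind the abstract mentions is needed to interpret perplexity.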

Toward Bayesian Synchronous Tree Substitution Grammars for Sentence Planning
David M. Howcroft | Dietrich Klakow | Vera Demberg
Proceedings of the 11th International Conference on Natural Language Generation

Developing conventional natural language generation systems requires extensive attention from human experts in order to craft complex sets of sentence planning rules. We propose a Bayesian nonparametric approach to learn sentence planning rules by inducing synchronous tree substitution grammars for pairs of text plans and morphosyntactically-specified dependency trees. Our system is able to learn rules which can be used to generate novel texts after training on small datasets.


Parameter Free Hierarchical Graph-Based Clustering for Analyzing Continuous Word Embeddings
Thomas Alexander Trost | Dietrich Klakow
Proceedings of TextGraphs-11: the Workshop on Graph-based Methods for Natural Language Processing

Word embeddings are high-dimensional vector representations of words and are thus difficult to interpret. In order to deal with this, we introduce an unsupervised, parameter-free method for creating a hierarchical graphical clustering of the full ensemble of word vectors and show that this structure is a geometrically meaningful representation of the original relations between the words. This newly obtained representation can be used for better understanding and thus improving the embedding algorithm. It also exhibits semantic meaning, so it can be utilized in a variety of language processing tasks such as categorization or measuring similarity.