Michael Strube


2022

pdf bib
Entity-based Neural Local Coherence Modeling
Sungho Jeon | Michael Strube
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In this paper, we propose an entity-based neural local coherence model which is linguistically more sound than previously proposed neural coherence models. Recent neural coherence models encode the input document using large-scale pretrained language models. Hence their basis for computing local coherence are words and even sub-words. The analysis of their output shows that these models frequently compute coherence on the basis of connections between (sub-)words which, from a linguistic perspective, should not play a role. Still, these models achieve state-of-the-art performance in several end applications. In contrast to these models, we compute coherence on the basis of entities by constraining the input to noun phrases and proper names. This provides us with an explicit representation of the most important items in sentences leading to the notion of focus. This brings our model linguistically in line with pre-neural models of computing coherence. It also gives us better insight into the behaviour of the model thus leading to better explainability. Our approach is also in accord with a recent study (O’Connor and Andreas, 2021), which shows that most usable information is captured by nouns and verbs in transformer-based language models. We evaluate our model on three downstream tasks showing that it is not only linguistically more sound than previous models but also that it outperforms them in end applications.

2021

pdf bib
Proceedings of the 2nd Workshop on Computational Approaches to Discourse
Chloé Braud | Christian Hardmeier | Junyi Jessy Li | Annie Louis | Michael Strube | Amir Zeldes
Proceedings of the 2nd Workshop on Computational Approaches to Discourse

pdf bib
Proceedings of the CODI-CRAC 2021 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue
Sopan Khosla | Ramesh Manuvinakurike | Vincent Ng | Massimo Poesio | Michael Strube | Carolyn Rosé
Proceedings of the CODI-CRAC 2021 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue

2020

pdf bib
Centering-based Neural Coherence Modeling with Hierarchical Discourse Segments
Sungho Jeon | Michael Strube
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Previous neural coherence models have focused on identifying semantic relations between adjacent sentences. However, they do not have the means to exploit structural information. In this work, we propose a coherence model which takes discourse structural information into account without relying on human annotations. We approximate a linguistic theory of coherence, Centering theory, which we use to track the changes of focus between discourse segments. Our model first identifies the focus of each sentence, recognized with regards to the context, and constructs the structural relationship for discourse segments by tracking the changes of the focus. The model then incorporates this structural information into a structure-aware transformer. We evaluate our model on two tasks, automated essay scoring and assessing writing quality. Our results demonstrate that our model, built on top of a pretrained language model, achieves state-of-the-art performance on both tasks. We next statistically examine the identified trees of texts assigned to different quality scores. Finally, we investigate what our model learns in terms of theoretical claims.

pdf bib
Proceedings of the First Workshop on Computational Approaches to Discourse
Chloé Braud | Christian Hardmeier | Junyi Jessy Li | Annie Louis | Michael Strube
Proceedings of the First Workshop on Computational Approaches to Discourse

2019

pdf bib
Adapting Deep Learning Methods for Mental Health Prediction on Social Media
Ivan Sekulic | Michael Strube
Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)

Mental health poses a significant challenge for an individual’s well-being. Text analysis of rich resources, like social media, can contribute to deeper understanding of illnesses and provide means for their early detection. We tackle a challenge of detecting social media users’ mental status through deep learning-based models, moving away from traditional approaches to the task. In a binary classification task on predicting if a user suffers from one of nine different disorders, a hierarchical attention network outperforms previously set benchmarks for four of the disorders. Furthermore, we explore the limitations of our model and analyze phrases relevant for classification by inspecting the model’s word-level attention weights.

pdf bib
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorials
Anoop Sarkar | Michael Strube
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorials

pdf bib
Using Automatically Extracted Minimum Spans to Disentangle Coreference Evaluation from Boundary Detection
Nafise Sadat Moosavi | Leo Born | Massimo Poesio | Michael Strube
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

The common practice in coreference resolution is to identify and evaluate the maximum span of mentions. The use of maximum spans tangles coreference evaluation with the challenges of mention boundary detection like prepositional phrase attachment. To address this problem, minimum spans are manually annotated in smaller corpora. However, this additional annotation is costly and therefore, this solution does not scale to large corpora. In this paper, we propose the MINA algorithm for automatically extracting minimum spans to benefit from minimum span evaluation in all corpora. We show that the extracted minimum spans by MINA are consistent with those that are manually annotated by experts. Our experiments show that using minimum spans is in particular important in cross-dataset coreference evaluation, in which detected mention boundaries are noisier due to domain shift. We have integrated MINA into https://github.com/ns-moosavi/coval for reporting standard coreference scores based on both maximum and automatically detected minimum spans.

pdf bib
On the Importance of Subword Information for Morphological Tasks in Truly Low-Resource Languages
Yi Zhu | Benjamin Heinzerling | Ivan Vulić | Michael Strube | Roi Reichart | Anna Korhonen
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Recent work has validated the importance of subword information for word representation learning. Since subwords increase parameter sharing ability in neural models, their value should be even more pronounced in low-data regimes. In this work, we therefore provide a comprehensive analysis focused on the usefulness of subwords for word representation learning in truly low-resource scenarios and for three representative morphological tasks : fine-grained entity typing, morphological tagging, and named entity recognition. We conduct a systematic study that spans several dimensions of comparison : 1) type of data scarcity which can stem from the lack of task-specific training data, or even from the lack of unannotated data required to train word embeddings, or both ; 2) language type by working with a sample of 16 typologically diverse languages including some truly low-resource ones (e.g. Rusyn, Buryat, and Zulu) ; 3) the choice of the subword-informed word representation method. Our main results show that subword-informed models are universally useful across all language types, with large gains over subword-agnostic embeddings. They also suggest that the effective use of subwords largely depends on the language (type) and the task at hand, as well as on the amount of available data for training the embeddings and task-based models, where having sufficient in-task data is a more critical requirement.

2018

pdf bib
Using Linguistic Features to Improve the Generalization Capability of Neural Coreference Resolvers
Nafise Sadat Moosavi | Michael Strube
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Coreference resolution is an intermediate step for text understanding. It is used in tasks and domains for which we do not necessarily have coreference annotated corpora. Therefore, generalization is of special importance for coreference resolution. However, while recent coreference resolvers have notable improvements on the CoNLL dataset, they struggle to generalize properly to new domains or datasets. In this paper, we investigate the role of linguistic features in building more generalizable coreference resolvers. We show that generalization improves only slightly by merely using a set of additional linguistic features. However, employing features and subsets of their values that are informative for coreference resolution, considerably improves generalization. Thanks to better generalization, our system achieves state-of-the-art results in out-of-domain evaluations, e.g., on WikiCoref, our system, which is trained on CoNLL, achieves on-par performance with a system designed for this dataset.

pdf bib
A Neural Local Coherence Model for Text Quality Assessment
Mohsen Mesgar | Michael Strube
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We propose a local coherence model that captures the flow of what semantically connects adjacent sentences in a text. We represent the semantics of a sentence by a vector and capture its state at each word of the sentence. We model what relates two adjacent sentences based on the two most similar semantic states, each of which is in one of the sentences. We encode the perceived coherence of a text by a vector, which represents patterns of changes in salient information that relates adjacent sentences. Our experiments demonstrate that our approach is beneficial for two downstream tasks : Readability assessment, in which our model achieves new state-of-the-art results ; and essay scoring, in which the combination of our coherence vectors and other task-dependent features significantly improves the performance of a strong essay scorer.

pdf bib
Unrestricted Bridging Resolution
Yufang Hou | Katja Markert | Michael Strube
Computational Linguistics, Volume 44, Issue 2 - June 2018

In contrast to identity anaphors, which indicate coreference between a noun phrase and its antecedent, bridging anaphors link to their antecedent(s) via lexico-semantic, frame, or encyclopedic relations. Bridging resolution involves recognizing bridging anaphors and finding links to antecedents. In contrast to most prior work, we tackle both problems. Our work also follows a more wide-ranging definition of bridging than most previous work and does not impose any restrictions on the type of bridging anaphora or relations between anaphor and antecedent. We create a corpus (ISNotes) annotated for information status (IS), bridging being one of the IS subcategories. The annotations reach high reliability for all categories and marginal reliability for the bridging subcategory. We use a two-stage statistical global inference method for bridging resolution. Given all mentions in a document, the first stage, bridging anaphora recognition, recognizes bridging anaphors as a subtask of learning fine-grained IS. We use a cascading collective classification method where (i) collective classification allows us to investigate relations among several mentions and autocorrelation among IS classes and (ii) cascaded classification allows us to tackle class imbalance, important for minority classes such as bridging. We show that our method outperforms current methods both for IS recognition overall as well as for bridging, specifically. The second stage, bridging antecedent selection, finds the antecedents for all predicted bridging anaphors. We investigate the phenomenon of semantically or syntactically related bridging anaphors that share the same antecedent, a phenomenon we call sibling anaphors.

pdf bib
Proceedings of the Second ACL Workshop on Ethics in Natural Language Processing
Mark Alfano | Dirk Hovy | Margaret Mitchell | Michael Strube
Proceedings of the Second ACL Workshop on Ethics in Natural Language Processing

2017

pdf bib
Event Argument Identification on Dependency Graphs with Bidirectional LSTMsLSTMs
Alex Judea | Michael Strube
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In this paper we investigate the performance of event argument identification. We show that the performance is tied to syntactic complexity. Based on this finding, we propose a novel and effective system for event argument identification. Recurrent Neural Networks learn to produce meaningful representations of long and short dependency paths. Convolutional Neural Networks learn to decompose the lexical context of argument candidates. They are combined into a simple system which outperforms a feature-based, state-of-the-art event argument identifier without any manual feature engineering.

pdf bib
Proceedings of the IJCNLP 2017, Tutorial Abstracts
Sadao Kurohashi | Michael Strube
Proceedings of the IJCNLP 2017, Tutorial Abstracts

pdf bib
Lexical Features in Coreference Resolution : To be Used With Caution
Nafise Sadat Moosavi | Michael Strube
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Lexical features are a major source of information in state-of-the-art coreference resolvers. Lexical features implicitly model some of the linguistic phenomena at a fine granularity level. They are especially useful for representing the context of mentions. In this paper we investigate a drawback of using many lexical features in state-of-the-art coreference resolvers. We show that if coreference resolvers mainly rely on lexical features, they can hardly generalize to unseen domains. Furthermore, we show that the current coreference resolution evaluation is clearly flawed by only evaluating on a specific split of a specific dataset in which there is a notable overlap between the training, development and test sets.

pdf bib
Use Generalized Representations, But Do Not Forget Surface Features
Nafise Sadat Moosavi | Michael Strube
Proceedings of the 2nd Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2017)

Only a year ago, all state-of-the-art coreference resolvers were using an extensive amount of surface features. Recently, there was a paradigm shift towards using word embeddings and deep neural networks, where the use of surface features is very limited. In this paper, we show that a simple SVM model with surface features outperforms more complex neural models for detecting anaphoric mentions. Our analysis suggests that using generalized representations and surface features have different strength that should be both taken into account for improving coreference resolution.

pdf bib
Proceedings of the First ACL Workshop on Ethics in Natural Language Processing
Dirk Hovy | Shannon Spruit | Margaret Mitchell | Emily M. Bender | Michael Strube | Hanna Wallach
Proceedings of the First ACL Workshop on Ethics in Natural Language Processing

pdf bib
Using a Graph-based Coherence Model in Document-Level Machine Translation
Leo Born | Mohsen Mesgar | Michael Strube
Proceedings of the Third Workshop on Discourse in Machine Translation

Although coherence is an important aspect of any text generation system, it has received little attention in the context of machine translation (MT) so far. We hypothesize that the quality of document-level translation can be improved if MT models take into account the semantic relations among sentences during translation. We integrate the graph-based coherence model proposed by Mesgar and Strube, (2016) with Docent (Hardmeier et al., 2012, Hardmeier, 2014) a document-level machine translation system. The application of this graph-based coherence modeling approach is novel in the context of machine translation. We evaluate the coherence model and its effects on the quality of the machine translation. The result of our experiments shows that our coherence model slightly improves the quality of translation in terms of the average Meteor score.

pdf bib
Revisiting Selectional Preferences for Coreference Resolution
Benjamin Heinzerling | Nafise Sadat Moosavi | Michael Strube
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Selectional preferences have long been claimed to be essential for coreference resolution. However, they are modeled only implicitly by current coreference resolvers. We propose a dependency-based embedding model of selectional preferences which allows fine-grained compatibility judgments with high coverage. Incorporating our model improves performance, matching state-of-the-art results of a more complex system. However, it comes with a cost that makes it debatable how worthwhile are such improvements.

pdf bib
Trust, but Verify ! Better Entity Linking through Automatic Verification
Benjamin Heinzerling | Michael Strube | Chin-Yew Lin
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

We introduce automatic verification as a post-processing step for entity linking (EL). The proposed method trusts EL system results collectively, by assuming entity mentions are mostly linked correctly, in order to create a semantic profile of the given text using geospatial and temporal information, as well as fine-grained entity types. This profile is then used to automatically verify each linked mention individually, i.e., to predict whether it has been linked correctly or not. Verification allows leveraging a rich set of global and pairwise features that would be prohibitively expensive for EL systems employing global inference. Evaluation shows consistent improvements across datasets and systems. In particular, when applied to state-of-the-art systems, our method yields an absolute improvement in linking performance of up to 1.7 F1 on AIDA / CoNLL’03 and up to 2.4 F1 on the English TAC KBP 2015 TEDL dataset.