Jacob Eisenstein


Learning to Recognize Dialect Features
Dorottya Demszky | Devyani Sharma | Jonathan Clark | Vinodkumar Prabhakaran | Jacob Eisenstein
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Building NLP systems that serve everyone requires accounting for dialect differences. But dialects are not monolithic entities: rather, distinctions between and within dialects are captured by the presence, absence, and frequency of dozens of dialect features in speech and text, such as the deletion of the copula in "He running." In this paper, we introduce the task of dialect feature detection, and present two multitask learning approaches, both based on pretrained transformers. For most dialects, large-scale annotated corpora for these features are unavailable, making it difficult to train recognizers. We train our models on a small number of minimal pairs, building on how linguists typically define dialect features. Evaluation on a test set of 22 dialect features of Indian English demonstrates that these models learn to recognize many features with high accuracy, and that a few minimal pairs can be as effective for training as thousands of labeled examples. We also demonstrate the downstream applicability of dialect feature detection both as a measure of dialect density and as a dialect classifier.

Proceedings of the First Workshop on Causal Inference and NLP
Amir Feder | Katherine Keith | Emaad Manzoor | Reid Pryzant | Dhanya Sridhar | Zach Wood-Doughty | Jacob Eisenstein | Justin Grimmer | Roi Reichart | Molly Roberts | Uri Shalit | Brandon Stewart | Victor Veitch | Diyi Yang
Proceedings of the First Workshop on Causal Inference and NLP


AdvAug: Robust Adversarial Augmentation for Neural Machine Translation
Yong Cheng | Lu Jiang | Wolfgang Macherey | Jacob Eisenstein
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

In this paper, we propose a new adversarial augmentation method for Neural Machine Translation (NMT). The main idea is to minimize the vicinal risk over virtual sentences sampled from two vicinity distributions, the key one being a novel vicinity distribution for adversarial sentences that describes a smooth interpolated embedding space centered around observed training sentence pairs. We then discuss our approach, AdvAug, to train NMT models using the embeddings of virtual sentences in sequence-to-sequence learning. Experiments on Chinese-English, English-French, and English-German translation benchmarks show that AdvAug achieves significant improvements over the Transformer (up to 4.9 BLEU points), and substantially outperforms other data augmentation techniques (e.g., back-translation) without using extra corpora.
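To make the idea of an interpolated embedding space around training pairs concrete, here is a minimal mixup-style sketch: two training pairs are blended with a single weight drawn from a Beta distribution. It is a generic interpolation, not the authors' implementation; it omits AdvAug's adversarial perturbation step and assumes embeddings are already padded to equal length and that targets are per-position probability vectors. The name `mixup_pair` and the default `alpha=0.2` are hypothetical.

```python
import random

def mixup_pair(src_a, src_b, tgt_a, tgt_b, alpha=0.2, rng=None):
    """Interpolate two (source, target) pairs in embedding space.

    src_a, src_b: source embeddings, one list of floats per token position.
    tgt_a, tgt_b: target-side vectors (e.g. one-hot or soft label
                  distributions), one list per token position.
    Sequences are assumed to be padded to the same length; zip() would
    silently truncate them otherwise.
    """
    rng = rng or random.Random()
    # Mixing weight lam ~ Beta(alpha, alpha), always in [0, 1].
    lam = rng.betavariate(alpha, alpha)

    def mix(x, y):
        return [[lam * a + (1.0 - lam) * b for a, b in zip(row_x, row_y)]
                for row_x, row_y in zip(x, y)]

    return mix(src_a, src_b), mix(tgt_a, tgt_b), lam


# Usage: blend two one-token "sentences" that have one-hot targets.
src, tgt, lam = mixup_pair([[1.0, 2.0]], [[3.0, 4.0]],
                           [[0.0, 1.0]], [[1.0, 0.0]], rng=random.Random(0))
print(lam, src, tgt)  # the mixed target is still a distribution summing to 1
```

Using the same weight on both sides keeps each virtual pair consistent: a virtual source that is mostly pair A is trained toward a target that is mostly pair A's.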


Training on Synthetic Noise Improves Robustness to Natural Noise in Machine Translation
Vladimir Karpukhin | Omer Levy | Jacob Eisenstein | Marjan Ghazvininejad
Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)

Contemporary machine translation systems achieve greater coverage by applying subword models such as BPE and character-level CNNs, but these methods are highly sensitive to orthographical variations such as spelling mistakes. We show how training on a mild amount of random synthetic noise can dramatically improve robustness to these variations, without diminishing performance on clean text. We focus on translation performance on natural typos, and show that robustness to such noise can be achieved using a balanced diet of simple synthetic noises at training time, without access to the natural noise data or distribution.
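A minimal sketch of character-level synthetic noise of the kind described above. The four edit types (deletion, insertion, substitution, adjacent swap), the per-word probability `p`, and the function names are illustrative assumptions, not necessarily the exact noise inventory used in the paper; the point is that a uniform mix of simple edits can be generated without any natural-noise data.

```python
import random
import string

EDIT_OPS = ("delete", "insert", "substitute", "swap")  # illustrative inventory

def add_char_noise(word, rng, p=0.1):
    """With probability p, apply one random character-level edit to word."""
    if len(word) < 2 or rng.random() >= p:
        return word
    op = rng.choice(EDIT_OPS)
    i = rng.randrange(len(word))
    if op == "delete":
        return word[:i] + word[i + 1:]
    if op == "insert":
        return word[:i] + rng.choice(string.ascii_lowercase) + word[i:]
    if op == "substitute":
        return word[:i] + rng.choice(string.ascii_lowercase) + word[i + 1:]
    # "swap": exchange two adjacent characters, clamping so j + 1 is valid
    j = min(i, len(word) - 2)
    return word[:j] + word[j + 1] + word[j] + word[j + 2:]

def noisy_sentence(sentence, p=0.1, seed=None):
    """Apply add_char_noise independently to each whitespace-separated word."""
    rng = random.Random(seed)
    return " ".join(add_char_noise(w, rng, p) for w in sentence.split())


# Usage: corrupt every word once (p=1.0) to see the kinds of typos produced.
clean = "the quick brown fox jumps"
noisy = noisy_sentence(clean, p=1.0, seed=1)
print(noisy)
```

At training time the noisy source would be paired with the original clean target; keeping `p` small is one way to read the abstract's "mild amount" of noise.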

Character Eyes: Seeing Language through Character-Level Taggers
Yuval Pinter | Marc Marone | Jacob Eisenstein
Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP

Character-level models have been used extensively in recent years in NLP tasks as both supplements and replacements for closed-vocabulary token-level word representations. In one popular architecture, character-level LSTMs are used to feed token representations into a sequence tagger predicting token-level annotations such as part-of-speech (POS) tags. In this work, we examine the behavior of POS taggers across languages from the perspective of individual hidden units within the character LSTM. We aggregate the behavior of these units into language-level metrics which quantify the challenges that taggers face on languages with different morphological properties, and identify links between synthesis and affixation preference and emergent behavior of the hidden tagger layer. In a comparative experiment, we show how modifying the balance between forward and backward hidden units affects model arrangement and performance in these types of languages.

Measuring and Modeling Language Change
Jacob Eisenstein
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorials

This tutorial is designed to help researchers answer the following sorts of questions:
- Are people happier on the weekend?
- What was 1861’s word of the year?
- Are Democrats and Republicans more different than ever?
- When did "gay" stop meaning "happy"?
- Are gender stereotypes getting weaker, stronger, or just different?
- Who is a linguistic leader?
- How can we get internet users to be more polite and objective?
Such questions are fundamental to the social sciences and humanities, and scholars in these disciplines are increasingly turning to computational techniques for answers. Meanwhile, the ACL community is increasingly engaged with data that varies across time, and with the social insights that can be offered by analyzing temporal patterns and trends. The purpose of this tutorial is to facilitate this convergence in two main ways:
1. By synthesizing recent computational techniques for handling and modeling temporal data, such as dynamic word embeddings, the tutorial will provide a starting point for future computational research. It will also identify useful tools for social scientists and digital humanities scholars.
2. The tutorial will provide an overview of techniques and datasets from the quantitative social sciences and the digital humanities, which are not well-known in the computational linguistics community. These techniques include vector autoregressive models, multiple comparisons corrections for hypothesis testing, and causal inference. Datasets include historical newspaper archives and corpora of contemporary political speech.


Making fetch happen: The influence of social and linguistic context on nonstandard word growth and decline
Ian Stewart | Jacob Eisenstein
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

In an online community, new words come and go: today’s "haha" may be replaced by tomorrow’s "lol." Changes in online writing are usually studied as a social process, with innovations diffusing through a network of individuals in a speech community. But unlike other types of innovation, language change is shaped and constrained by the grammatical system in which it takes part. To investigate the role of social and structural factors in language change, we undertake a large-scale analysis of the frequencies of nonstandard words on Reddit. Dissemination across many linguistic contexts is a predictor of success: words that appear in more linguistic contexts grow faster and survive longer. Furthermore, social dissemination plays a less important role in explaining word growth and decline than previously hypothesized.

Interactional Stancetaking in Online Forums
Scott F. Kiesling | Umashanthi Pavalanathan | Jim Fitzpatrick | Xiaochuang Han | Jacob Eisenstein
Computational Linguistics, Volume 44, Issue 4 - December 2018

Language is shaped by the relationships between the speaker/writer and the audience, the object of discussion, and the talk itself. In turn, language is used to reshape these relationships over the course of an interaction. Computational researchers have succeeded in operationalizing sentiment, formality, and politeness, but each of these constructs captures only some aspects of social and relational meaning. Theories of interactional stancetaking have been put forward as holistic accounts, but until now, these theories have been applied only through detailed qualitative analysis of (portions of) a few individual conversations. In this article, we propose a new computational operationalization of interpersonal stancetaking. We begin with annotations of three linked stance dimensions (affect, investment, and alignment) on 68 conversation threads from the online platform Reddit. Using these annotations, we investigate thread structure and linguistic properties of stancetaking in online conversations. We identify lexical features that characterize the extremes along each stancetaking dimension, and show that these stancetaking properties can be predicted with moderate accuracy from bag-of-words features, even with a relatively small labeled training set. These quantitative analyses are supplemented by extensive qualitative analysis, highlighting the compatibility of computational and qualitative methods in synthesizing evidence about the creation of interactional meaning.

Stylistic Variation in Social Media Part-of-Speech Tagging
Murali Raghu Babu Balusu | Taha Merghani | Jacob Eisenstein
Proceedings of the Second Workshop on Stylistic Variation

Social media features substantial stylistic variation, raising new challenges for syntactic analysis of online writing. However, this variation is often aligned with author attributes such as age, gender, and geography, as well as more readily-available social network metadata. In this paper, we report new evidence on the link between language and social networks in the task of part-of-speech tagging. We find that tagger error rates are correlated with network structure, with high accuracy in some parts of the network, and lower accuracy elsewhere. As a result, tagger accuracy depends on training from a balanced sample of the network, rather than training on texts from a narrow subcommunity. We also describe our attempts to add robustness to stylistic variation, by building a mixture-of-experts model in which each expert is associated with a region of the social network. While prior work found that similar approaches yield performance improvements in sentiment analysis and entity linking, we were unable to obtain performance improvements in part-of-speech tagging, despite strong evidence for the link between part-of-speech error rates and social network structure.

Explainable Prediction of Medical Codes from Clinical Text
James Mullenbach | Sarah Wiegreffe | Jon Duke | Jimeng Sun | Jacob Eisenstein
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Clinical notes are text documents that are created by clinicians for each patient encounter. They are typically accompanied by medical codes, which describe the diagnosis and treatment. Annotating these codes is labor-intensive and error-prone; furthermore, the connection between the codes and the text is not annotated, obscuring the reasons and details behind specific diagnoses and treatments. We present an attentional convolutional network that predicts medical codes from clinical text. Our method aggregates information across the document using a convolutional neural network, and uses an attention mechanism to select the most relevant segments for each of the thousands of possible codes. The method is accurate, achieving precision@8 of 0.71 and a Micro-F1 of 0.54, which are both better than the prior state of the art. Furthermore, through an interpretability evaluation by a physician, we show that the attention mechanism identifies meaningful explanations for each code assignment.
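The per-code attention described above can be sketched in pure Python: each code has its own query vector, attends over the document's position-wise hidden vectors, and scores the resulting context vector. This assumes the convolutional layer has already produced one hidden vector per token; the names `per_label_attention` and `precision_at_k`, the sigmoid output layer, and the toy numbers are illustrative assumptions rather than the authors' code. `precision_at_k` follows the standard definition of the precision@8 metric reported above (in practice averaged over documents).

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def per_label_attention(H, U, B, bias):
    """Score every code with its own attention over document positions.

    H: one hidden vector per token position (e.g. convolution outputs).
    U: one attention query vector per code.
    B: one output weight vector per code; bias: one float per code.
    Returns (probabilities, attention weights), one entry per code.
    """
    probs, attns = [], []
    for u, beta, b in zip(U, B, bias):
        alpha = softmax([dot(h, u) for h in H])  # where this code "looks"
        context = [sum(a * h[d] for a, h in zip(alpha, H)) for d in range(len(H[0]))]
        probs.append(1.0 / (1.0 + math.exp(-(dot(beta, context) + b))))
        attns.append(alpha)
    return probs, attns

def precision_at_k(scores, gold, k=8):
    """Fraction of the k highest-scoring codes that appear in the gold set."""
    top_k = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
    return sum(1 for j in top_k if j in gold) / k


# Usage: three token positions, two codes that each attend to a different token.
H = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
probs, attns = per_label_attention(H, U=[[5.0, 0.0], [0.0, 5.0]],
                                   B=[[2.0, 0.0], [0.0, -2.0]], bias=[0.0, 0.0])
print(probs, attns)
```

The attention weights `attns` are what an interpretability evaluation would inspect: each row indicates which text segment a given code relied on.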

Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts
Yoav Artzi | Jacob Eisenstein
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts


Overcoming Language Variation in Sentiment Analysis with Social Attention
Yi Yang | Jacob Eisenstein
Transactions of the Association for Computational Linguistics, Volume 5

Variation in language is ubiquitous, particularly in newer forms of writing such as social media. Fortunately, variation is not random; it is often linked to social properties of the author. In this paper, we show how to exploit social networks to make sentiment analysis more robust to social language variation. The key idea is linguistic homophily: the tendency of socially linked individuals to use language in similar ways. We formalize this idea in a novel attention-based neural network architecture, in which attention is divided among several basis models, depending on the author’s position in the social network. This has the effect of smoothing the classification function across the social network, and makes it possible to induce personalized classifiers even for authors for whom there is no labeled data or demographic metadata. This model significantly improves the accuracy of sentiment analysis on Twitter and on review data.
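The idea of dividing attention among basis models according to the author's network position can be sketched as follows. Assumptions: each basis model is a simple logistic classifier over text features (a stand-in for the paper's neural basis models), `author_emb` is a precomputed vector for the author's position in the social network (how it is learned is outside this sketch), and attention is a softmax over dot products with one hypothetical key vector per basis model. All names are illustrative, not from the paper.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def social_attention_predict(features, author_emb, basis_weights, basis_keys):
    """Mix K basis sentiment models with author-dependent attention.

    features: feature vector for the text.
    author_emb: vector for the author's position in the social network.
    basis_weights: K weight vectors, one logistic model per basis.
    basis_keys: K vectors compared against author_emb to get attention.
    Returns (P(positive), attention weights over the K basis models).
    """
    attn = softmax([dot(author_emb, key) for key in basis_keys])
    basis_probs = [sigmoid(dot(w, features)) for w in basis_weights]
    return sum(a * p for a, p in zip(attn, basis_probs)), attn


# Usage: the same text, read by two bases that disagree about one feature.
keys = [[1.0, 0.0], [0.0, 1.0]]
weights = [[5.0], [-5.0]]
p_a, attn_a = social_attention_predict([1.0], [10.0, 0.0], weights, keys)
p_b, attn_b = social_attention_predict([1.0], [0.0, 10.0], weights, keys)
print(p_a, p_b)  # authors in different network regions get different readings
```

Because the mixture weights depend only on `author_emb`, an author with no labeled data still receives a personalized combination of the basis models, which is the property the abstract emphasizes.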

A Kernel Independence Test for Geographical Language Variation
Dong Nguyen | Jacob Eisenstein
Computational Linguistics, Volume 43, Issue 3 - September 2017

Quantifying the degree of spatial dependence for linguistic variables is a key task for analyzing dialectal variation. However, existing approaches have important drawbacks. First, they are based on parametric models of dependence, which limits their power in cases where the underlying parametric assumptions are violated. Second, they are not applicable to all types of linguistic data: some approaches apply only to frequencies, others to boolean indicators of whether a linguistic variable is present. We present a new method for measuring geographical language variation, which solves both of these problems. Our approach builds on Reproducing Kernel Hilbert Space (RKHS) representations for nonparametric statistics, and takes the form of a test statistic that is computed from pairs of individual geotagged observations without aggregation into predefined geographical bins. We compare this test with prior work using synthetic data as well as a diverse set of real data sets: a corpus of Dutch tweets, a Dutch syntactic atlas, and a data set of letters to the editor in North American newspapers. Our proposed test is shown to support robust inferences across a broad range of scenarios and types of data.
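As a concrete instance of an RKHS-based statistic computed from pairs of individual observations, the sketch below computes the Hilbert-Schmidt Independence Criterion (HSIC) between locations and per-observation linguistic values, with a permutation test for significance. Assumptions: Gaussian kernels with hand-picked bandwidths, Euclidean distance on raw coordinates (a simplification for latitude/longitude), and the biased HSIC estimator; these choices and the function names are illustrative, not a description of the paper's exact configuration. Because the value kernel accepts any real number, the same code handles both frequencies and boolean (0/1) indicators.

```python
import math
import random

def gaussian_gram(points, sigma):
    """Gaussian (RBF) kernel matrix over a list of coordinate tuples."""
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(p, q)) / (2.0 * sigma ** 2))
             for q in points] for p in points]

def center(K):
    """Return H K H with H = I - (1/n) 11^T; K must be symmetric."""
    n = len(K)
    means = [sum(row) / n for row in K]  # row means equal column means
    grand = sum(means) / n
    return [[K[i][j] - means[i] - means[j] + grand for j in range(n)]
            for i in range(n)]

def hsic_centered(Kc, L):
    """Biased HSIC estimate (1/n^2) tr(H K H L), with Kc = H K H precomputed."""
    n = len(Kc)
    return sum(Kc[i][j] * L[i][j] for i in range(n) for j in range(n)) / n ** 2

def permutation_test(locations, values, sigma_loc=1.0, sigma_val=1.0,
                     n_perm=200, seed=0):
    """HSIC between locations and values, with a permutation p-value."""
    Kc = center(gaussian_gram(locations, sigma_loc))
    L = gaussian_gram([(v,) for v in values], sigma_val)
    observed = hsic_centered(Kc, L)
    rng = random.Random(seed)
    idx = list(range(len(values)))
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(idx)  # reassign the observed values to locations at random
        L_perm = [[L[i][j] for j in idx] for i in idx]
        if hsic_centered(Kc, L_perm) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Usage: a boolean feature used only in the western half of a transect.
locs = [(float(x), 0.0) for x in range(20)]
vals = [1.0 if x < 10 else 0.0 for x in range(20)]
stat, p_value = permutation_test(locs, vals, sigma_loc=3.0, sigma_val=1.0)
print(stat, p_value)
```

No binning is involved: every pair of observations contributes through the kernel matrices, which is what lets the statistic work directly on individual geotagged points.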

Mimicking Word Embeddings using Subword RNNs
Yuval Pinter | Robert Guthrie | Jacob Eisenstein
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Word embeddings improve generalization over lexical features by placing each word in a lower-dimensional space, using distributional information obtained from unlabeled data. However, the effectiveness of word embeddings for downstream NLP tasks is limited by out-of-vocabulary (OOV) words, for which embeddings do not exist. In this paper, we present MIMICK, an approach to generating OOV word embeddings compositionally, by learning a function from spellings to distributional embeddings. Unlike prior work, MIMICK does not require re-training on the original word embedding corpus; instead, learning is performed at the type level. Intrinsic and extrinsic evaluations demonstrate the power of this simple approach. On 23 languages, MIMICK improves performance over a word-based baseline for tagging part-of-speech and morphosyntactic attributes. It is competitive with (and complementary to) a supervised character-based model in low resource settings.
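A heavily simplified sketch of the type-level training idea: fit a function from a word's spelling to its pretrained embedding using only (word, vector) pairs, then apply it to OOV words. MIMICK itself uses a character-level recurrent network; this sketch swaps in a much simpler model (the average of learned character-trigram vectors, fit with squared-error SGD) so that it stays short and dependency-free. The function names, hyperparameters, and toy vocabulary are hypothetical.

```python
import random

def char_ngrams(word, n=3):
    """Character n-grams of the word with boundary markers < and >."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def train_mimic(vocab_vectors, dim, epochs=500, lr=1.0, seed=0):
    """Learn one vector per n-gram so that the mean of a word's n-gram
    vectors approximates its pretrained embedding (squared error, SGD).
    Training uses word types only; no corpus is needed."""
    rng = random.Random(seed)
    table = {}
    words = list(vocab_vectors)
    for _ in range(epochs):
        rng.shuffle(words)
        for w in words:
            grams = char_ngrams(w)
            for g in grams:
                table.setdefault(g, [0.0] * dim)
            pred = [sum(table[g][d] for g in grams) / len(grams) for d in range(dim)]
            err = [p - t for p, t in zip(pred, vocab_vectors[w])]
            for g in grams:  # gradient of 0.5 * ||pred - target||^2
                for d in range(dim):
                    table[g][d] -= lr * err[d] / len(grams)
    return table

def mimic(word, table, dim):
    """Embed any word from its known n-grams; all zeros if none are known."""
    grams = [g for g in char_ngrams(word) if g in table]
    if not grams:
        return [0.0] * dim
    return [sum(table[g][d] for g in grams) / len(grams) for d in range(dim)]


# Usage: three in-vocabulary words, then an OOV word built from their pieces.
vocab = {"walk": [1.0, 0.0], "walked": [1.0, 1.0], "talk": [0.0, 0.0]}
table = train_mimic(vocab, dim=2)
print(mimic("walked", table, dim=2))  # close to the pretrained [1.0, 1.0]
print(mimic("talked", table, dim=2))  # OOV: composed from known trigrams
```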