Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science

David Bamman, Dirk Hovy, David Jurgens, Brendan O'Connor, Svitlana Volkova (Editors)

Association for Computational Linguistics

Swimming with the Tide? Positional Claim Detection across Political Text Types
Nico Blokker | Erenay Dayanik | Gabriella Lapesa | Sebastian Padó

Manifestos are official documents of political parties, providing a comprehensive topical overview of the electoral programs. Voters, however, seldom read them and often prefer other channels, such as newspaper articles, to understand the party positions on various policy issues. The natural question to ask is how compatible these two formats (manifesto and newspaper reports) are in their representation of party positioning. We address this question with an approach that combines political science (manual annotation and analysis) and natural language processing (supervised claim identification) in a cross-text type setting: we train a classifier on annotated newspaper data and test its performance on manifestos. Our findings show a) strong performance for supervised classification even across text types and b) a substantive overlap between the two formats in terms of party positioning, with differences regarding the salience of specific issues.

Identifying Worry in Twitter: Beyond Emotion Analysis
Reyha Verma | Christian von der Weth | Jithin Vachery | Mohan Kankanhalli

Identifying the worries of individuals and societies plays a crucial role in providing social support and enhancing policy decision-making. Due to the popularity of social media platforms such as Twitter, users freely share worries about personal issues (e.g., health, finances, relationships) and broader issues (e.g., changes in society, environmental concerns, terrorism). In this paper, we explore and evaluate a wide range of machine learning models to predict worry on Twitter. While this task has been closely associated with emotion prediction, we argue and show that identifying worry needs to be addressed as a separate task given the unique challenges associated with it. We conduct a user study to provide evidence that social media posts express two basic kinds of worry, normative and pathological, as described in the psychology literature. In addition, we show that existing emotion detection techniques underperform, especially at capturing normative worry. Finally, we discuss the current limitations of our approach and propose future applications of the worry identification system.

Recalibrating classifiers for interpretable abusive content detection
Bertie Vidgen | Scott Hale | Sam Staton | Tom Melham | Helen Margetts | Ohad Kammar | Marcin Szymczak

We investigate the use of machine learning classifiers for detecting online abuse in empirical research. We show that uncalibrated classifiers (i.e. where the ‘raw’ scores are used) align poorly with human evaluations. This limits their use for understanding the dynamics, patterns and prevalence of online abuse. We examine two widely used classifiers (created by Perspective and Davidson et al.) on a dataset of tweets directed against candidates in the UK’s 2017 general election. A Bayesian approach is presented to recalibrate the raw scores from the classifiers, using probabilistic programming and newly annotated data. We argue that interpretability evaluation and recalibration are integral to the application of abusive content classifiers.
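To illustrate what recalibrating raw scores against human annotations can look like, here is a minimal sketch. It assumes a simple logistic calibration model P(abusive | s) = sigmoid(a + b·s) on a classifier's raw score s in [0, 1], independent normal priors on a and b, and a coarse grid approximation of the posterior. This is not the authors' probabilistic program; the function names, prior width, grid, and toy data are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def log_likelihood(a: float, b: float, scores: List[float], labels: List[int]) -> float:
    """Bernoulli log-likelihood of human labels under P(y=1 | s) = sigmoid(a + b*s)."""
    total = 0.0
    for s, y in zip(scores, labels):
        p = min(max(sigmoid(a + b * s), 1e-12), 1 - 1e-12)  # clip to avoid log(0)
        total += math.log(p) if y == 1 else math.log(1 - p)
    return total

def fit_posterior(scores: List[float], labels: List[int], prior_sd: float = 5.0,
                  grid: List[float] = None) -> Dict[Tuple[float, float], float]:
    """Normalized posterior weights over a grid of (a, b); assumes raw scores in [0, 1]."""
    if grid is None:
        grid = [i * 0.5 for i in range(-20, 21)]  # -10.0 to 10.0 in steps of 0.5
    log_post = {}
    for a in grid:
        for b in grid:
            # Independent N(0, prior_sd^2) priors on a and b, up to an additive constant.
            log_prior = -(a * a + b * b) / (2 * prior_sd ** 2)
            log_post[(a, b)] = log_prior + log_likelihood(a, b, scores, labels)
    peak = max(log_post.values())
    weights = {k: math.exp(v - peak) for k, v in log_post.items()}  # subtract max for stability
    z = sum(weights.values())
    return {k: w / z for k, w in weights.items()}

def recalibrated_probability(score: float, posterior: Dict[Tuple[float, float], float]) -> float:
    """Posterior predictive probability that a post with this raw score is abusive."""
    return sum(w * sigmoid(a + b * score) for (a, b), w in posterior.items())

# Toy, hypothetical data: raw classifier scores and human annotations (1 = abusive).
raw_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
human_labels = [0, 0, 0, 1, 0, 1, 1, 1]
posterior = fit_posterior(raw_scores, human_labels)
print(recalibrated_probability(0.9, posterior))
```

Averaging over the posterior rather than plugging in a single best-fit (a, b) lets the recalibrated probabilities carry the uncertainty that comes from a small annotated sample.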

Mapping Local News Coverage: Precise location extraction in textual news content using fine-tuned BERT based language model
Sarang Gupta | Kumari Nishu

Mapping local news coverage from textual content is a challenging problem that requires extracting precise location mentions from news articles. While traditional named entity taggers are able to extract geo-political entities and certain non-geo-political entities, they cannot recognize precise location mentions such as addresses, streets and intersections, which are required to accurately map the news article. We fine-tune a BERT-based language model to achieve a high level of granularity in location extraction. We incorporate the model into an end-to-end tool that further geocodes the extracted locations for the broader objective of mapping news coverage.
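A common way to turn a fine-tuned token classifier's output into location strings is to decode BIO tags into spans. The sketch below assumes word-level tags with a hypothetical `LOC` label, with sub-word pieces already merged back into words; the abstract does not specify the authors' tag scheme, and the geocoding step is not shown.

```python
from typing import List

def decode_bio_spans(tokens: List[str], tags: List[str], label: str = "LOC") -> List[str]:
    """Merge B-/I- tagged words into location strings (hypothetical LOC label)."""
    spans: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == f"B-{label}":
            if current:                      # a new B- closes any open span
                spans.append(" ".join(current))
            current = [token]
        elif tag == f"I-{label}" and current:
            current.append(token)            # continue the open span
        else:
            # "O", another label, or an I- with no open span (strict BIO: treated as outside)
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans

words = ["Fire", "at", "5th", "Avenue", "and", "Main", "Street", "downtown"]
tags = ["O", "O", "B-LOC", "I-LOC", "I-LOC", "I-LOC", "I-LOC", "O"]
print(decode_bio_spans(words, tags))
```

Tagging the whole intersection as one span keeps "5th Avenue and Main Street" together, which is the granularity a geocoder needs; treating stray I- tags as outside is one design choice among several.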

Foreigner-directed speech is simpler than native-directed: Evidence from social media
Aleksandrs Berdicevskis

I test two hypotheses that play an important role in modern sociolinguistics and language evolution studies: first, that non-native production is simpler than native; second, that production addressed to non-native speakers is simpler than that addressed to natives. The second hypothesis is particularly important for theories about contact-induced simplification, since the accommodation to non-natives may explain how the simplification can spread from adult learners to the whole community. To test the hypotheses, I create a very large corpus of native and non-native written speech in four languages (English, French, Italian, Spanish), extracting data from an internet forum where native languages of the participants are known and the structure of the interactions can be inferred. The corpus data yield inconsistent evidence with respect to the first hypothesis, but largely support the second one, suggesting that foreigner-directed speech is indeed simpler than native-directed. Importantly, when testing the first hypothesis, I contrast production of different speakers, which can introduce confounds and is a likely reason for the inconsistencies. When testing the second hypothesis, the comparison is always within the production of the same speaker (but with different addressees), which makes it more reliable.
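The reliability argument for the second hypothesis can be made concrete with a paired comparison: each speaker serves as their own control, so stable between-speaker differences cancel out. A minimal sketch, assuming some per-speaker complexity score (the measure, function names, and numbers here are hypothetical, not taken from the paper):

```python
from statistics import mean
from typing import Dict, Tuple

def within_speaker_differences(per_speaker: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Complexity toward natives minus complexity toward non-natives, per speaker.

    A speaker's own baseline (proficiency, style, etc.) appears in both terms
    and cancels in the difference.
    """
    return {spk: to_native - to_nonnative
            for spk, (to_native, to_nonnative) in per_speaker.items()}

def summarize(diffs: Dict[str, float]) -> Dict[str, float]:
    values = list(diffs.values())
    return {
        "mean_difference": mean(values),
        # Share of speakers whose foreigner-directed production is simpler.
        "share_simpler_to_foreigners": sum(1 for d in values if d > 0) / len(values),
    }

# Hypothetical complexity scores (higher = more complex): (to natives, to non-natives)
scores = {"s1": (14.0, 11.5), "s2": (9.0, 9.5), "s3": (12.0, 10.0)}
print(summarize(within_speaker_differences(scores)))
```

By contrast, testing the first hypothesis compares different speakers, so any difference also absorbs whatever else distinguishes those speakers.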

Understanding Weekly COVID-19 Concerns through Dynamic Content-Specific LDA Topic Modeling
Mohammadzaman Zamani | H. Andrew Schwartz | Johannes Eichstaedt | Sharath Chandra Guntuku | Adithya Virinchipuram Ganesan | Sean Clouston | Salvatore Giorgi

The novelty and global scale of the COVID-19 pandemic have led to rapid societal changes in a short span of time. As government policy and health measures shift, public perceptions and concerns also change, an evolution documented within discourse on social media. We propose a dynamic content-specific LDA topic modeling technique that can help to identify different domains of COVID-specific discourse that can be used to track societal shifts in concerns or views. Our experiments show that these model-derived topics are more coherent than standard LDA topics, and also provide new features that are more helpful in prediction of COVID-19 related outcomes including social mobility and unemployment rate.

Emoji and Self-Identity in Twitter Bios
Jinhang Li | Giorgos Longinos | Steven Wilson | Walid Magdy

Emoji are widely used to express emotions and concepts on social media, and prior work has shown that users’ choice of emoji reflects the way that they wish to present themselves to the world. Emoji usage is typically studied in the context of posts made by users, and this view has provided important insights into phenomena such as emotional expression and self-representation. In addition to making posts, however, social media platforms like Twitter allow users to provide a short bio, which is an opportunity to briefly describe their account as a whole. In this work, we focus on the use of emoji in these bio statements. We explore the ways in which users include emoji in these self-descriptions, finding different patterns than those observed around emoji usage in tweets. We examine the relationships between emoji used in bios and the content of users’ tweets, showing that the topics and even the average sentiment of tweets vary for users with different emoji in their bios. Lastly, we confirm that homophily effects exist with respect to the types of emoji that are included in bios of users and their followers.
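One simple way to check for homophily of this kind is to compare the share of follow edges that connect users in the same bio-emoji category against what random mixing would predict. The sketch below assumes each user has already been assigned a single category (a hypothetical simplification) and uses the sum of squared category shares as the random-mixing baseline, which ignores differences in degree; it is not the paper's exact test, and all names and data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

def same_category_edge_share(edges: List[Tuple[str, str]], category: Dict[str, str]) -> float:
    """Fraction of (user, follower) edges whose endpoints share a bio-emoji category."""
    same = sum(1 for user, follower in edges if category[user] == category[follower])
    return same / len(edges)

def random_mixing_baseline(category: Dict[str, str]) -> float:
    """Expected same-category share if edges paired users at random: sum of p_c^2."""
    counts = Counter(category.values())
    n = sum(counts.values())
    return sum((c / n) ** 2 for c in counts.values())

# Hypothetical users, categories, and (user, follower) edges.
category = {"a": "sports", "b": "sports", "c": "music", "d": "music"}
edges = [("a", "b"), ("b", "a"), ("c", "d"), ("a", "c")]
observed = same_category_edge_share(edges, category)
expected = random_mixing_baseline(category)
print(observed, expected)  # observed above expected suggests homophily
```

A real analysis would also need a significance test, for example by shuffling categories across users and recomputing the observed share.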