Proceedings of the Second Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media

Malvina Nissim, Viviana Patti, Barbara Plank, Claudia Wagner (Editors)

New Orleans, Louisiana, USA
Association for Computational Linguistics

Social and Emotional Correlates of Capitalization on Twitter
Sophia Chan | Alona Fyshe

Social media text is replete with unusual capitalization patterns. We posit that capitalizing a token like THIS performs two expressive functions: it marks a person socially, and it marks certain parts of an utterance as more salient than others. Focusing on gender and sentiment, we illustrate using a corpus of tweets that capitalization appears more often in negative than in positive contexts, and is used more by females than by males. Yet we find that both genders use capitalization in a similar way when expressing sentiment.

The Social and the Neural Network: How to Make Natural Language Processing about People again
Dirk Hovy

Over the years, natural language processing has increasingly focused on tasks that can be solved by statistical models, but ignored the social aspects of language. These limitations are in large part due to historically available data and the limitations of the models, but have narrowed our focus and biased the tools demographically. However, with the increased availability of data sets including socio-demographic information and more expressive (neural) models, we have the opportunity to address both issues. I argue that this combination can broaden the focus of NLP to solve a whole new range of tasks, enable us to generate novel linguistic insights, and provide fairer tools for everyone.

Observational Comparison of Geo-tagged and Randomly-drawn Tweets
Tom Lippincott | Annabelle Carrell

Twitter is a ubiquitous source of micro-blog social media data, providing the academic, industrial, and public sectors real-time access to actionable information. A particularly attractive property of some tweets is geo-tagging, where a user account has opted in to attaching their current location to each message. Unfortunately (from a researcher’s perspective) only a fraction of Twitter accounts agree to this, and these accounts are likely to have systematic differences from the general population. This work is an exploratory study of these differences across the full range of Twitter content, and complements previous studies that focus on the English-language subset. Additionally, we compare methods for querying users by self-identified properties, finding that the constrained semantics of the description field provides cleaner, higher-volume results than more complex regular expressions.

Understanding the Effect of Gender and Stance in Opinion Expression in Debates on Abortion
Esin Durmus | Claire Cardie

In this paper, we focus on understanding linguistic differences across groups with different self-identified gender and stance in expressing opinions about abortion. We provide a new dataset consisting of users’ gender, stance on abortion as well as the debates on abortion drawn from We use the gender and stance information to identify significant linguistic differences across individuals with different gender and stance. We show the importance of considering the stance information along with the gender since we observe significant linguistic differences across individuals with different stance even within the same gender group.

Frustrated, Polite, or Formal: Quantifying Feelings and Tone in Email
Niyati Chhaya | Kushal Chawla | Tanya Goyal | Projjal Chanda | Jaya Singh

Email conversations are the primary mode of communication in enterprises. The email content expresses an individual’s needs, requirements and intentions. Affective information in the email text can be used to gain insight into the sender’s mood or emotion. We present a novel approach to model human frustration in text. We identify linguistic features that influence human perception of frustration and model it as a supervised learning task. The paper provides a detailed comparison across traditional regression and word distribution-based models. We report a mean-squared error (MSE) of 0.018 against human-annotated frustration for the best performing model. The approach establishes the importance of affect features in frustration prediction for email data. We further evaluate the efficacy of the proposed feature set and model in predicting other tones or affects in text, namely formality and politeness; results demonstrate performance comparable to the state-of-the-art baselines.

Reddit: A Gold Mine for Personality Prediction
Matej Gjurković | Jan Šnajder

Automated personality prediction from social media is gaining increasing attention in natural language processing and social sciences communities. However, due to high labeling costs and privacy issues, the few publicly available datasets are of limited size and low topic diversity. We address this problem by introducing a large-scale dataset derived from Reddit, a source so far overlooked for personality prediction. The dataset is labeled with Myers-Briggs Type Indicators (MBTI) and comes with a rich set of features for more than 9k users. We carry out a preliminary feature analysis, revealing marked differences between the MBTI dimensions and poles. Furthermore, we use the dataset to train and evaluate benchmark personality prediction models, achieving macro F1-scores between 67% and 82% on the individual dimensions and 82% accuracy for exact or one-off accurate type prediction. These results are encouraging and comparable with the reliability of standardized tests.

Predicting Authorship and Author Traits from Keystroke Dynamics
Barbara Plank

Written text transmits a good deal of nonverbal information related to the author’s identity and social factors, such as age, gender and personality. However, it is less known to what extent behavioral biometric traces transmit such information. We use typist data to study the predictiveness of authorship, and present first experiments on predicting both age and gender from keystroke dynamics. Our results show that the model based on keystroke features, while being two orders of magnitude smaller, leads to significantly higher accuracies for authorship than the text-based system. For user attribute prediction, the best approach is to combine the two, suggesting that extralinguistic factors are disclosed to a larger degree in written text, while author identity is better transmitted in typing behavior.

Predicting Twitter User Demographics from Names Alone
Zach Wood-Doughty | Nicholas Andrews | Rebecca Marvin | Mark Dredze

Social media analysis frequently requires tools that can automatically infer demographics to contextualize trends. These tools often require hundreds of user-authored messages for each user, which may be prohibitive to obtain when analyzing millions of users. We explore character-level neural models that learn a representation of a user’s name and screen name to predict gender and ethnicity, allowing for demographic inference with minimal data. We release trained models, which may enable new demographic analyses that would otherwise require enormous amounts of data collection.

Modeling Personality Traits of Filipino Twitter Users
Edward Tighe | Charibeth Cheng

Recent studies in the field of text-based personality recognition experiment with different languages, feature extraction techniques, and machine learning algorithms to create better and more accurate models; however, little focus is placed on exploring the language use of a group of individuals defined by nationality. Individuals of the same nationality share certain practices and communicate certain ideas that can become embedded into their natural language. Many nationals are also not limited to speaking just one language; Filipinos, for example, speak Filipino and English, the two national languages of the Philippines. The addition of several regional/indigenous languages, along with the commonness of code-switching, allows a Filipino to have a rich vocabulary. This presents an opportunity to create a text-based personality model based on how Filipinos speak, regardless of the language they use. To do so, data was collected from 250 Filipino Twitter users. Different combinations of data processing techniques were tested to create personality models for each of the Big Five. The results for both regression and classification show that Conscientiousness is consistently the easiest trait to model, followed by Extraversion. Classification models for Agreeableness and Neuroticism had subpar performance, but performed better than those for Openness. An analysis of personality trait score representation showed that classifying extreme outliers generally produces better results for all traits except Neuroticism and Openness.

Grounding the Semantics of Part-of-Day Nouns Worldwide using Twitter
David Vilares | Carlos Gómez-Rodríguez

The usage of part-of-day nouns, such as ‘night’, and their time-specific greetings (‘good night’), varies across languages and cultures. We show the possibilities that Twitter offers for studying the semantics of these terms and their variability between countries. We mine a worldwide sample of multilingual tweets with temporal greetings, and study how their frequencies vary in relation to local time. The results provide insights into the semantics of these temporal expressions and the cultural and sociological factors influencing their usage.