Proceedings of the Eleventh Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

Orphee De Clercq, Alexandra Balahur, Joao Sedoc, Valentin Barriere, Shabnam Tafreshi, Sven Buechel, Veronique Hoste (Editors)


Anthology ID:
2021.wassa-1
Month:
April
Year:
2021
Address:
Online
Venues:
EACL | WASSA
SIG:
Publisher:
Association for Computational Linguistics
URL:
https://aclanthology.org/2021.wassa-1
DOI:
Bib Export formats:
BibTeX MODS XML EndNote

pdf bib
Proceedings of the Eleventh Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis
Orphee De Clercq | Alexandra Balahur | Joao Sedoc | Valentin Barriere | Shabnam Tafreshi | Sven Buechel | Veronique Hoste

pdf bib
Emotion Ratings : How Intensity, Annotation Confidence and Agreements are Entangled
Enrica Troiano | Sebastian Padó | Roman Klinger

When humans judge the affective content of texts, they also implicitly assess the correctness of such judgment, that is, their confidence. We hypothesize that people’s (in)confidence that they performed well in an annotation task leads to (dis)agreements among each other. If this is true, confidence may serve as a diagnostic tool for systematic differences in annotations. To probe our assumption, we conduct a study on a subset of the Corpus of Contemporary American English, in which we ask raters to distinguish neutral sentences from emotion-bearing ones, while scoring the confidence of their answers. Confidence turns out to approximate inter-annotator disagreements. Further, we find that confidence is correlated to emotion intensity : perceiving stronger affect in text prompts annotators to more certain classification performances. This insight is relevant for modelling studies of intensity, as it opens the question wether automatic regressors or classifiers actually predict intensity, or rather human’s self-perceived confidence.

pdf bib
Disentangling Document Topic and Author Gender in Multiple Languages : Lessons for Adversarial Debiasing
Erenay Dayanik | Sebastian Padó

Text classification is a central tool in NLP. However, when the target classes are strongly correlated with other textual attributes, text classification models can pick up wrong features, leading to bad generalization and biases. In social media analysis, this problem surfaces for demographic user classes such as language, topic, or gender, which influence the generate text to a substantial extent. Adversarial training has been claimed to mitigate this problem, but thorough evaluation is missing. In this paper, we experiment with text classification of the correlated attributes of document topic and author gender, using a novel multilingual parallel corpus of TED talk transcripts. Our findings are : (a) individual classifiers for topic and author gender are indeed biased ; (b) debiasing with adversarial training works for topic, but breaks down for author gender ; (c) gender debiasing results differ across languages. We interpret the result in terms of feature space overlap, highlighting the role of linguistic surface realization of the target classes.

pdf bib
Universal Joy A Data Set and Results for Classifying Emotions Across Languages
Sotiris Lamprinidis | Federico Bianchi | Daniel Hardt | Dirk Hovy

While emotions are universal aspects of human psychology, they are expressed differently across different languages and cultures. We introduce a new data set of over 530k anonymized public Facebook posts across 18 languages, labeled with five different emotions. Using multilingual BERT embeddings, we show that emotions can be reliably inferred both within and across languages. Zero-shot learning produces promising results for low-resource languages. Following established theories of basic emotions, we provide a detailed analysis of the possibilities and limits of cross-lingual emotion classification. We find that structural and typological similarity between languages facilitates cross-lingual learning, as well as linguistic diversity of training data. Our results suggest that there are commonalities underlying the expression of emotion in different languages. We publicly release the anonymized data for future research.

pdf bib
An End-to-End Network for Emotion-Cause Pair Extraction
Aaditya Singh | Shreeshail Hingane | Saim Wani | Ashutosh Modi

The task of Emotion-Cause Pair Extraction (ECPE) aims to extract all potential clause-pairs of emotions and their corresponding causes in a document. Unlike the more well-studied task of Emotion Cause Extraction (ECE), ECPE does not require the emotion clauses to be provided as annotations. Previous works on ECPE have either followed a multi-stage approach where emotion extraction, cause extraction, and pairing are done independently or use complex architectures to resolve its limitations. In this paper, we propose an end-to-end model for the ECPE task. Due to the unavailability of an English language ECPE corpus, we adapt the NTCIR-13 ECE corpus and establish a baseline for the ECPE task on this dataset. On this dataset, the proposed method produces significant performance improvements (6.5 % increase in F1 score) over the multi-stage approach and achieves comparable performance to the state-of-the-art methods.

pdf bib
PVG at WASSA 2021 : A Multi-Input, Multi-Task, Transformer-Based Architecture for Empathy and Distress PredictionPVG at WASSA 2021: A Multi-Input, Multi-Task, Transformer-Based Architecture for Empathy and Distress Prediction
Atharva Kulkarni | Sunanda Somwase | Shivam Rajput | Manisha Marathe

Active research pertaining to the affective phenomenon of empathy and distress is invaluable for improving human-machine interaction. Predicting intensities of such complex emotions from textual data is difficult, as these constructs are deeply rooted in the psychological theory. Consequently, for better prediction, it becomes imperative to take into account ancillary factors such as the psychological test scores, demographic features, underlying latent primitive emotions, along with the text’s undertone and its psychological complexity. This paper proffers team PVG’s solution to the WASSA 2021 Shared Task on Predicting Empathy and Emotion in Reaction to News Stories. Leveraging the textual data, demographic features, psychological test score, and the intrinsic interdependencies of primitive emotions and empathy, we propose a multi-input, multi-task framework for the task of empathy score prediction. Here, the empathy score prediction is considered the primary task, while emotion and empathy classification are considered secondary auxiliary tasks. For the distress score prediction task, the system is further boosted by the addition of lexical features. Our submission ranked 1st based on the average correlation (0.545) as well as the distress correlation (0.574), and 2nd for the empathy Pearson correlation (0.517).

pdf bib
WASSA@IITK at WASSA 2021 : Multi-task Learning and Transformer Finetuning for Emotion Classification and Empathy PredictionWASSA@IITK at WASSA 2021: Multi-task Learning and Transformer Finetuning for Emotion Classification and Empathy Prediction
Jay Mundra | Rohan Gupta | Sagnik Mukherjee

This paper describes our contribution to the WASSA 2021 shared task on Empathy Prediction and Emotion Classification. The broad goal of this task was to model an empathy score, a distress score and the overall level of emotion of an essay written in response to a newspaper article associated with harm to someone. We have used the ELECTRA model abundantly and also advanced deep learning approaches like multi-task learning. Additionally, we also leveraged standard machine learning techniques like ensembling. Our system achieves a Pearson Correlation Coefficient of 0.533 on sub-task I and a macro F1 score of 0.5528 on sub-task II. We ranked 1st in Emotion Classification sub-task and 3rd in Empathy Prediction sub-task.

pdf bib
Exploring Stylometric and Emotion-Based Features for Multilingual Cross-Domain Hate Speech Detection
Ilia Markov | Nikola Ljubešić | Darja Fišer | Walter Daelemans

In this paper, we describe experiments designed to evaluate the impact of stylometric and emotion-based features on hate speech detection : the task of classifying textual content into hate or non-hate speech classes. Our experiments are conducted for three languages English, Slovene, and Dutch both in in-domain and cross-domain setups, and aim to investigate hate speech using features that model two linguistic phenomena : the writing style of hateful social media content operationalized as function word usage on the one hand, and emotion expression in hateful messages on the other hand. The results of experiments with features that model different combinations of these phenomena support our hypothesis that stylometric and emotion-based features are robust indicators of hate speech. Their contribution remains persistent with respect to domain and language variation. We show that the combination of features that model the targeted phenomena outperforms words and character n-gram features under cross-domain conditions, and provides a significant boost to deep learning models, which currently obtain the best results, when combined with them in an ensemble.

pdf bib
Creating and Evaluating Resources for Sentiment Analysis in the Low-resource Language : SindhiSindhi
Wazir Ali | Naveed Ali | Yong Dai | Jay Kumar | Saifullah Tumrani | Zenglin Xu

In this paper, we develop Sindhi subjective lexicon using a merger of existing English resources : NRC lexicon, list of opinion words, SentiWordNet, Sindhi-English bilingual dictionary, and collection of Sindhi modifiers. The positive or negative sentiment score is assigned to each Sindhi opinion word. Afterwards, we determine the coverage of the proposed lexicon with subjectivity analysis. Moreover, we crawl multi-domain tweet corpus of news, sports, and finance. The crawled corpus is annotated by experienced annotators using the Doccano text annotation tool. The sentiment annotated corpus is evaluated by employing support vector machine (SVM), recurrent neural network (RNN) variants, and convolutional neural network (CNN).

pdf bib
L3CubeMahaSent : A Marathi Tweet-based Sentiment Analysis DatasetL3CubeMahaSent: A Marathi Tweet-based Sentiment Analysis Dataset
Atharva Kulkarni | Meet Mandhane | Manali Likhitkar | Gayatri Kshirsagar | Raviraj Joshi

Sentiment analysis is one of the most fundamental tasks in Natural Language Processing. Popular languages like English, Arabic, Russian, Mandarin, and also Indian languages such as Hindi, Bengali, Tamil have seen a significant amount of work in this area. However, the Marathi language which is the third most popular language in India still lags behind due to the absence of proper datasets. In this paper, we present the first major publicly available Marathi Sentiment Analysis Dataset-L3CubeMahaSent. It is curated using tweets extracted from various Maharashtrian personalities’ Twitter accounts. Our dataset consists of ~16,000 distinct tweets classified in three broad classes viz. positive, negative, and neutral. We also present the guidelines using which we annotated the tweets. Finally, we present the statistics of our dataset and baseline classification results using CNN, LSTM, ULMFiT, and BERT based models.

pdf bib
Me, myself, and ire : Effects of automatic transcription quality on emotion, sarcasm, and personality detection
John Culnan | Seongjin Park | Meghavarshini Krishnaswamy | Rebecca Sharp

In deployment, systems that use speech as input must make use of automated transcriptions. Yet, typically when these systems are evaluated, gold transcriptions are assumed. We explicitly examine the impact of transcription errors on the downstream performance of a multi-modal system on three related tasks from three datasets : emotion, sarcasm, and personality detection. We include three separate transcription tools and show that while all automated transcriptions propagate errors that substantially impact downstream performance, the open-source tools fair worse than the paid tool, though not always straightforwardly, and word error rates do not correlate well with downstream performance. We further find that the inclusion of audio features partially mitigates transcription errors, but that a naive usage of a multi-task setup does not.

pdf bib
Emotional RobBERT and Insensitive BERTje : Combining Transformers and Affect Lexica for Dutch Emotion DetectionRobBERT and Insensitive BERTje: Combining Transformers and Affect Lexica for Dutch Emotion Detection
Luna De Bruyne | Orphee De Clercq | Veronique Hoste

In a first step towards improving Dutch emotion detection, we try to combine the Dutch transformer models BERTje and RobBERT with lexicon-based methods. We propose two architectures : one in which lexicon information is directly injected into the transformer model and a meta-learning approach where predictions from transformers are combined with lexicon features. The models are tested on 1,000 Dutch tweets and 1,000 captions from TV-shows which have been manually annotated with emotion categories and dimensions. We find that RobBERT clearly outperforms BERTje, but that directly adding lexicon information to transformers does not improve performance. In the meta-learning approach, lexicon information does have a positive effect on BERTje, but not on RobBERT. This suggests that more emotional information is already contained within this latter language model.

pdf bib
EmpNa at WASSA 2021 : A Lightweight Model for the Prediction of Empathy, Distress and Emotions from Reactions to News StoriesEmpNa at WASSA 2021: A Lightweight Model for the Prediction of Empathy, Distress and Emotions from Reactions to News Stories
Giuseppe Vettigli | Antonio Sorgente

This paper describes our submission for the WASSA 2021 shared task regarding the prediction of empathy, distress and emotions from news stories. The solution is based on combining the frequency of words, lexicon-based information, demographics of the annotators and personality of the annotators into a linear model. The prediction of empathy and distress is performed using Linear Regression while the prediction of emotions is performed using Logistic Regression. Both tasks are performed using the same features. Our models rank 4th for the prediction of emotions and 2nd for the prediction of empathy and distress. These results are particularly interesting when considered that the computational requirements of the solution are minimal.