Proceedings of the Fourth Workshop on Online Abuse and Harms

Seyi Akiwowo, Bertie Vidgen, Vinodkumar Prabhakaran, Zeerak Waseem (Editors)


Anthology ID:
2020.alw-1
Month:
November
Year:
2020
Address:
Online
Venues:
ALW | EMNLP
SIG:
Publisher:
Association for Computational Linguistics
URL:
https://aclanthology.org/2020.alw-1
DOI:
Bib Export formats:
BibTeX MODS XML EndNote

pdf bib
Proceedings of the Fourth Workshop on Online Abuse and Harms
Seyi Akiwowo | Bertie Vidgen | Vinodkumar Prabhakaran | Zeerak Waseem

pdf bib
Fine-tuning for multi-domain and multi-label uncivil language detection
Kadir Bulut Ozler | Kate Kenski | Steve Rains | Yotam Shmargad | Kevin Coe | Steven Bethard

Incivility is a problem on social media, and it comes in many forms (name-calling, vulgarity, threats, etc.) and domains (microblog posts, online news comments, Wikipedia edits, etc.). Training machine learning models to detect such incivility must handle the multi-label and multi-domain nature of the problem. We present a BERT-based model for incivility detection and propose several approaches for training it for multi-label and multi-domain datasets. We find that individual binary classifiers outperform a joint multi-label classifier, and that simply combining multiple domains of training data outperforms other recently-proposed fine tuning strategies. We also establish new state-of-the-art performance on several incivility detection datasets.

pdf bib
HurtBERT : Incorporating Lexical Features with BERT for the Detection of Abusive LanguageHurtBERT: Incorporating Lexical Features with BERT for the Detection of Abusive Language
Anna Koufakou | Endang Wahyu Pamungkas | Valerio Basile | Viviana Patti

The detection of abusive or offensive remarks in social texts has received significant attention in research. In several related shared tasks, BERT has been shown to be the state-of-the-art. In this paper, we propose to utilize lexical features derived from a hate lexicon towards improving the performance of BERT in such tasks. We explore different ways to utilize the lexical features in the form of lexicon-based encodings at the sentence level or embeddings at the word level. We provide an extensive dataset evaluation that addresses in-domain as well as cross-domain detection of abusive content to render a complete picture. Our results indicate that our proposed models combining BERT with lexical features help improve over a baseline BERT model in many of our in-domain and cross-domain experiments.

pdf bib
Attending the Emotions to Detect Online Abusive Language
Niloofar Safi Samghabadi | Afsheen Hatami | Mahsa Shafaei | Sudipta Kar | Thamar Solorio

In recent years, abusive behavior has become a serious issue in online social networks. In this paper, we present a new corpus for the task of abusive language detection that is collected from a semi-anonymous online platform, and unlike the majority of other available resources, is not created based on a specific list of bad words. We also develop computational models to incorporate emotions into textual cues to improve aggression identification. We evaluate our proposed methods on a set of corpora related to the task and show promising results with respect to abusive language detection.

pdf bib
Countering hate on social media : Large scale classification of hate and counter speech
Joshua Garland | Keyan Ghazi-Zahedi | Jean-Gabriel Young | Laurent Hébert-Dufresne | Mirta Galesic

Hateful rhetoric is plaguing online discourse, fostering extreme societal movements and possibly giving rise to real-world violence. A potential solution to this growing global problem is citizen-generated counter speech where citizens actively engage with hate speech to restore civil non-polarized discourse. However, its actual effectiveness in curbing the spread of hatred is unknown and hard to quantify. One major obstacle to researching this question is a lack of large labeled data sets for training automated classifiers to identify counter speech. Here we use a unique situation in Germany where self-labeling groups engaged in organized online hate and counter speech. We use an ensemble learning algorithm which pairs a variety of paragraph embeddings with regularized logistic regression functions to classify both hate and counter speech in a corpus of millions of relevant tweets from these two groups. Our pipeline achieves macro F1 scores on out of sample balanced test sets ranging from 0.76 to 0.97accuracy in line and even exceeding the state of the art. We then use the classifier to discover hate and counter speech in more than 135,000 fully-resolved Twitter conversations occurring from 2013 to 2018 and study their frequency and interaction. Altogether, our results highlight the potential of automated methods to evaluate the impact of coordinated counter speech in stabilizing conversations on social media.

pdf bib
Moderating Our (Dis)Content : Renewing the Regulatory Approach
Claire Pershan

As online platforms become central to our democracies, the problem of toxic content threatens the free flow of information and the enjoyment of fundamental rights. But effective policy response to toxic content must grasp the idiosyncrasies and interconnectedness of content moderation across a fragmented online landscape. This report urges regulators and legislators to consider a range of platforms and moderation approaches in the regulation. In particular, it calls for a holistic, process-oriented regulatory approach that accounts for actors beyond the handful of dominant platforms that currently shape public debate.

pdf bib
Detecting East Asian Prejudice on Social MediaEast Asian Prejudice on Social Media
Bertie Vidgen | Scott Hale | Ella Guest | Helen Margetts | David Broniatowski | Zeerak Waseem | Austin Botelho | Matthew Hall | Rebekah Tromble

During COVID-19 concerns have heightened about the spread of aggressive and hateful language online, especially hostility directed against East Asia and East Asian people. We report on a new dataset and the creation of a machine learning classifier that categorizes social media posts from Twitter into four classes : Hostility against East Asia, Criticism of East Asia, Meta-discussions of East Asian prejudice, and a neutral class. The classifier achieves a macro-F1 score of 0.83. We then conduct an in-depth ground-up error analysis and show that the model struggles with edge cases and ambiguous content. We provide the 20,000 tweet training dataset (annotated by experienced analysts), which also contains several secondary categories and additional flags. We also provide the 40,000 original annotations (before adjudication), the full codebook, annotations for COVID-19 relevance and East Asian relevance and stance for 1,000 hashtags, and the final model.

pdf bib
On Cross-Dataset Generalization in Automatic Detection of Online Abuse
Isar Nejadgholi | Svetlana Kiritchenko

NLP research has attained high performances in abusive language detection as a supervised classification task. While in research settings, training and test datasets are usually obtained from similar data samples, in practice systems are often applied on data that are different from the training set in topic and class distributions. Also, the ambiguity in class definitions inherited in this task aggravates the discrepancies between source and target datasets. We explore the topic bias and the task formulation bias in cross-dataset generalization. We show that the benign examples in the Wikipedia Detox dataset are biased towards platform-specific topics. We identify these examples using unsupervised topic modeling and manual inspection of topics’ keywords. Removing these topics increases cross-dataset generalization, without reducing in-domain classification performance. For a robust dataset design, we suggest applying inexpensive unsupervised methods to inspect the collected data and downsize the non-generalizable content before manually annotating for class labels.

pdf bib
Investigating Annotator Bias with a Graph-Based Approach
Maximilian Wich | Hala Al Kuwatly | Georg Groh

A challenge that many online platforms face is hate speech or any other form of online abuse. To cope with this, hate speech detection systems are developed based on machine learning to reduce manual work for monitoring these platforms. Unfortunately, machine learning is vulnerable to unintended bias in training data, which could have severe consequences, such as a decrease in classification performance or unfair behavior (e.g., discriminating minorities). In the scope of this study, we want to investigate annotator bias a form of bias that annotators cause due to different knowledge in regards to the task and their subjective perception. Our goal is to identify annotation bias based on similarities in the annotation behavior from annotators. To do so, we build a graph based on the annotations from the different annotators, apply a community detection algorithm to group the annotators, and train for each group classifiers whose performances we compare. By doing so, we are able to identify annotator bias within a data set. The proposed method and collected insights can contribute to developing fairer and more reliable hate speech classification models.