Proceedings of the First Workshop on Language Technology for Equality, Diversity and Inclusion

Bharathi Raja Chakravarthi, John P. McCrae, Manel Zarrouk, Kalika Bali, Paul Buitelaar (Editors)

Association for Computational Linguistics

Impact of COVID-19 in Natural Language Processing Publications: a Disaggregated Study in Gender, Contribution and Experience
Christine Basta | Marta R. Costa-jussa

This study sheds light on the effects of COVID-19 in the particular field of Computational Linguistics and Natural Language Processing within Artificial Intelligence. We provide an intersectional study on gender, contribution, and experience that considers one school year (from August 2019 to August 2020) as a pandemic year. August is included twice for the purpose of an inter-annual comparison. While publications trended upward during the crisis, the results show that the ratio between female and male publications decreased. This only helps to reduce the importance of the female role in the scientific contributions of computational linguistics (it is now far below its peak of 0.24). The pandemic has a particularly negative effect on the production of female senior researchers in the first author position (maximum work), followed by female junior researchers in the last author position (supervision or collaborative work).

hBERT + BiasCorp - Fighting Racism on the Web
Olawale Onabola | Zhuang Ma | Xie Yang | Benjamin Akera | Ibraheem Abdulrahman | Jia Xue | Dianbo Liu | Yoshua Bengio

Subtle and overt racism is still present in both physical and online communities today and has impacted many lives in different segments of society. In this short piece of work, we present how we are tackling this societal issue with Natural Language Processing. We are releasing BiasCorp, a dataset containing 139,090 comments and news segments from three specific sources: Fox News, BreitbartNews and YouTube. The first batch (45,000 manually annotated) is ready for publication. We are currently in the final phase of manually labeling the remaining dataset using Amazon Mechanical Turk. BERT has been used widely in several downstream tasks. In this work, we present hBERT, where we modify certain layers of the pretrained BERT model with the new Hopfield Layer. hBERT generalizes well across different distributions with the added advantage of a reduced model complexity. We are also releasing a JavaScript library and a Chrome Extension Application, to help developers make use of our trained model in web applications (for example, a chat application) and to help users identify and report racially biased content on the web, respectively.

An Overview of Fairness in Data Illuminating the Bias in Data Pipeline
Senthil Kumar B | Aravindan Chandrabose | Bharathi Raja Chakravarthi

Data in general encodes human biases by default; being aware of this is a good start, and the research around how to handle it is ongoing. The term ‘bias’ is extensively used in various contexts in NLP systems. Our research focuses on biases related to gender, race, religion, demographics and other intersectional views that prevail in text processing systems and systematically discriminate against specific populations, which is not ethical in NLP. These biases exacerbate the lack of equality, diversity and inclusion of specific populations when NLP applications are used. The tools and technology at the intermediate level utilize biased data, and transfer or amplify this bias to the downstream applications. However, it is not enough to be colourblind or gender-neutral when designing unbiased technology; instead, we should make a conscious effort by designing a unified framework to measure and benchmark the bias. In this paper, we recommend six measures and one augment measure based on the observations of the bias in data, annotations, text representations and debiasing techniques.

GEPSA, a tool for monitoring social challenges in digital press
Iñaki San Vicente | Xabier Saralegi | Nerea Zubia

This paper presents a platform for monitoring press narratives with respect to several social challenges, including gender equality, migrations and minority languages. As narratives are encoded in natural language, we have to use natural language processing techniques to automate their analysis. Thus, crawled news items are processed by means of several NLP modules, including named entity recognition, keyword extraction, document classification for social challenge detection, and sentiment analysis. A Flask-powered interface provides data visualization for a user-based analysis of the data. This paper presents the architecture of the system and describes in detail its different components. Evaluation is provided for the modules related to extraction and classification of information regarding social challenges.

Finding Spoiler Bias in Tweets by Zero-shot Learning and Knowledge Distilling from Neural Text Simplification
Avi Bleiweiss

Automatic detection of critical plot information in reviews of media items poses unique challenges to both social computing and computational linguistics. In this paper we propose to cast the problem of discovering spoiler bias in online discourse as a text simplification task. We conjecture that for an item-user pair, the simpler the user review we learn from an item summary, the higher its likelihood to present a spoiler. Our neural model incorporates the advanced transformer network to rank the severity of a spoiler in user tweets. We constructed a sustainable high-quality movie dataset scraped from unsolicited review tweets and paired with a title summary and meta-data extracted from a movie-specific domain. To a large extent, our quantitative and qualitative results weigh in on the performance impact of named entity presence in plot summaries. Pretrained on a split-and-rephrase corpus with knowledge distilled from English Wikipedia and fine-tuned on our movie dataset, our neural model is shown to outperform both a language modeler and monolingual translation baselines.

IIITT@LT-EDI-EACL2021-Hope Speech Detection: There is always hope in Transformers
Karthik Puranik | Adeep Hande | Ruba Priyadharshini | Sajeetha Thavareesan | Bharathi Raja Chakravarthi

In a world with serious challenges like climate change, religious and political conflicts, global pandemics, terrorism, and racial discrimination, an internet full of hate speech and abusive and offensive content is the last thing we desire. In this paper, we work to identify and promote positive and supportive content on these platforms. We work with several transformer-based models to classify social media comments as hope speech or not hope speech in English, Malayalam, and Tamil. This paper presents our work for the Shared Task on Hope Speech Detection for Equality, Diversity, and Inclusion at LT-EDI 2021, EACL 2021. The codes for our best submission can be viewed.

ZYJ@LT-EDI-EACL2021:XLM-RoBERTa-Based Model with Attention for Hope Speech Detection
Yingjia Zhao | Xin Tao

Due to the development of modern computer technology and the increase in the number of online media users, all kinds of posts and comments can be seen everywhere on the internet. Hope speech can not only inspire creators but also please other viewers. It is therefore necessary to detect hope speech effectively and automatically. This paper describes the approach of our team in the task of hope speech detection. We use the attention mechanism to adjust the weight of all the output layers of XLM-RoBERTa to make full use of the information extracted from each layer, and use the weighted sum of all the output layers to complete the classification task. We also use the Stratified-K-Fold method to enhance the training data set. We achieve weighted average F1-scores of 0.59, 0.84, and 0.92 for Tamil, Malayalam, and English, ranking 3rd, 2nd, and 2nd, respectively.
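
The layer-weighting idea described in the abstract above can be illustrated with a minimal sketch. It assumes one scalar score per layer, normalized with a softmax, and a single pooled vector per layer; the function names and the toy vectors are hypothetical and are not taken from the paper. In a real model the scores would be learned parameters and the vectors would come from the XLM-RoBERTa layer outputs.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pool_layers(layer_vectors, layer_scores):
    """Combine per-layer vectors into one vector with softmax weights.

    layer_vectors: equal-length vectors, one per encoder layer (assumed input)
    layer_scores:  one raw score per layer (learnable in a real model)
    """
    if len(layer_vectors) != len(layer_scores):
        raise ValueError("need exactly one score per layer")
    weights = softmax(layer_scores)
    dim = len(layer_vectors[0])
    # Weighted sum across layers, computed independently per dimension.
    return [
        sum(w * vec[i] for w, vec in zip(weights, layer_vectors))
        for i in range(dim)
    ]


# Toy stand-ins for three layer outputs of dimension 2.
layers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
uniform = pool_layers(layers, [0.0, 0.0, 0.0])    # equal scores: plain average
dominant = pool_layers(layers, [100.0, 0.0, 0.0])  # first layer dominates
```

With equal scores the result reduces to a plain average of the layers, and a very large score makes one layer dominate; the pooled vector would then be fed to a classification head.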

TeamUNCC@LT-EDI-EACL2021: Hope Speech Detection using Transfer Learning with Transformers
Khyati Mahajan | Erfan Al-Hossami | Samira Shaikh

In this paper, we describe our approach towards utilizing pre-trained models for the task of hope speech detection. We participated in Task 2: Hope Speech Detection for Equality, Diversity and Inclusion at LT-EDI-2021 @ EACL2021. The goal of this task is to predict the presence of hope speech, along with the presence of samples that do not belong to the same language in the dataset. We describe our approach to fine-tuning RoBERTa for Hope Speech detection in English and our approach to fine-tuning XLM-RoBERTa for Hope Speech detection in Tamil and Malayalam, two low-resource Indic languages. We demonstrate the performance of our approach on classifying text into hope-speech, non-hope and not-language. Our approach ranked 1st in English (F1 = 0.93), 1st in Tamil (F1 = 0.61) and 3rd in Malayalam (F1 = 0.83).

Autobots@LT-EDI-EACL2021: One World, One Family: Hope Speech Detection with BERT Transformer Model
Sunil Gundapu | Radhika Mamidi

The rapid rise of online social networks like YouTube, Facebook, and Twitter allows people to express their views more widely online. However, at the same time, it can lead to an increase in conflict and hatred among consumers in the name of freedom of speech. Therefore, it is essential to adopt a positive reinforcement approach and study encouraging, positive, helpful, and supportive social media content. In this paper, we describe a Transformer-based BERT model for Hope speech detection for equality, diversity, and inclusion, submitted for LT-EDI-2021 Task 2. Our model achieves a weighted average F1-score of 0.93 on the test set.

Hopeful NLP@LT-EDI-EACL2021: Finding Hope in YouTube Comment Section
Vasudev Awatramani

The proliferation of Hate Speech and misinformation in social media is fast becoming a menace to society. As a complement, the dissemination of hate-defusing, promising and anti-oppressive messages becomes a unique alternative. Unfortunately, due to its complex nature as well as its relatively limited manifestation in comparison to hostile and neutral content, the identification of Hope Speech becomes a challenge. This work revolves around the detection of Hope Speech in YouTube comments, for the Shared Task on Hope Speech Detection for Equality, Diversity, and Inclusion. We achieve an F-score of 0.93, ranking 1st on the leaderboard for English comments.

NLP-CUET@LT-EDI-EACL2021: Multilingual Code-Mixed Hope Speech Detection using Cross-lingual Representation Learner
Eftekhar Hossain | Omar Sharif | Mohammed Moshiul Hoque

In recent years, several systems have been developed to regulate the spread of negativity and eliminate aggressive, offensive or abusive content from online platforms. Nevertheless, only limited research has been carried out to identify positive, encouraging and supportive content. In this work, our goal is to identify whether a social media post or comment contains hope speech. To serve this purpose, we propose three distinct models to identify hope speech in English, Tamil and Malayalam. We employed various machine learning (SVM, LR, ensemble), deep learning (CNN+BiLSTM) and transformer (m-BERT, Indic-BERT, XLNet, XLM-R) based methods. Results indicate that XLM-R outperforms all other techniques, with weighted F1-scores of 0.93, 0.60 and 0.85 for English, Tamil and Malayalam, respectively. Our team achieved 1st, 2nd and 1st rank in these three tasks, respectively.

Spartans@LT-EDI-EACL2021: Inclusive Speech Detection using Pretrained Language Models
Megha Sharma | Gaurav Arora

We describe our system that ranked first in the Hope Speech Detection (HSD) shared task and fourth in the Offensive Language Identification (OLI) shared task, both in the Tamil language. The goal of HSD and OLI is to identify whether a code-mixed comment or post contains hope speech or offensive content, respectively. We pre-train a transformer-based model, RoBERTa, using synthetically generated code-mixed data and use it in an ensemble along with a pre-trained ULMFiT model available from iNLTK.