Lexical and Computational Semantics and Semantic Evaluation (formerly Workshop on Sense Evaluation) (2021)



pdf (full)
bib (full)
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

pdf bib
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)
Alexis Palmer | Nathan Schneider | Natalie Schluter | Guy Emerson | Aurelie Herbelot | Xiaodan Zhu

pdf bib
SemEval-2021 Task 1: Lexical Complexity Prediction
Matthew Shardlow | Richard Evans | Gustavo Henrique Paetzold | Marcos Zampieri

This paper presents the results and main findings of SemEval-2021 Task 1: Lexical Complexity Prediction. We provided participants with an augmented version of the CompLex corpus (Shardlow et al., 2020). CompLex is an English multi-domain corpus in which words and multi-word expressions (MWEs) were annotated with respect to their complexity using a five-point Likert scale. SemEval-2021 Task 1 featured two Sub-tasks: Sub-task 1 focused on single words and Sub-task 2 focused on MWEs. The competition attracted 198 teams in total, of which 54 teams submitted official runs on the test data to Sub-task 1 and 37 to Sub-task 2.

pdf bib
SemEval-2021 Task 6: Detection of Persuasion Techniques in Texts and Images
Dimitar Dimitrov | Bishr Bin Ali | Shaden Shaar | Firoj Alam | Fabrizio Silvestri | Hamed Firooz | Preslav Nakov | Giovanni Da San Martino

We describe SemEval-2021 Task 6 on Detection of Persuasion Techniques in Texts and Images: the data, the annotation guidelines, the evaluation setup, the results, and the participating systems. The task focused on memes and had three subtasks: (i) detecting the techniques in the text, (ii) detecting the text spans where the techniques are used, and (iii) detecting techniques in the entire meme, i.e., both in the text and in the image. It was a popular task, attracting 71 registrations and 22 teams that eventually made an official submission on the test set. The evaluation results for the third subtask confirmed the importance of both modalities, the text and the image. Moreover, some teams reported benefits when not just combining the two modalities, e.g., by using early or late fusion, but rather modeling the interaction between them in a joint model.

pdf bib
Alpha at SemEval-2021 Task 6: Transformer Based Propaganda Classification
Zhida Feng | Jiji Tang | Jiaxiang Liu | Weichong Yin | Shikun Feng | Yu Sun | Li Chen

This paper describes our system for Task 6 of SemEval-2021: the task focuses on multimodal propaganda technique classification and aims to classify a given image and text into 22 classes. We propose to use a transformer-based architecture to fuse the clues from both image and text. We explore two branches of techniques: fine-tuning a text-pretrained transformer with extended visual features, and fine-tuning multimodal pretrained transformers. For the visual features, we test both grid features based on ResNet and salient region features from a pretrained object detector. Among the pretrained multimodal transformers, we choose ERNIE-ViL, a two-stream cross-attended transformer pretrained on large-scale image-caption aligned data. Fine-tuning ERNIE-ViL for our task produces better performance due to the general joint multimodal representation of text and image learned by ERNIE-ViL. Besides, as the distribution of the classification labels is very unbalanced, we also experiment with the loss function, and the results show that focal loss performs better than cross-entropy loss. Finally, we won first place for subtask C in the final competition.
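
For readers unfamiliar with the loss-function comparison above, here is a minimal sketch (not the authors' code) of focal loss for an imbalanced multi-label setup; the gamma value and the 22-label batch shape are illustrative assumptions.

```python
# A minimal sketch of multi-label focal loss; gamma=2.0 is a common
# default, not necessarily the value the authors used.
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """Down-weights easy examples so the many negative labels
    do not dominate training."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)  # probability assigned to the true label
    return ((1.0 - p_t) ** gamma * bce).mean()

logits = torch.randn(4, 22)                       # batch of 4 memes, 22 technique labels
targets = torch.randint(0, 2, (4, 22)).float()
print(focal_loss(logits, targets))
```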

pdf bib
SemEval 2021 Task 7: HaHackathon, Detecting and Rating Humor and Offense
J. A. Meaney | Steven Wilson | Luis Chiruzzo | Adam Lopez | Walid Magdy

SemEval 2021 Task 7, HaHackathon, was the first shared task to combine the previously separate domains of humor detection and offense detection. We collected 10,000 texts from Twitter and the Kaggle Short Jokes dataset, and had each one annotated for humor and offense by 20 annotators aged 18-70. Our subtasks were binary humor detection, prediction of humor and offense ratings, and a novel controversy task: to predict whether the variance in the humor ratings was higher than a specific threshold. The subtasks attracted 36-58 submissions, with most of the participants choosing to use pre-trained language models. Many of the highest-performing teams also implemented additional optimization techniques, including task-adaptive training and adversarial training. The results suggest that the participating systems are well suited to humor detection, but that humor controversy is a more challenging task. We discuss which models excel in this task and which auxiliary techniques boost their performance, and analyze the errors that were not captured by the best systems.

pdf bib
Complex words identification using word-level features for SemEval-2020 Task 1
Jenny A. Ortiz-Zambrano | Arturo Montejo-Ráez

This article describes a system to predict the complexity of words for the Lexical Complexity Prediction (LCP) shared task hosted at SemEval 2021 (Task 1), which introduced a new annotated English dataset with a Likert scale. Located in the Lexical Semantics track, the task consisted of predicting the complexity value of words in context. We took a machine learning approach based on the frequency of the words and several additional word-level features. Over these features, a supervised random forest regression algorithm was trained. Several runs were performed with different values to observe the performance of the algorithm. In the evaluation, our best results reported an MAE of 0.07347, an MSE of 0.00938, and an RMSE of 0.096871. Our experiments showed that, with a greater number of features, the precision of the classification increases.
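
As a rough illustration of the pipeline described above, here is a hedged sketch of word-level random forest regression scored with MAE, MSE, and RMSE; the feature set and data are synthetic stand-ins, not the paper's features.

```python
# A sketch of feature-based complexity regression; features and data
# are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error

X = np.random.rand(200, 8)   # e.g. frequency, word length, syllable count, ...
y = np.random.rand(200)      # complexity score in [0, 1]

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
pred = model.predict(X)
mae = mean_absolute_error(y, pred)
mse = mean_squared_error(y, pred)
print(mae, mse, np.sqrt(mse))  # RMSE = sqrt(MSE)
```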

pdf bib
Uppsala NLP at SemEval-2021 Task 2: Multilingual Language Models for Fine-tuning and Feature Extraction in Word-in-Context Disambiguation
Huiling You | Xingran Zhu | Sara Stymne

We describe the Uppsala NLP submission to SemEval-2021 Task 2 on multilingual and cross-lingual word-in-context disambiguation. We explore the usefulness of three pre-trained multilingual language models, XLM-RoBERTa (XLMR), Multilingual BERT (mBERT) and multilingual distilled BERT (mDistilBERT). We compare these three models in two setups, fine-tuning and as feature extractors. In the second case we also experiment with using dependency-based information. We find that fine-tuning is better than feature extraction. XLMR performs better than mBERT in the cross-lingual setting both with fine-tuning and feature extraction, whereas these two models give a similar performance in the multilingual setting. mDistilBERT performs poorly with fine-tuning but gives similar results to the other models when used as a feature extractor. We submitted our two best systems, fine-tuned with XLMR and mBERT.

pdf bib
SkoltechNLP at SemEval-2021 Task 2: Generating Cross-Lingual Training Data for the Word-in-Context Task
Anton Razzhigaev | Nikolay Arefyev | Alexander Panchenko

In this paper, we present a system for the cross-lingual and multilingual word-in-context disambiguation task. The task organizers provided monolingual data in several languages, but no cross-lingual training data were available. To address this lack of officially provided cross-lingual training data, we generated such data ourselves. We describe a simple yet effective approach based on machine translation and back translation of the lexical units to the original language used in the context of this shared task. In our experiments, we used a neural system based on XLM-R, a pre-trained transformer-based masked language model, as a baseline. We show the effectiveness of the proposed approach, as it allows us to substantially improve the performance of this strong neural baseline model. In addition, we present multiple types of XLM-R-based classifiers, experimenting with various ways of mixing information from the first and second occurrences of the target word in the two samples.
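
The generation step described above could look roughly like the following sketch, which produces a translated counterpart of a sentence plus a back translation for consistency checking; the Helsinki-NLP Marian checkpoints are one possible choice, not necessarily the one the authors used.

```python
# A hedged sketch of generating a cross-lingual pair via machine
# translation and back translation; model choice is an assumption.
from transformers import MarianMTModel, MarianTokenizer

def translate(texts, model_name):
    tok = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    batch = tok(texts, return_tensors="pt", padding=True)
    out = model.generate(**batch)
    return [tok.decode(t, skip_special_tokens=True) for t in out]

en = ["The bank raised its interest rates."]
fr = translate(en, "Helsinki-NLP/opus-mt-en-fr")    # EN -> FR side of the pair
back = translate(fr, "Helsinki-NLP/opus-mt-fr-en")  # back translation for checking
print(fr, back)
```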

pdf bib
Zhestyatsky at SemEval-2021 Task 2: ReLU over Cosine Similarity for BERT Fine-tuning
Boris Zhestiankin | Maria Ponomareva

This paper presents our contribution to SemEval-2021 Task 2: Multilingual and Cross-lingual Word-in-Context Disambiguation (MCL-WiC). Our experiments cover the English (EN-EN) sub-track of the multilingual setting of the task. We experiment with several pre-trained language models and investigate the impact of different top layers on fine-tuning. We find that the combination of cosine similarity and ReLU activation leads to the most effective fine-tuning procedure. Our best model reaches an accuracy of 92.7%, the fourth-best score in the EN-EN sub-track.
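
A minimal sketch of the kind of top layer the abstract describes, assuming the head scores the two contextualized target-word vectors; the layer sizes and final classifier are illustrative assumptions.

```python
# A sketch of a ReLU-over-cosine-similarity head on BERT features;
# not the authors' exact architecture.
import torch
import torch.nn as nn

class CosineReLUHead(nn.Module):
    """Cosine similarity of the two target-word embeddings,
    passed through ReLU before a small classifier."""
    def __init__(self):
        super().__init__()
        self.cos = nn.CosineSimilarity(dim=-1)
        self.out = nn.Linear(1, 2)  # same meaning / different meaning

    def forward(self, target_vec_1, target_vec_2):
        sim = torch.relu(self.cos(target_vec_1, target_vec_2))
        return self.out(sim.unsqueeze(-1))

head = CosineReLUHead()
v1, v2 = torch.randn(8, 768), torch.randn(8, 768)  # e.g. BERT-base vectors
print(head(v1, v2).shape)                          # torch.Size([8, 2])
```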

pdf bib
SzegedAI at SemEval-2021 Task 2: Zero-shot Approach for Multilingual and Cross-lingual Word-in-Context Disambiguation
Gábor Berend

In this paper, we introduce the system we participated with in the multilingual and cross-lingual word-in-context disambiguation SemEval 2021 shared task. In our experiments, we investigated the possibility of using an all-words fine-grained word sense disambiguation system trained purely on sense-annotated English data, and drew predictions on the semantic equivalence of words in context based on the similarity of the ranked lists of (English) WordNet synsets returned for the target words. We handled the multi- and cross-lingual aspects of the shared task by applying a multilingual transformer for encoding the texts written in Arabic, English, French, Russian, or Chinese. While our results lag behind the top-scoring submissions, our approach has the benefit of not only providing a binary flag for whether two words in their contexts have the same meaning, but also a more tangible output in the form of a ranked list of (English) WordNet synsets, irrespective of the language of the input texts. As our framework is designed to be as generic as possible, it can be applied as a baseline for basically any language supported by the multilingual transformer architecture employed, even in the absence of any additional language-specific training data.

pdf bib
ECNU_ICA_1 SemEval-2021 Task 4: Leveraging Knowledge-enhanced Graph Attention Networks for Reading Comprehension of Abstract Meaning
Pingsheng Liu | Linlin Wang | Qian Zhao | Hao Chen | Yuxi Feng | Xin Lin | Liang He

This paper describes our system for SemEval-2021 Task 4: Reading Comprehension of Abstract Meaning. To accomplish this task, we utilize the Knowledge-Enhanced Graph Attention Network (KEGAT) architecture with a novel semantic space transformation strategy. It leverages heterogeneous knowledge to learn adequate evidence and seeks an effective semantic space for abstract concepts, improving the machine's ability to understand the abstract meaning of natural language. Experimental results show that our system achieves strong performance on this task in terms of both imperceptibility and nonspecificity.

pdf bib
LRG at SemEval-2021 Task 4: Improving Reading Comprehension with Abstract Words using Augmentation, Linguistic Features and Voting
Abheesht Sharma | Harshit Pandey | Gunjan Chhablani | Yash Bhartia | Tirtharaj Dash

We present our approaches and methods for SemEval-2021 Task 4: Reading Comprehension of Abstract Meaning. Given a question with a fill-in-the-blank and a corresponding context, the task is to predict the most suitable word from a list of 5 options. There are three subtasks: Imperceptibility, Non-Specificity, and Intersection. We use encoders of transformer-based models pretrained on the MLM task to build our fill-in-the-blank (FitB) models. Moreover, to model imperceptibility, we define certain linguistic features, and to model non-specificity, we leverage information from hypernyms and hyponyms provided by a lexical database. Specifically, for non-specificity, we try out augmentation techniques and other statistical techniques. We also propose variants, namely Chunk Voting and Max Context, to handle the input length restrictions of BERT-style models. Additionally, we perform a thorough ablation study and use Integrated Gradients to explain our predictions on a few samples. Our models achieve accuracies of 75.31% and 77.84% on the test sets for subtask-I and subtask-II, respectively. For subtask-III, we achieve accuracies of 65.64% and 64.27%.

pdf bib
NLP-IIS@UT at SemEval-2021 Task 4: Machine Reading Comprehension using the Long Document Transformer
Hossein Basafa | Sajad Movahedi | Ali Ebrahimi | Azadeh Shakery | Heshaam Faili

This paper presents a technical report of our submission to the 4th task of SemEval-2021, titled Reading Comprehension of Abstract Meaning. In this task, the goal is to predict the correct answer to a question given a context. The contexts are usually very lengthy and require a large receptive field from the model. Thus, common contextualized language models like BERT lose fine-grained representation, and performance suffers, due to the limited capacity of the input tokens. To tackle this problem, we used the Longformer model to better process the sequences. Furthermore, we utilized the method proposed in the Longformer benchmark on the WikiHop dataset, which improved the accuracy on our task data from the baselines' 23.01% and 22.95% for subtasks 1 and 2, respectively, to 70.30% and 64.38%.

pdf bib
HamiltonDinggg at SemEval-2021 Task 5: Investigating Toxic Span Detection using RoBERTa Pre-training
Huiyang Ding | David Jurgens

This paper presents our system submission to Task 5 of the SemEval-2021 competition: Toxic Spans Detection. The competition aims at detecting the spans that make a text toxic. In this paper, we demonstrate our system for detecting toxic spans, which includes expanding the toxic training set with Local Interpretable Model-Agnostic Explanations (LIME), fine-tuning a RoBERTa model for detection, and error analysis. We found that feeding the model an expanded training set, built from Reddit comments of polarized toxicity and labeled with LIME on top of logistic regression classification, could help RoBERTa learn to recognize toxic spans more accurately. We achieved a span-level F1 score of 0.6715 in the testing phase. Our quantitative and qualitative results show that the predictions from our system could be a good supplement to the gold training set's annotations.
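
As an illustration of the LIME-over-logistic-regression labeling idea mentioned above, the sketch below surfaces the words that push a toy classifier toward the toxic class; the data and pipeline are illustrative, not the authors' setup.

```python
# A hedged sketch of word-level toxicity attribution with LIME over a
# logistic-regression classifier; the tiny dataset is a stand-in.
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["you are a wonderful person", "you are a pathetic idiot"]
labels = [0, 1]  # 1 = toxic
clf = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(texts, labels)

explainer = LimeTextExplainer(class_names=["ok", "toxic"])
exp = explainer.explain_instance("what a pathetic comment",
                                 clf.predict_proba, num_features=3)
print(exp.as_list())  # words weighted toward the toxic class
```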

pdf bib
WVOQ at SemEval-2021 Task 6: BART for Span Detection and Classification
Cees Roele

Simultaneous span detection and classification is a task not currently addressed in standard NLP frameworks. The present paper describes why and how an encoder-decoder model was used to combine span detection and classification to address subtask 2 of SemEval-2021 Task 6.

pdf bib
HumorHunter at SemEval-2021 Task 7: Humor and Offense Recognition with Disentangled Attention
Yubo Xie | Junze Li | Pearl Pu

In this paper, we describe our system submitted to SemEval 2021 Task 7: HaHackathon: Detecting and Rating Humor and Offense. The task aims at predicting whether a given text is humorous, the average humor rating given by the annotators, and whether the humor rating is controversial. In addition, the task also involves predicting how offensive the text is. Our approach adopts the DeBERTa architecture with its disentangled attention mechanism, where the attention scores between words are calculated based on their content vectors and relative position vectors. We took advantage of pre-trained language models and fine-tuned the DeBERTa model on all four subtasks. We experimented with several BERT-like structures and found that the large DeBERTa model generally performs better. During the evaluation phase, our system achieved an F-score of 0.9480 on subtask 1a, an RMSE of 0.5510 on subtask 1b, an F-score of 0.4764 on subtask 1c, and an RMSE of 0.4230 on subtask 2a (rank 3 on the leaderboard).

pdf bib
Grenzlinie at SemEval-2021 Task 7: Detecting and Rating Humor and Offense
Renyuan Liu | Xiaobing Zhou

This paper presents the results of Team Grenzlinie's experiments in SemEval-2021 Task 7: HaHackathon: Detecting and Rating Humor and Offense. This task has two subtasks. Subtask 1 includes a humor detection task, a humor rating prediction task, and a humor controversy detection task. Subtask 2 is an offensiveness rating prediction task. The detection tasks are binary classification tasks, and the rating prediction tasks are regression tasks on a scale from 0 to 5, where 0 means the text is not humorous or not offensive and 5 means it is very humorous or very offensive. For all tasks, we choose RoBERTa as the pre-trained model. In the classification tasks, Bi-LSTM and adversarial training are adopted. In the regression tasks, Bi-LSTM is also adopted, and we propose a new approach named the compare method. Finally, our system achieves an F1-score of 95.05% in the humor detection task, an F1-score of 61.74% in the humor controversy detection task, an RMSE of 0.6143 in the humor rating task, and an RMSE of 0.4761 in the offensiveness rating task on the test datasets.

pdf bib
Humor@IITK at SemEval-2021 Task 7: Large Language Models for Quantifying Humor and Offensiveness
Aishwarya Gupta | Avik Pal | Bholeshwar Khurana | Lakshay Tyagi | Ashutosh Modi

Humor and offense are highly subjective due to multiple word senses, cultural knowledge, and pragmatic competence. Hence, accurately detecting humorous and offensive texts has several compelling use cases in recommendation systems and personalized content moderation. However, due to the lack of an extensive labeled dataset, most prior works in this domain haven't explored large neural models for subjective humor understanding. This paper explores whether large neural models and their ensembles can capture the intricacies associated with humor/offense detection and rating. Our experiments on SemEval-2021 Task 7: HaHackathon show that we can develop reasonable humor and offense detection systems with such models. Our models ranked 3rd in subtask 1b and consistently around the top 33% of the leaderboard for the remaining subtasks.

pdf bib
RoMa at SemEval-2021 Task 7: A Transformer-based Approach for Detecting and Rating Humor and Offense
Roberto Labadie | Mariano Jason Rodriguez | Reynier Ortega | Paolo Rosso

In this paper we describe the systems used by the RoMa team in the shared task on Detecting and Rating Humor and Offense (HaHackathon) at SemEval 2021. Our systems rely on data representations learned through fine-tuned neural language models. In particular, we explore two distinct architectures. The first is based on a Siamese Neural Network (SNN) combined with a graph-based clustering method. The SNN model is used for learning a latent space where instances of humor and non-humor can be distinguished. The clustering method is applied to build prototypes of both classes, which are used for training and for classifying new messages. The second combines neural language model representations with a linear regression model that produces the final ratings. Our systems achieved the best results for humor classification using the first model, whereas for offense and humor rating the second model obtained better performance. For controversial humor prediction, the most significant improvement came from fine-tuning the neural language model. In general, the results achieved are encouraging and give us a starting point for further improvements.

pdf bib
BLCUFIGHT at SemEval-2021 Task 10: Novel Unsupervised Frameworks For Source-Free Domain Adaptation
Weikang Wang | Yi Wu | Yixiang Liu | Pengyuan Liu

Domain adaptation assumes that samples from the source and target domains are freely accessible during the training phase. However, such an assumption is rarely plausible in the real world and may cause data-privacy issues, especially when the labels of the source domain can be a sensitive attribute serving as an identifier. SemEval-2021 Task 10 focuses on these issues. We participated in the task and propose novel frameworks based on self-training. In our systems, two different frameworks are designed to solve text classification and sequence labeling. These approaches prove effective, ranking third among all systems in subtask A and first among all systems in subtask B.

pdf bib
BOUN at SemEval-2021 Task 9: Text Augmentation Techniques for Fact Verification in Tabular Data
Abdullatif Köksal | Yusuf Yüksel | Bekir Yıldırım | Arzucan Özgür

In this paper, we present our text-augmentation-based approach for the Table Statement Support Subtask (Phase A) of SemEval-2021 Task 9. We experiment with different text augmentation techniques, such as back translation and synonym swapping using Word2Vec and WordNet. We show that text augmentation techniques lead to a 2.5% improvement in F1 on the test set. Further, we investigate the impact of domain adaptation and joint learning on fact verification in tabular data by utilizing the SemTabFacts and TabFact datasets. We observe that joint learning improves the F1 scores on the SemTabFacts and TabFact test sets by 3.31% and 0.77%, respectively.
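
Of the augmentation techniques listed above, WordNet synonym swapping is easy to sketch; the replacement probability and selection heuristic below are our assumptions, not the authors' settings.

```python
# A minimal sketch of WordNet-based synonym swapping for augmentation.
import random
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)

def synonym_swap(tokens, p=0.2):
    """Replace each token with a random WordNet synonym with probability p."""
    out = []
    for tok in tokens:
        lemmas = {l.name().replace("_", " ")
                  for s in wn.synsets(tok) for l in s.lemmas()} - {tok}
        if lemmas and random.random() < p:
            out.append(random.choice(sorted(lemmas)))
        else:
            out.append(tok)
    return out

print(synonym_swap("the table supports the stated claim".split()))
```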

pdf bib
IITK at SemEval-2021 Task 10: Source-Free Unsupervised Domain Adaptation using Class Prototypes
Harshit Kumar | Jinang Shah | Nidhi Hegde | Priyanshu Gupta | Vaibhav Jindal | Ashutosh Modi

Recent progress in deep learning has primarily been fueled by the availability of large amounts of annotated data obtained from highly expensive manual annotation processes. To tackle this issue, much research has been done on unsupervised domain adaptation, which tries to build systems for unlabelled target domain data given labeled source domain data. However, the availability of an annotated or labelled source domain dataset can't always be guaranteed because of data-privacy issues. This is especially the case with medical data, as it may contain sensitive information about patients. Source-free domain adaptation (SFDA) aims to resolve this issue by using models trained on the source data instead of the original annotated source data. In this work, we build SFDA systems for semantic processing, focusing specifically on the negation detection subtask of SemEval-2021 Task 10. We propose two approaches, ProtoAUG and Adapt-ProtoAUG, that use the idea of self-entropy to choose reliable, high-confidence samples, which are then used for data augmentation and subsequent training of the models. Our methods report an improvement of up to 7% in F1 score over the baseline for the negation detection subtask.

pdf bib
PTST-UoM at SemEval-2021 Task 10: Parsimonious Transfer for Sequence Tagging
Kemal Kurniawan | Lea Frermann | Philip Schulz | Trevor Cohn

This paper describes PTST, a source-free unsupervised domain adaptation technique for sequence tagging, and its application to the SemEval-2021 Task 10 on time expression recognition. PTST is an extension of the cross-lingual parsimonious parser transfer framework, which uses high-probability predictions of the source model as a supervision signal in self-training. We extend the framework to a sequence prediction setting and demonstrate its applicability to unsupervised domain adaptation. PTST achieves an F1 score of 79.6% on the official test set, with a precision of 90.1%, the highest of the 14 submissions.

pdf bib
Self-Adapter at SemEval-2021 Task 10: Entropy-based Pseudo-Labeler for Source-free Domain Adaptation
Sangwon Yoon | Yanghoon Kim | Kyomin Jung

Source-free domain adaptation is an emerging line of work in deep learning research, since it is closely related to real-world environments. We study domain adaptation in the sequence labeling setting, where a model trained on the source domain data is given. We propose two methods: Self-Adapter and Selective Classifier Training. Self-Adapter is a training method that uses sentence-level pseudo-labels, filtered by a self-entropy threshold, to provide supervision to the whole model. Selective Classifier Training uses token-level pseudo-labels and supervises only the classification layer of the model. The proposed methods are evaluated on data provided by SemEval-2021 Task 10, and Self-Adapter achieves 2nd-rank performance.
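
The self-entropy filtering idea behind Self-Adapter can be sketched as follows: keep a sentence's pseudo-labels only if its mean token-level prediction entropy falls below a threshold. The threshold value here is an illustrative assumption.

```python
# A sketch of entropy-based pseudo-label filtering for sequence labeling.
import torch

def filter_by_entropy(token_probs, threshold=0.5):
    """token_probs: (num_tokens, num_labels) softmax outputs for one sentence.
    Returns (keep, pseudo_labels)."""
    entropy = -(token_probs * token_probs.clamp_min(1e-12).log()).sum(-1)
    keep = entropy.mean().item() < threshold
    return keep, token_probs.argmax(-1)

probs = torch.softmax(torch.randn(12, 5), dim=-1)  # 12 tokens, 5 tag labels
print(filter_by_entropy(probs))
```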

pdf bib
The University of Arizona at SemEval-2021 Task 10: Applying Self-training, Active Learning and Data Augmentation to Source-free Domain Adaptation
Xin Su | Yiyun Zhao | Steven Bethard

This paper describes our systems for negation detection and time expression recognition in SemEval 2021 Task 10, Source-Free Domain Adaptation for Semantic Processing. We show that self-training, active learning and data augmentation techniques can improve the generalization ability of the model on the unlabeled target domain data without accessing source domain data. We also perform detailed ablation studies and error analyses for our time expression recognition systems to identify the source of the performance improvement and give constructive feedback on the temporal normalization annotation guidelines.

pdf bib
YNU-HPCC at SemEval-2021 Task 11: Using a BERT Model to Extract Contributions from NLP Scholarly Articles
Xinge Ma | Jin Wang | Xuejie Zhang

This paper describes the system we built as the YNU-HPCC team for SemEval-2021 Task 11: NLPContributionGraph. This task involves first identifying sentences in the given natural language processing (NLP) scholarly articles that reflect research contributions through binary classification; then identifying the core scientific terms and their relation phrases from these contribution sentences by sequence labeling; and finally categorizing, identifying, and organizing these scientific terms and relation phrases into subject-predicate-object triples to form a knowledge graph, with the help of multiclass and multi-label classification. We developed a system for this task using the pre-trained language representation model BERT (Bidirectional Encoder Representations from Transformers) and achieved good results. Our average F1-score for Evaluation Phase 2, Part 1 was 0.4562, ranking 7th, and our average F1-score for Evaluation Phase 2, Part 2 was 0.6541, also ranking 7th.

pdf bib
ITNLP at SemEval-2021 Task 11: Boosting BERT with Sampling and Adversarial Training for Knowledge Extraction
Genyu Zhang | Yu Su | Changhong He | Lei Lin | Chengjie Sun | Lili Shan

This paper describes the winning system in the End-to-end Pipeline phase of the NLPContributionGraph task. The system is composed of three BERT-based models used to extract sentences, entities, and triples, respectively. Experiments show that sampling and adversarial training can greatly boost the system. In the End-to-end Pipeline phase, our system obtained an average F1 of 0.4703, significantly higher than the second-placed system, which obtained an average F1 of 0.3828.

pdf bib
Duluth at SemEval-2021 Task 11: Applying DeBERTa to Contributing Sentence Selection and Dependency Parsing for Entity Extraction
Anna Martin | Ted Pedersen

This paper describes the Duluth system that participated in SemEval-2021 Task 11, NLPContributionGraph. It details the extraction of contribution sentences, scientific entities, and their relations from scholarly articles in the domain of natural language processing. Our solution uses DeBERTa for multi-class sentence classification to extract the contributing sentences and their types, and dependency parsing to outline each sentence and extract subject-predicate-object triples. Our system ranked fifth of seven for Phase 1: End-to-end Pipeline, sixth of eight for Phase 2 Part 1: Phrases and Triples, and fifth of eight for Phase 2 Part 2: Triples Extraction.

pdf bib
INNOVATORS at SemEval-2021 Task-11: A Dependency Parsing and BERT-based model for Extracting Contribution Knowledge from Scientific Papers
Hardik Arora | Tirthankar Ghosal | Sandeep Kumar | Suraj Patwal | Phil Gooch

In this work, we describe our system submission to the SemEval 2021 Task 11: NLPContributionGraph Challenge. We attempted all three sub-tasks in the challenge and report our results. Subtask 1 aims to identify the contributing sentences in a given publication. Subtask 2 follows from Subtask 1 to extract the scientific terms and predicate phrases from the identified contributing sentences. The final Subtask 3 entails extracting triples (subject, predicate, object) from the phrases and categorizing them under one or more defined information units. With the NLPContributionGraph Shared Task, the organizers formalized the building of a scholarly contributions-focused graph over NLP scholarly articles as an automated task. Our approaches include a BERT-based classification model for identifying the contributing sentences in a research publication, rule-based dependency parsing for phrase extraction, followed by a CNN-based model for information unit classification, and a set of rules for triple extraction. The quantitative results show that we obtained the 5th, 5th, and 7th ranks, respectively, in the three evaluation phases. We make our code available at https://github.com/HardikArora17/SemEval-2021-INNOVATORS.

pdf bib
HITSZ-HLT at SemEval-2021 Task 5: Ensemble Sequence Labeling and Span Boundary Detection for Toxic Span Detection
Qinglin Zhu | Zijie Lin | Yice Zhang | Jingyi Sun | Xiang Li | Qihui Lin | Yixue Dang | Ruifeng Xu

This paper presents the winning system for SemEval-2021 Task 5: Toxic Spans Detection. This task aims to locate, within a text, the spans that contribute to the text's toxicity, which is crucial for semi-automated moderation in online discussions. We formalize this task as a Sequence Labeling (SL) problem and a Span Boundary Detection (SBD) problem separately, and employ three state-of-the-art models. Next, we integrate the predictions of these models to produce a more credible and complementary result. Our system achieves a char-level score of 70.83%, ranking 1st of 91. In addition, we also explore a lexicon-based method, which is strongly interpretable and flexible in practice.

pdf bib
UPB at SemEval-2021 Task 8: Extracting Semantic Information on Measurements as Multi-Turn Question Answering
Andrei-Marius Avram | George-Eduard Zaharia | Dumitru-Clementin Cercel | Mihai Dascalu

Extracting semantic information on measurements and counts is an important topic for analyzing scientific discourse. The 8th task of SemEval-2021, Counts and Measurements (MeasEval), aimed to boost research in this direction by providing a new dataset on which participants train their models to extract meaningful information on measurements from scientific texts. The competition is composed of five subtasks that build on top of each other: (1) quantity span identification, (2) unit extraction from the identified quantities and their value modifier classification, (3) span identification for measured entities and measured properties, (4) qualifier span identification, and (5) relation extraction between the identified quantities, measured entities, measured properties, and qualifiers. We approached these challenges by first identifying the quantities, extracting their units of measurement, and classifying them with corresponding modifiers, and afterwards using them to jointly solve the last three subtasks in a multi-turn question answering manner. Our best-performing model obtained an overlapping F1-score of 36.91% on the test set.

pdf bib
IITK@LCP at SemEval-2021 Task 1: Classification for Lexical Complexity Regression Task
Neil Shirude | Sagnik Mukherjee | Tushar Shandhilya | Ananta Mukherjee | Ashutosh Modi

This paper describes our contribution to SemEval 2021 Task 1 (Shardlow et al., 2021): Lexical Complexity Prediction. In our approach, we leverage the ELECTRA model and attempt to mirror the data annotation scheme. Although the task is a regression task, we show that we can treat it as an aggregation of several classification and regression models. This somewhat counter-intuitive approach achieved an MAE score of 0.0654 on Sub-Task 1 and an MAE of 0.0811 on Sub-Task 2. Additionally, we used the concept of weak supervision signals from GlossBERT in our work, and it significantly improved the MAE score on Sub-Task 1.

pdf bib
LCP-RIT at SemEval-2021 Task 1: Exploring Linguistic Features for Lexical Complexity Prediction
Abhinandan Tejalkumar Desai | Kai North | Marcos Zampieri | Christopher Homan

This paper describes team LCP-RIT's submission to SemEval-2021 Task 1: Lexical Complexity Prediction (LCP). The task organizers provided participants with an augmented version of CompLex (Shardlow et al., 2020), an English multi-domain dataset in which words in context were annotated with respect to their complexity using a five-point Likert scale. Our system uses logistic regression and a wide range of linguistic features (e.g., psycholinguistic features, n-grams, word frequency, POS tags) to predict the complexity of single words in this dataset. We analyze the impact of different linguistic features on the classification performance and evaluate the results in terms of mean absolute error, mean squared error, Pearson correlation, and Spearman correlation.

pdf bib
CompNA at SemEval-2021 Task 1: Prediction of lexical complexity analyzing heterogeneous features
Giuseppe Vettigli | Antonio Sorgente

This paper describes the CompNA model that was submitted to the Lexical Complexity Prediction (LCP) shared task hosted at SemEval 2021 (Task 1). The solution is based on combining features of different natures through an ensembling method based on Decision Trees and trained using Gradient Boosting. We discuss the results of the model and highlight the features with the most predictive capability.

pdf bib
CS-UM6P at SemEval-2021 Task 1: A Deep Learning Model-based Pre-trained Transformer Encoder for Lexical Complexity
Nabil El Mamoun | Abdelkader El Mahdaouy | Abdellah El Mekki | Kabil Essefar | Ismail Berrada

Lexical Complexity Prediction (LCP) involves assigning a difficulty score to a particular word or expression in a text intended for a target audience. In this paper, we introduce a new deep learning-based system for this challenging task. The proposed system consists of a deep learning model, based on a pre-trained transformer encoder, for word and Multi-Word Expression (MWE) complexity prediction. First, on top of the encoder's contextualized word embeddings, our model employs an attention layer over the input context and the complex word or MWE. Then, the attention output is concatenated with the pooled output of the encoder and passed to a regression module. We investigate both single-task and joint training on the data of both Sub-tasks using multiple pre-trained transformer-based encoders. The obtained results are very promising and show the effectiveness of fine-tuning pre-trained transformers for the LCP task.

pdf bib
RG PA at SemEval-2021 Task 1: A Contextual Attention-based Model with RoBERTa for Lexical Complexity Prediction
Gang Rao | Maochang Li | Xiaolong Hou | Lianxin Jiang | Yang Mo | Jianping Shen

In this paper we propose a contextual attention-based model with two-stage fine-tuning using RoBERTa. First, we perform first-stage fine-tuning of RoBERTa on the corpus so that the model can learn some prior domain knowledge. Then we obtain the contextual embeddings of the context words based on the token-level embeddings from the fine-tuned model. We use K-fold cross-validation to obtain K models and ensemble them to get the final result. Our submission attained 2nd place in the final evaluation phase of sub-task 2, with a Pearson correlation of 0.8575.
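
The K-fold ensembling step described above can be sketched as follows, with a simple stand-in regressor in place of the fine-tuned RoBERTa model; the fold count matches common practice but is our assumption.

```python
# A sketch of K-fold ensembling: train one model per fold and average
# their test predictions. The Ridge regressor is a stand-in.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import Ridge

X, y = np.random.rand(100, 16), np.random.rand(100)
X_test = np.random.rand(10, 16)

preds = []
for train_idx, _ in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = Ridge().fit(X[train_idx], y[train_idx])
    preds.append(model.predict(X_test))
print(np.mean(preds, axis=0))  # ensembled prediction
```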

pdf bib
CLULEX at SemEval-2021 Task 1: A Simple System Goes a Long Way
Greta Smolenska | Peter Kolb | Sinan Tang | Mironas Bitinis | Héctor Hernández | Elin Asklöv

This paper presents the system we submitted to the first Lexical Complexity Prediction (LCP) Shared Task 2021. The Shared Task provides participants with a new English dataset that includes context of the target word. We participate in the single-word complexity prediction sub-task and focus on feature engineering. Our best system is trained on linguistic features and word embeddings (Pearson’s score of 0.7942). We demonstrate, however, that a simpler feature set achieves comparable results and submit a model trained on 36 linguistic features (Pearson’s score of 0.7925).

pdf bib
UNBNLP at SemEval-2021 Task 1: Predicting lexical complexity with masked language models and character-level encoders
Milton King | Ali Hakimi Parizi | Samin Fakharian | Paul Cook

In this paper, we present three supervised systems for English lexical complexity prediction of single and multiword expressions for SemEval-2021 Task 1. We explore the use of statistical baseline features, masked language models, and character-level encoders to predict the complexity of a target token in context. Our best system combines information from these three sources. The results indicate that information from masked language models and character-level encoders can be combined to improve lexical complexity prediction.

pdf bib
GX at SemEval-2021 Task 2: BERT with Lemma Information for MCL-WiC Task
Wanying Xie

This paper presents the GX system for the Multilingual and Cross-lingual Word-in-Context Disambiguation (MCL-WiC) task. The purpose of the MCL-WiC task is to tackle the challenge of capturing the polysemous nature of words without relying on a fixed sense inventory in a multilingual and cross-lingual setting. To solve this problem, we use context-specific word embeddings from BERT to eliminate the ambiguity between words in different contexts. For languages without an available training corpus, such as Chinese, we use a neural machine translation model to translate the English data released by the organizers and obtain usable pseudo-data. In this paper, we apply our system to the English and Chinese multilingual settings, and the experimental results show that our method has certain advantages.

pdf bib
PALI at SemEval-2021 Task 2: Fine-Tune XLM-RoBERTa for Word in Context Disambiguation
Shuyi Xie | Jian Ma | Haiqin Yang | Lianxin Jiang | Yang Mo | Jianping Shen

This paper presents the PALI team's winning system for SemEval-2021 Task 2: Multilingual and Cross-lingual Word-in-Context Disambiguation. We fine-tune the XLM-RoBERTa model to solve the task of word-in-context disambiguation, i.e., to determine whether the target word in two contexts carries the same meaning or not. In our implementation, we first design an input tag to emphasize the target word in the contexts. Second, we construct a new vector from the fine-tuned embeddings of XLM-RoBERTa and feed it to a fully-connected network that outputs the probability of whether the target word in the two contexts has the same meaning. The new vector is obtained by concatenating the embedding of the [CLS] token and the embeddings of the target word in the contexts. In training, we explore several tricks, such as the Ranger optimizer, data augmentation, and adversarial training, to improve the model prediction. Consequently, we attained first place in all four cross-lingual tasks.
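
A minimal sketch of the vector construction described above; the hidden size matches XLM-RoBERTa-large, while the pooling of the target-word embeddings and the classifier shape are illustrative assumptions.

```python
# A sketch of concatenating the [CLS] embedding with the target-word
# embeddings from both contexts, then classifying; not the authors' code.
import torch
import torch.nn as nn

hidden = 1024  # XLM-RoBERTa-large hidden size
classifier = nn.Sequential(
    nn.Linear(3 * hidden, 256), nn.ReLU(), nn.Linear(256, 1))

cls_vec = torch.randn(8, hidden)  # [CLS] token embedding
tgt_1 = torch.randn(8, hidden)    # pooled target-word embedding, context 1
tgt_2 = torch.randn(8, hidden)    # pooled target-word embedding, context 2
logit = classifier(torch.cat([cls_vec, tgt_1, tgt_2], dim=-1))
prob_same_meaning = torch.sigmoid(logit)
print(prob_same_meaning.shape)    # torch.Size([8, 1])
```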

pdf bib
Cambridge at SemEval-2021 Task 2: Neural WiC-Model with Data Augmentation and Exploration of Representation
Zheng Yuan | David Strohmaier

This paper describes the system of the Cambridge team submitted to the SemEval-2021 shared task on Multilingual and Cross-lingual Word-in-Context Disambiguation. Building on top of a pre-trained masked language model, our system is first pre-trained on out-of-domain data and then fine-tuned on in-domain data. We demonstrate the effectiveness of the proposed two-step training strategy and the benefits of data augmentation from both existing examples and new resources. We further investigate different representations and show that the addition of distance-based features is helpful in the word-in-context disambiguation task. Our system yields highly competitive results in the cross-lingual track without training on any cross-lingual data, and achieves state-of-the-art results in the multilingual track, ranking first in two languages (Arabic and Russian) and second in French out of 171 submitted systems.

pdf bib
LIORI at SemEval-2021 Task 2: Span Prediction and Binary Classification approaches to Word-in-Context Disambiguation
Adis Davletov | Nikolay Arefyev | Denis Gordeev | Alexey Rey

This paper presents our approaches to SemEval-2021 Task 2: Multilingual and Cross-lingual Word-in-Context Disambiguation. The first approach reformulates the task as a question answering problem, while the second frames it as a binary classification problem. Our best system, an ensemble of XLM-R-based binary classifiers trained with data augmentation, is among the 3 best-performing systems for Russian, French, and Arabic in the multilingual subtask. In the post-evaluation period, we experimented with batch normalization, subword pooling, and target word occurrence aggregation methods, resulting in further performance improvements.

pdf bib
FII_CROSS at SemEval-2021 Task 2: Multilingual and Cross-lingual Word-in-Context Disambiguation
Ciprian Bodnar | Andrada Tapuc | Cosmin Pintilie | Daniela Gifu | Diana Trandabat

This paper presents a word-in-context disambiguation system. The task focuses on capturing the polysemous nature of words in a multilingual and cross-lingual setting, without considering a strict inventory of word meanings. The system applies Natural Language Processing algorithms on datasets from SemEval 2021 Task 2, being able to identify the meaning of words for the languages Arabic, Chinese, English, French and Russian, without making use of any additional mono- or multilingual resources.

pdf bib
UoR at SemEval-2021 Task 4: Using Pre-trained BERT Token Embeddings for Question Answering of Abstract Meaning
Thanet Markchom | Huizhi Liang

Most question answering tasks focus on predicting concrete answers, e.g., named entities. These tasks can normally be solved by understanding the contexts, without additional information required. In the Reading Comprehension of Abstract Meaning (ReCAM) task, abstract answers are introduced, and understanding abstract meanings in context requires additional knowledge. In this paper, we propose an approach that leverages pre-trained BERT token embeddings as a prior knowledge resource. According to the results, our approach using the pre-trained BERT outperformed the baselines, which shows that pre-trained BERT token embeddings can serve as additional knowledge for understanding abstract meanings in question answering.

pdf bib
Noobs at Semeval-2021 Task 4: Masked Language Modeling for abstract answer prediction
Shikhar Shukla | Sarthak Sarthak | Karm Veer Arya

This paper presents the system developed by our team for Semeval 2021 Task 4: Reading Comprehension of Abstract Meaning. The aim of the task was to benchmark NLP techniques on understanding the abstract concepts present in a passage and then predicting a missing word in a human-written summary of the passage. We trained a RoBERTa-Large model with a masked language modeling objective. In cases where this model failed to predict one of the available options, another RoBERTa-Large model, trained as a binary classifier, was used to predict correct and incorrect options. We used the passage summary generated by the Pegasus model and the question as inputs. Our best solution was an ensemble of these two systems. We achieved an accuracy of 86.22% on subtask 1 and 87.10% on subtask 2.

pdf bib
PINGAN Omini-Sinitic at SemEval-2021 Task 4: Reading Comprehension of Abstract Meaning
Ye Wang | Yanmeng Wang | Haijun Zhu | Bo Zeng | Zhenghong Hao | Shaojun Wang | Jing Xiao

This paper describes the winning system for subtask 2 and the second-placed system for subtask 1 in SemEval 2021 Task 4: Reading Comprehension of Abstract Meaning. We propose to use a pre-trained ELECTRA discriminator to choose the best abstract word from five candidates. An upper attention and auto-denoising mechanism is introduced to process the long sequences. The experimental results demonstrate that this contribution greatly facilitates contextual language modeling in the reading comprehension task. An ablation study is also conducted to show the validity of our proposed methods.

pdf bib
GHOST at SemEval-2021 Task 5: Is explanation all you need?
Kamil Pluciński | Hanna Klimczak

This paper discusses different approaches to the Toxic Spans Detection task. The problem posed by the task was to determine which words contribute most to recognising a document as toxic. As opposed to binary classification of entire texts, word-level assessment could be of great use during comment moderation, while also allowing for a more in-depth comprehension of the model's predictions. As the main goal was to ensure transparency and understanding, this paper focuses on current state-of-the-art approaches based on explainable AI concepts and compares them to a supervised learning solution with word-level labels. The work covers two xAI approaches that automatically provide explanations for models trained for binary classification of toxic documents: an LSTM model with attention as a model-specific approach, and Shapley values for interpreting BERT predictions as a model-agnostic method. The competing approach treats the problem as supervised token classification, for which models like BERT and its modifications were tested. The paper aims to explore, compare, and assess the quality of predictions of the different methods on the task. The advantages of each approach and further research directions are also discussed.

pdf bib
GoldenWind at SemEval-2021 Task 5: Orthrus - An Ensemble Approach to Identify Toxicity
Marco Palomino | Dawid Grad | James Bedwell

Many new developments to detect and mitigate toxicity are currently being evaluated. We are particularly interested in the correlation between toxicity and the emotions expressed in online posts. While toxicity may be disguised by amending the wording of posts, emotions will not. Therefore, we describe here an ensemble method to identify toxicity and classify the emotions expressed in a corpus of annotated posts published by Task 5 of SemEval 2021; our analysis shows that the majority of such posts express anger, sadness, and fear. Our method to identify toxicity combines a lexicon-based approach, which on its own achieves an F1 score of 61.07%, with a supervised learning approach, which on its own achieves an F1 score of 60%. When both methods are combined, the ensemble achieves an F1 score of 66.37%.

pdf bib
NLP_UIOWA at Semeval-2021 Task 5: Transferring Toxic Sets to Tag Toxic Spans
Jonathan Rusert

We leverage a BLSTM with attention to identify toxic spans in texts, and explore different dimensions that affect the model's performance. The first dimension explored is the toxic set the model is trained on. Besides the provided dataset, we explore the transferability of five different toxicity-related sets, including offensive, toxic, abusive, and hate sets. We find that the offensive set alone shows the highest promise of transferability. The second dimension we explore is methodology, including leveraging attention, employing a greedy removal method, using a frequency ratio, and examining hybrid combinations of multiple methods. We conduct an error analysis to examine which types of toxic spans were missed and which were wrongly inferred as toxic, along with the main reasons why these errors occurred. Finally, we extend our method via ensembles, which achieves our highest F1 score of 55.1.

pdf bib
S-NLP at SemEval-2021 Task 5: An Analysis of Dual Networks for Sequence Tagging
Viet Anh Nguyen | Tam Minh Nguyen | Huy Quang Dao | Quang Huu Pham

SemEval-2021 Task 5: Toxic Spans Detection is a task of identifying toxic spans in text, which provides a valuable automatic tool for moderating online content. This paper presents the second-place method for the task, an ensemble of two approaches. One approach relies on combining different embedding methods to extract diverse semantic and syntactic representations of words in context; the other utilizes extra data with a slightly customized self-training procedure, a semi-supervised learning technique, for sequence tagging problems. Both of our architectures take advantage of a strong language model that was fine-tuned on a toxicity classification task. Although experimental evidence indicates higher effectiveness of the first approach than the second, combining them leads to our best result of a 70.77 F1-score on the test dataset.

pdf bib
MIPT-NSU-UTMN at SemEval-2021 Task 5: Ensembling Learning with Pre-trained Language Models for Toxic Spans Detection
Mikhail Kotyushev | Anna Glazkova | Dmitry Morozov

This paper describes our system for SemEval-2021 Task 5 on Toxic Spans Detection. We developed ensemble models using BERT-based neural architectures and post-processing to combine tokens into spans. We evaluated several pre-trained language models using various ensemble techniques for toxic span identification and achieved sizable improvements over our baseline fine-tuned BERT models. Finally, our system obtained an F1-score of 67.55% on the test data.

pdf bib
UIT-E10dot3 at SemEval-2021 Task 5: Toxic Spans Detection with Named Entity Recognition and Question-Answering Approaches
Phu Gia Hoang | Luan Thanh Nguyen | Kiet Nguyen

The increase of toxic comments in online spaces is causing tremendous harm to vulnerable users, and considerable effort is being made to address it; SemEval-2021 Task 5: Toxic Spans Detection is one such effort. This task asks competitors to extract the toxic spans from given texts, and we performed several analyses to understand its structure before running experiments. We tackle the task with two approaches: Named Entity Recognition with spaCy's library, and Question Answering with RoBERTa combined with ToxicBERT. The former attains the higher F1-score of 66.99%.

pdf bib
YoungSheldon at SemEval-2021 Task 5: Fine-tuning Pre-trained Language Models for Toxic Spans Detection using Token classification Objective
Mayukh Sharma | Ilanthenral Kandasamy | W.b. Vasantha

In this paper, we describe our system for SemEval 2021 Task 5: Toxic Spans Detection. Our proposed system approaches the problem as a token classification task. We trained our model to find toxic words and concatenate their spans to predict the toxic spans within a sentence. We fine-tuned Pre-trained Language Models (PLMs) to identify the toxic words. For fine-tuning, we stacked a classification layer on top of the PLM features of each word to classify whether it is toxic or not. PLMs are pre-trained using different objectives, and their performance may differ on downstream tasks. We therefore compare the performance of BERT, ELECTRA, RoBERTa, XLM-RoBERTa, T5, XLNet, and MPNet for identifying toxic spans within a sentence. Our best-performing system used RoBERTa; it performed well, achieving an F1 score of 0.6841 and securing rank 16 on the official leaderboard.
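
A hedged sketch of the token-classification framing: each subword is labeled toxic or not, and the character offsets of toxic subwords are merged into a span prediction. The checkpoint mirrors the paper's best backbone, but the untrained classification head and the decoding heuristic are our assumptions.

```python
# A sketch of toxic-span decoding from a token classifier; the head is
# freshly initialized here, so predictions are arbitrary until trained.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

tok = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForTokenClassification.from_pretrained("roberta-base",
                                                        num_labels=2)

text = "what a pathetic, idiotic comment"
enc = tok(text, return_tensors="pt", return_offsets_mapping=True)
offsets = enc.pop("offset_mapping")[0]
with torch.no_grad():
    pred = model(**enc).logits.argmax(-1)[0]

# collect character offsets of tokens predicted toxic (label 1)
toxic_chars = [c for (s, e), p in zip(offsets.tolist(), pred.tolist())
               if p == 1 for c in range(s, e)]
print(toxic_chars)
```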

pdf bib
SINAI at SemEval-2021 Task 5: Combining Embeddings in a BiLSTM-CRF model for Toxic Spans Detection
Flor Miriam Plaza-del-Arco | Pilar López-Úbeda | L. Alfonso Ureña-López | M. Teresa Martín-Valdivia

This paper describes the participation of the SINAI team in Task 5: Toxic Spans Detection, which consists of identifying the spans that make a text toxic. Although several resources and systems have been developed so far in the context of offensive language, both annotation and tasks have mainly focused on classifying whether a text is offensive or not. However, detecting toxic spans is crucial to identifying why a text is toxic and can assist human moderators in locating this type of content on social media. To accomplish the task, we follow a deep learning-based approach using a bidirectional variant of a Long Short-Term Memory network along with a stacked Conditional Random Field decoding layer (BiLSTM-CRF). Specifically, we test the performance of combinations of different pre-trained word embeddings for recognizing toxic entities in text. The results show that combining word embeddings helps in detecting offensive content. Our team ranks 29th out of 91 participants.
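
Combining pre-trained embeddings in front of a BiLSTM-CRF is directly supported by, for example, the Flair library. The sketch below shows that general setup under recent Flair versions; the corpus paths, tag names, and choice of embeddings are placeholders rather than the authors' configuration.

```python
from flair.datasets import ColumnCorpus
from flair.embeddings import WordEmbeddings, FlairEmbeddings, StackedEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Hypothetical BIO-tagged corpus in CoNLL column format.
corpus = ColumnCorpus("data/", {0: "text", 1: "tox"},
                      train_file="train.txt", dev_file="dev.txt")
tag_dict = corpus.make_label_dictionary(label_type="tox")

# Stack several pre-trained embeddings, as the paper describes.
embeddings = StackedEmbeddings([
    WordEmbeddings("glove"),
    FlairEmbeddings("news-forward"),
    FlairEmbeddings("news-backward"),
])

tagger = SequenceTagger(hidden_size=256, embeddings=embeddings,
                        tag_dictionary=tag_dict, tag_type="tox", use_crf=True)
ModelTrainer(tagger, corpus).train("models/toxic-bilstm-crf", max_epochs=10)
```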

pdf bib
CSECU-DSG at SemEval-2021 Task 5: Leveraging Ensemble of Sequence Tagging Models for Toxic Spans Detection
Tashin Hossain | Jannatun Naim | Fareen Tasneem | Radiathun Tasnia | Abu Nowshed Chy

The upsurge of prolific blogging and microblogging platforms has enabled abusers to spread negativity and threats more widely than ever. Detecting the toxic portions of a text substantially aids moderators in editing out or excluding the abusive parts to maintain sound online platforms. This paper describes our participation in the SemEval-2021 toxic spans detection task, which requires detecting the spans that convey toxic remarks in a given text. We explore an ensemble of sequence labeling models, including a BiLSTM-CRF, a spaCy NER model with custom toxic tags, and a fine-tuned BERT model, to identify the toxic spans. Finally, a majority-voting ensemble method is used to determine the unified toxic spans. Experimental results show the competitive performance of our model among the participants.
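
The majority-voting step over span predictions can be sketched generically: each model votes on character offsets, and an offset is kept if more than half of the models mark it toxic. This is an illustrative reading of the ensembling step, not the authors' code.

```python
from collections import Counter

def majority_vote_spans(predictions):
    """Keep character offsets predicted toxic by a strict majority of models.

    predictions: list of per-model outputs, each a set of character indices.
    """
    counts = Counter(i for pred in predictions for i in pred)
    return sorted(i for i, c in counts.items() if c > len(predictions) / 2)

# Three models: offset 4 gets 2/3 votes and is kept; 5 and 9 are dropped.
print(majority_vote_spans([{4, 5}, {4}, {9}]))  # [4]
```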

pdf bib
AIMH at SemEval-2021 Task 6: Multimodal Classification Using an Ensemble of Transformer Models
Nicola Messina | Fabrizio Falchi | Claudio Gennaro | Giuseppe Amato

This paper describes the system used by the AIMH Team to approach SemEval-2021 Task 6. We propose an architecture based on the transformer model to process multimodal content (text and images) in memes. Our architecture, called DVTT (Double Visual Textual Transformer), approaches Subtasks 1 and 3 of Task 6 as multi-label classification problems, where the text and/or images of the meme are processed and the probabilities of the presence of each possible persuasion technique are returned as a result. DVTT uses two complete transformer networks that work on text and images and are mutually conditioned: one of the two modalities acts as the main one, and the second intervenes to enrich the first, yielding two distinct modes of operation. The outputs of the two transformers are merged by averaging the inferred probabilities for each possible label, and the overall network is trained end-to-end with a binary cross-entropy loss.
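
The merging step, averaging the per-label probabilities of the two branches and training with binary cross-entropy, can be sketched in PyTorch as below. The two branches are reduced to plain linear layers here for brevity; feature dimensions and the 22-label space are assumptions for illustration.

```python
import torch
import torch.nn as nn

class LateAverageFusion(nn.Module):
    """Average per-label probabilities from two modality-specific branches."""
    def __init__(self, text_branch, image_branch):
        super().__init__()
        self.text_branch = text_branch
        self.image_branch = image_branch

    def forward(self, text_inputs, image_inputs):
        p_text = torch.sigmoid(self.text_branch(text_inputs))
        p_image = torch.sigmoid(self.image_branch(image_inputs))
        return (p_text + p_image) / 2  # averaged label probabilities

# Stand-in branches: 768-d text features, 2048-d image features, 22 labels.
fusion = LateAverageFusion(nn.Linear(768, 22), nn.Linear(2048, 22))
probs = fusion(torch.randn(4, 768), torch.randn(4, 2048))
loss = nn.BCELoss()(probs, torch.randint(0, 2, (4, 22)).float())
loss.backward()  # end-to-end training with binary cross-entropy
```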

pdf bib
1213Li at SemEval-2021 Task 6: Detection of Propaganda with Multi-modal Attention and Pre-trained Models
Peiguang Li | Xuan Li | Xian Sun

This paper presents the solution proposed by the 1213Li team for Subtask 3 of SemEval-2021 Task 6: identifying the multiple persuasion techniques used in the multi-modal content of a meme. We explored various approaches to feature extraction and the detection of persuasion labels. Our final model employs pre-trained models, RoBERTa and ResNet-50, as feature extractors for texts and images, respectively, and adopts a label embedding layer with a multi-modal attention mechanism to measure the similarity between the labels and the multi-modal information and to fuse features for label prediction. Our proposed method outperforms the provided baseline and ranks 3rd out of 16 participants with Micro/Macro F1 scores of 0.54860/0.22830.
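
A label embedding layer of the kind described above can be sketched as follows: fused text/image features are projected into the label space, and a dot product with each label embedding yields a per-label similarity score. All dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LabelEmbeddingScorer(nn.Module):
    """Score fused multimodal features against learned label embeddings."""
    def __init__(self, n_labels=22, feat_dim=768 + 2048, label_dim=256):
        super().__init__()
        self.label_emb = nn.Embedding(n_labels, label_dim)
        self.project = nn.Linear(feat_dim, label_dim)

    def forward(self, text_feat, image_feat):
        fused = self.project(torch.cat([text_feat, image_feat], dim=-1))
        # Similarity of every label embedding with the fused representation.
        return fused @ self.label_emb.weight.T  # (batch, n_labels)

scorer = LabelEmbeddingScorer()
logits = scorer(torch.randn(4, 768), torch.randn(4, 2048))
print(logits.shape)  # torch.Size([4, 22])
```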

pdf bib
NLyticsFKIE at SemEval-2021 Task 6: Detection of Persuasion Techniques In Texts And Images
Albert Pritzkau

This system description presents our approach to the detection of persuasion techniques in texts and images. The task is framed as a multi-label classification problem, with the different techniques serving as class labels: a list of target labels is associated with every input chunk, and a document can simultaneously and independently be assigned to multiple classes. To assign class labels to the given memes, we opted for RoBERTa (A Robustly Optimized BERT Pretraining Approach) as the neural network architecture for token and sequence classification. Starting from a pre-trained model for language representation, we fine-tuned it on the given classification task with the provided annotated data in supervised training steps. To incorporate image features in the multi-modal setting, we rely on the pre-trained VGG-16 model architecture.

pdf bib
YNU-HPCC at SemEval-2021 Task 6: Combining ALBERT and Text-CNN for Persuasion Detection in Texts and Images
Xingyu Zhu | Jin Wang | Xuejie Zhang

In recent years, memes combining image and text have been widely used on social media, and they are one of the most popular types of content used in online disinformation campaigns. In this paper, we summarize our study on the detection of persuasion techniques in texts and images for SemEval-2021 Task 6. For propaganda technique detection in text, we propose a model combining ALBERT and Text-CNN for text classification, as well as a BERT-based multi-task sequence labeling model for detecting the spans covered by propaganda techniques. For the meme classification task, which involves both text understanding and visual feature extraction, we designed a parallel-channel model with separate text and image channels. Our method achieved good performance on Subtasks 1 and 3. The micro F1-scores of 0.492, 0.091, and 0.446 achieved on the test sets of the three subtasks ranked 12th, 7th, and 11th, respectively, all higher than the baseline model.

pdf bib
NLPIITR at SemEval-2021 Task 6: RoBERTa Model with Data Augmentation for Persuasion Techniques Detection
Vansh Gupta | Raksha Sharma

This paper describes and examines different systems to address Subtask 1 of SemEval-2021 Task 6: Detection of Persuasion Techniques in Texts and Images. The task aims to build a model for identifying rhetorical and psychological techniques (such as causal oversimplification, name-calling, and smears) in the textual content of a meme, a medium often used in disinformation campaigns to influence users. The paper provides an extensive comparison among various machine learning systems as solutions to the task. We elaborate on the pre-processing of the text data in favor of the task and present ways to overcome the class imbalance. The results show that fine-tuning a RoBERTa model gave the best results, with an F1-Micro score of 0.51 on the development set.
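
One standard way to counter the class imbalance mentioned above is to re-weight the positive class per label in the loss. The sketch below uses PyTorch's `BCEWithLogitsLoss` with a negatives-to-positives `pos_weight` ratio; the counts are invented for illustration and are not the task's actual statistics.

```python
import torch
import torch.nn as nn

n_examples = 1000.0
label_counts = torch.tensor([900.0, 120.0, 45.0])   # invented per-label positives
pos_weight = (n_examples - label_counts) / label_counts

# Positive examples of rare labels are up-weighted in the loss.
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
logits = torch.randn(8, 3)
targets = torch.randint(0, 2, (8, 3)).float()
print(criterion(logits, targets))
```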

pdf bib
MinD at SemEval-2021 Task 6: Propaganda Detection using Transfer Learning and Multimodal Fusion
Junfeng Tian | Min Gui | Chenliang Li | Ming Yan | Wenming Xiao

We describe our systems for Subtasks 1 and 3 of SemEval-2021 Task 6 on Detection of Persuasion Techniques in Texts and Images. The purpose of Subtask 1 is to identify propaganda techniques given textual content, and the goal of Subtask 3 is to detect them given both textual and visual content. For Subtask 1, we investigate transfer learning based on pre-trained language models (PLMs) such as BERT and RoBERTa to alleviate data sparsity. For Subtask 3, we extract heterogeneous visual representations (i.e., face features, OCR features, and multimodal representations) and explore various multimodal fusion strategies to combine the textual and visual representations. The official evaluation shows that our ensemble model ranks 1st for Subtask 1 and 2nd for Subtask 3.

pdf bib
CSECU-DSG at SemEval-2021 Task 6: Orchestrating Multimodal Neural Architectures for Identifying Persuasion Techniques in Texts and Images
Tashin Hossain | Jannatun Naim | Fareen Tasneem | Radiathun Tasnia | Abu Nowshed Chy

Inscribing persuasion techniques in memes is one of the most impactful ways to influence people's mindsets. People are more drawn to memes because they are stimulating and convincing, and hence memes are often exploited by tactfully engraving propaganda into their content with the intent of attaining a specific agenda. This paper describes our participation in the three subtasks featured by SemEval-2021 Task 6 on the detection of persuasion techniques in texts and images. We utilize a fusion of logistic regression, decision tree, and fine-tuned DistilBERT for tackling Subtask 1. For Subtask 2, we propose a system that consolidates a span identification model and a multi-label classification model based on pre-trained BERT. We address the multi-modal multi-label classification of memes defined in Subtask 3 by utilizing a ResNet-50-based image model, a DistilBERT-based text model, and a multi-modal architecture based on a multi-kernel CNN+LSTM and an MLP model. The outcomes illustrate the competitive performance of our systems.

pdf bib
UMUTeam at SemEval-2021 Task 7: Detecting and Rating Humor and Offense with Linguistic Features and Word Embeddings
José Antonio García-Díaz | Rafael Valencia-García

In writing, humor is mainly based on figurative language, in which words and expressions change their conventional meaning to refer to something without saying it directly. This flip in the meaning of the words prevents Natural Language Processing systems from revealing the real intention of a communication and therefore reduces the effectiveness of tasks such as Sentiment Analysis or Emotion Detection. In this manuscript we describe the participation of the UMUTeam in HaHackathon 2021, whose objective is to detect and rate humorous and controversial content. Our proposal is based on the combination of linguistic features with contextual and non-contextual word embeddings. We participated in all the proposed subtasks, achieving our best result in the controversial-humor subtask.

pdf bib
ES-JUST at SemEval-2021 Task 7: Detecting and Rating Humor and Offensive Text Using Deep Learning
Emran Al Bashabsheh | Sanaa Abu Alasal

This research presents the work of team ES-JUST at SemEval-2021 Task 7 on detecting and rating humor and offensive text using deep learning. The team evaluated several approaches (i.e., BERT, RoBERTa, XLM-RoBERTa, and BERT embeddings + Bi-LSTM) employed across four sub-tasks. The first sub-task deals with whether the text is humorous or not; the second with the degree of humor in the text, if the first sub-task finds it humorous; the third with whether a humorous text is controversial or not; and the last with the degree of offensiveness in the text. The RoBERTa pre-trained model outperformed the other approaches and scored highest in all sub-tasks. At the evaluation phase, we ranked 14th, 15th, 20th, and 5th on the leaderboard with a 0.9564 F-score, 0.5709 RMSE, 0.4888 F-score, and 0.4467 RMSE, respectively, for the first, second, third, and fourth sub-tasks.

pdf bib
Tsia at SemEval-2021 Task 7: Detecting and Rating Humor and Offense
Zhengyi Guan | Xiaobing ZXB Zhou

This paper describes our contribution to SemEval-2021 Task 7: Detecting and Rating Humor and Offense. The task contains two sub-tasks, sub-task 1 and sub-task 2. Sub-task 1 in turn contains three sub-tasks: 1a, 1b, and 1c. Sub-task 1a is to predict whether the text would be considered humorous. Sub-task 1c is described as follows: if the text is classed as humorous, predict whether the humor rating would be considered controversial, i.e., whether the variance of the rating between annotators is higher than the median. We combined three pre-trained models with a CNN to complete these two classification sub-tasks. Sub-task 1b is to judge the degree of humor. Sub-task 2 aims to predict how offensive a text would be, with values between 0 and 5. We use regression to deal with these two sub-tasks. We analyze the performance of our method and demonstrate the contribution of each component of our architecture. We achieved good results with a combination of multiple pre-trained models and optimization methods.

pdf bib
DuluthNLP at SemEval-2021 Task 7: Fine-Tuning RoBERTa Model for Humor Detection and Offense Rating
Samuel Akrah

This paper presents the DuluthNLP submission to Task 7 of the SemEval-2021 competition on Detecting and Rating Humor and Offense. We explain the approach used to train the model, together with the fine-tuning process used to obtain the results. We focus on humor detection, humor rating, and offense rating, three of the four subtasks that were provided. We show that optimizing hyper-parameters for the learning rate, batch size, and number of epochs can increase the accuracy and F1 score for humor detection.

pdf bib
EndTimes at SemEval-2021 Task 7: Detecting and Rating Humor and Offense with BERT and Ensembles
Chandan Kumar Pandey | Chirag Singh | Karan Mangla

This paper describes Humor-BERT, a set of BERT-Large-based models that we used in SemEval-2021 Task 7: Detecting and Rating Humor and Offense. It presents pre- and post-processing techniques, variable-threshold learning, meta-learning, and an ensemble approach to solve the various sub-tasks that were part of the challenge. We also present a comparative analysis of the various models we tried. Our method ranked 4th in Humor Controversy Detection, 8th in Humor Detection, 19th in Average Offense Score prediction, and 40th in Average Humor Score prediction globally. The F1 score obtained for humor classification was 0.9655, and for controversy detection it was 0.6261. Our user name on the leaderboard is ThisIstheEnd and our team name is EndTimes.

pdf bib
IIITH at SemEval-2021 Task 7: Leveraging transformer-based humourous and offensive text detection architectures using lexical and hurtlex features and task adaptive pretraining
Tathagata Raha | Ishan Sanjeev Upadhyay | Radhika Mamidi | Vasudeva Varma

This paper describes our approach (IIITH) to SemEval-2021 Task 7: HaHackathon: Detecting and Rating Humor and Offense. Our results focus on two major objectives: (i) the effect of task-adaptive pretraining on the performance of transformer-based models, and (ii) how lexical and HurtLex features help in quantifying humour and offense. We provide a detailed description of our approach, along with the comparisons mentioned above.

pdf bib
Volta at SemEval-2021 Task 9: Statement Verification and Evidence Finding with Tables using TAPAS and Transfer Learning
Devansh Gautam | Kshitij Gupta | Manish Shrivastava

Tables are widely used in various kinds of documents to present information concisely. Understanding tables is a challenging problem that requires an understanding of language and table structure, along with numerical and logical reasoning. In this paper, we present our systems for Task 9 of SemEval-2021: Statement Verification and Evidence Finding with Tables (SEM-TAB-FACTS). The task consists of two subtasks: (A) given a table and a statement, predicting whether the table supports the statement, and (B) predicting which cells in the table provide evidence for or against the statement. We fine-tune TAPAS (a model which extends BERT's architecture to capture tabular structure) for both subtasks, as it has shown state-of-the-art performance on various table understanding tasks. In subtask A, we evaluate how transfer learning and standardizing tables to have a single header row improve TAPAS' performance. In subtask B, we evaluate how different fine-tuning strategies can improve TAPAS' performance. Our systems achieve an F1 score of 67.34 in subtask A three-way classification, 72.89 in subtask A two-way classification, and 62.95 in subtask B.
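
Fine-tuning TAPAS on (table, statement) pairs follows the sequence-classification interface in the transformers library. The sketch below shows one illustrative step; the checkpoint name, three-way label scheme, and toy table are assumptions, not the team's exact configuration.

```python
import pandas as pd
import torch
from transformers import TapasTokenizer, TapasForSequenceClassification

tokenizer = TapasTokenizer.from_pretrained("google/tapas-base")
model = TapasForSequenceClassification.from_pretrained(
    "google/tapas-base", num_labels=3)  # e.g. entailed / refuted / unknown

# TAPAS expects all table cells as strings.
table = pd.DataFrame({"Team": ["Volta", "Alpha"],
                      "Rank": ["1", "3"]}).astype(str)
inputs = tokenizer(table=table, queries=["Volta ranked first."],
                   return_tensors="pt")
outputs = model(**inputs, labels=torch.tensor([0]))
outputs.loss.backward()  # one illustrative fine-tuning step
```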

pdf bib
YNU-HPCC at SemEval-2021 Task 10: Using a Transformer-based Source-Free Domain Adaptation Model for Semantic Processing
Zhewen Yu | Jin Wang | Xuejie Zhang

Data sharing restrictions are common in NLP datasets. The purpose of this task is to develop a model trained on a source domain that can make predictions for a related target domain. To address the issue, the organizers provided participants with models that had been fine-tuned on a large amount of source-domain data on top of pre-trained models, together with development data, but the source-domain data itself was not distributed. This paper describes how we applied the provided model to the NER (Named Entity Recognition) task and the ways we developed the model further. Since little data was provided, pre-trained models are well suited to this cross-domain setting, and models fine-tuned on a large amount of data from another domain can be effective in the new domain because the task itself is unchanged.

pdf bib
UOR at SemEval-2021 Task 12: On Crowd Annotations; Learning with Disagreements to optimise crowd truth
Emmanuel Osei-Brefo | Thanet Markchom | Huizhi Liang

Crowdsourcing has been ubiquitously used for annotating enormous collections of data. However, the major obstacles to using crowd-sourced labels are the noise and errors introduced by non-expert annotations. In this work, two approaches for dealing with noise and errors in crowd-sourced labels are proposed. The first uses Sharpness-Aware Minimization (SAM), an optimization technique robust to noisy labels. The other leverages a neural network layer called softmax-Crowdlayer, specifically designed to learn from crowd-sourced annotations. According to the results, the proposed approaches can improve the performance of the Wide Residual Network model and the Multi-layer Perceptron model applied to crowd-sourced datasets in the image processing domain. They also achieve results comparable to the majority voting technique when applied to the sequential data domain, where Bidirectional Encoder Representations from Transformers (BERT) is used as the base model in both instances.
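
Sharpness-Aware Minimization performs two gradient evaluations per update: an ascent step to the worst-case nearby weights, then a descent step from the original weights using the gradient computed there. The following is a simplified, self-contained sketch of that procedure, not the softmax-Crowdlayer approach and not the authors' implementation; `rho` is an assumed neighborhood size.

```python
import torch

def sam_step(model, loss_fn, optimizer, rho=0.05):
    """One simplified Sharpness-Aware Minimization update."""
    optimizer.zero_grad()
    loss_fn().backward()
    grad_norm = torch.norm(torch.stack(
        [p.grad.norm() for p in model.parameters() if p.grad is not None]))
    eps = {}
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)          # ascend to the sharpest nearby point
            eps[p] = e
    optimizer.zero_grad()
    loss_fn().backward()       # gradient at the perturbed weights
    with torch.no_grad():
        for p, e in eps.items():
            p.sub_(e)          # restore the original weights
    optimizer.step()           # descend using the SAM gradient
    optimizer.zero_grad()

# Toy usage with a linear model and cross-entropy loss.
model = torch.nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(16, 10), torch.randint(0, 2, (16,))
sam_step(model, lambda: torch.nn.functional.cross_entropy(model(x), y), opt)
```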