Lexical and Computational Semantics and Semantic Evaluation (formerly Workshop on Sense Evaluation) (2020)


Proceedings of the Fourteenth Workshop on Semantic Evaluation
Aurelie Herbelot | Xiaodan Zhu | Alexis Palmer | Nathan Schneider | Jonathan May | Ekaterina Shutova

Discovery Team at SemEval-2020 Task 1: Context-sensitive Embeddings Not Always Better than Static for Semantic Change Detection
Matej Martinc | Syrielle Montariol | Elaine Zosa | Lidia Pivovarova

This paper describes the approaches used by the Discovery Team to solve SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection. The proposed method is based on clustering of BERT contextual embeddings, followed by a comparison of cluster distributions across time. The best results were obtained by an ensemble of this method and static Word2Vec embeddings. According to the official results, our approach proved the best for Latin in Subtask 2.

GM-CTSC at SemEval-2020 Task 1: Gaussian Mixtures Cross Temporal Similarity Clustering
Pierluigi Cassotti | Annalina Caputo | Marco Polignano | Pierpaolo Basile

This paper describes the system proposed by the Random team for SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection. We focus our approach on the detection problem. Given the semantics of words captured by temporal word embeddings in different time periods, we investigate the use of unsupervised methods to detect when the target word has gained or lost senses. To this end, we define a new algorithm based on Gaussian Mixture Models to cluster the target similarities computed over the two periods. We compare the proposed approach with a number of similarity-based thresholds. We found that, although the performance of the detection methods varies across the word embedding algorithms, the combination of Gaussian Mixture with Temporal Referencing resulted in our best system.

RIJP at SemEval-2020 Task 1: Gaussian-based Embeddings for Semantic Change Detection
Ran Iwamoto | Masahiro Yukawa

This paper describes the model proposed and submitted by our RIJP team to SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection. In the model, words are represented by Gaussian distributions. For Subtask 1, the model achieved average scores of 0.51 and 0.70 in the evaluation and post-evaluation processes, respectively. The higher post-evaluation score was achieved owing to appropriate parameter tuning. The results indicate that the proposed Gaussian-based embedding model is able to express semantic shifts while having a low computational

UiO-UvA at SemEval-2020 Task 1: Contextualised Embeddings for Lexical Semantic Change Detection
Andrey Kutuzov | Mario Giulianelli

We apply contextualised word embeddings to lexical semantic change detection in the SemEval-2020 Shared Task 1. This paper focuses on Subtask 2, ranking words by the degree of their semantic drift over time. We analyse the performance of two contextualising architectures (BERT and ELMo) and three change detection algorithms. We find that the most effective algorithms rely on the cosine similarity between averaged token embeddings and the pairwise distances between token embeddings. They outperform strong baselines by a large margin (in the post-evaluation phase, we have the best Subtask 2 submission for SemEval-2020 Task 1), but interestingly, the choice of a particular algorithm depends on the distribution of gold scores in the test set.
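
The two measures named in this abstract can be illustrated with a minimal sketch. It assumes cosine distance for the pairwise case, and the toy two-dimensional vectors and all function names are hypothetical stand-ins for real BERT or ELMo token embeddings, not the authors' implementation.

```python
import math
from itertools import product


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Element-wise average of a list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def prototype_distance(emb_t1, emb_t2):
    """1 - cosine similarity between the averaged token embeddings of two periods."""
    return 1.0 - cosine_similarity(mean_vector(emb_t1), mean_vector(emb_t2))


def average_pairwise_distance(emb_t1, emb_t2):
    """Mean cosine distance over all cross-period pairs of token embeddings."""
    pairs = list(product(emb_t1, emb_t2))
    return sum(1.0 - cosine_similarity(u, v) for u, v in pairs) / len(pairs)


# Hypothetical toy embeddings: one list of token vectors per time period.
usages_t1 = [[1.0, 0.0], [0.9, 0.1]]
stable_t2 = [[1.0, 0.1], [0.8, 0.0]]   # usages similar to period 1
shifted_t2 = [[0.0, 1.0], [0.1, 0.9]]  # usages pointing elsewhere

for name, usages_t2 in [("stable", stable_t2), ("shifted", shifted_t2)]:
    print(name,
          round(prototype_distance(usages_t1, usages_t2), 3),
          round(average_pairwise_distance(usages_t1, usages_t2), 3))
```

Ranking target words by either score yields a Subtask 2 style ordering; higher values suggest stronger change.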

BMEAUT at SemEval-2020 Task 2: Lexical Entailment with Semantic Graphs
Ádám Kovács | Kinga Gémes | Andras Kornai | Gábor Recski

In this paper we present a novel rule-based, language-independent method for determining lexical entailment relations using semantic representations built from Wiktionary definitions. Combined with a simple WordNet-based method, our system achieves top scores on the English and Italian datasets of the SemEval-2020 task Predicting Multilingual and Cross-lingual (graded) Lexical Entailment (Glavaš et al., 2020). A detailed error analysis of our output uncovers future directions for improving both the semantic parsing method and the inference process on semantic graphs.

BRUMS at SemEval-2020 Task 3: Contextualised Embeddings for Predicting the (Graded) Effect of Context in Word Similarity
Hansi Hettiarachchi | Tharindu Ranasinghe

This paper presents the team BRUMS submission to SemEval-2020 Task 3: Graded Word Similarity in Context. The system utilises state-of-the-art contextualised word embeddings, which have some task-specific adaptations, including stacked embeddings and average embeddings. Overall, the approach achieves good evaluation scores across all the languages, while maintaining simplicity. Following the final rankings, our approach is ranked within the top 5 solutions for each language while holding the 1st position in Finnish subtask 2.

UZH at SemEval-2020 Task 3: Combining BERT with WordNet Sense Embeddings to Predict Graded Word Similarity Changes
Li Tang

CoSimLex is a dataset that can be used to evaluate the ability of context-dependent word embeddings for modeling subtle, graded changes of meaning, as perceived by humans during reading. At SemEval-2020, task 3, subtask 1 is about predicting the (graded) effect of context in word similarity, using CoSimLex to quantify such a change of similarity for a pair of words, from one context to another. Here, a meaning shift is composed of two aspects, a) discrete changes observed between different word senses, and b) more subtle changes of meaning representation that are not captured in those discrete changes. Therefore, this SemEval task was designed to allow the evaluation of systems that can deal with a mix of both situations of semantic shift, as they occur in the human perception of meaning. The described system was developed to improve the BERT baseline provided with the task, by reducing distortions in the BERT semantic space, compared to the human semantic space. To this end, complementarity between 768- and 1024-dimensional BERT embeddings, and average word sense vectors were used. With this system, after some fine-tuning, the baseline performance of 0.705 (uncentered Pearson correlation with human semantic shift data from 27 annotators) was enhanced by more than 6%, to 0.7645. We hope that this work can make a contribution to further our understanding of the semantic vector space of human perception, as it can be modeled with context-dependent word embeddings in natural language processing systems.

DCC-Uchile at SemEval-2020 Task 1: Temporal Referencing Word Embeddings
Frank D. Zamora-Reina | Felipe Bravo-Marquez

We present a system for the task of unsupervised lexical change detection: given a target word and two corpora spanning different periods of time, automatically detect whether the word has lost or gained senses from one corpus to another. Our system employs the temporal referencing method to obtain compatible representations of target words in different periods of time. This is done by concatenating corpora of different periods and performing a temporal referencing of target words, i.e., treating occurrences of target words in different periods as two independent tokens. Afterwards, we train word embeddings on the joint corpus and compare the referenced vectors of each target word using cosine similarity. Our submission was ranked 7th among 34 teams for subtask 1, obtaining an average accuracy of 0.637, only 0.050 points behind the first ranked system.
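
The temporal referencing pipeline described in this abstract can be sketched as follows. This is an illustration under loud assumptions: sparse co-occurrence counts are swapped in for trained word embeddings so that the example stays self-contained, and the toy corpora, the `_old`/`_new` suffixes and the function names are hypothetical.

```python
import math
from collections import Counter


def add_cooccurrences(vectors, tokens, targets, period, window=2):
    """Accumulate context counts; target words are keyed as word_period."""
    for i, word in enumerate(tokens):
        # Temporal referencing: a target occurrence becomes a period-specific
        # token, while context words are left unreferenced.
        key = f"{word}_{period}" if word in targets else word
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        vectors.setdefault(key, Counter()).update(context)


def cosine(c1, c2):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(c1[k] * c2[k] for k in c1.keys() & c2.keys())
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


# Hypothetical toy corpora for two periods.
corpus_old = [["the", "mouse", "ate", "the", "cheese"],
              ["the", "cat", "chased", "the", "mouse"]]
corpus_new = [["click", "the", "mouse", "to", "open"],
              ["the", "cat", "chased", "a", "bird"]]
targets = {"mouse", "cat"}

# One shared vector space built over the concatenation of both corpora.
vectors = {}
for period, corpus in (("old", corpus_old), ("new", corpus_new)):
    for sentence in corpus:
        add_cooccurrences(vectors, sentence, targets, period)

sim_mouse = cosine(vectors["mouse_old"], vectors["mouse_new"])
sim_cat = cosine(vectors["cat_old"], vectors["cat_new"])
print(f"mouse: {sim_mouse:.3f}  cat: {sim_cat:.3f}")
```

A lower similarity between the two referenced vectors suggests a larger change; turning that score into a binary decision for subtask 1 would need a threshold, which this sketch does not choose.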

SST-BERT at SemEval-2020 Task 1: Semantic Shift Tracing by Clustering in BERT-based Embedding Spaces
Vani Kanjirangat | Sandra Mitrovic | Alessandro Antonucci | Fabio Rinaldi

Lexical semantic change detection (also known as semantic shift tracing) is the task of identifying words that have changed their meaning over time. Unsupervised semantic shift tracing, a focal point of SemEval-2020, is particularly challenging. Given the unsupervised setup, in this work, we propose to identify clusters among different occurrences of each target word, considering these as representatives of different word meanings. As such, disagreements in obtained clusters naturally allow us to quantify the level of semantic shift for each target word in four target languages. To leverage this idea, clustering is performed on contextualized (BERT-based) embeddings of word occurrences. The obtained results show that our approach performs well both measured separately (per language) and overall, where we surpass all provided SemEval baselines.

TemporalTeller at SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection with Temporal Referencing
Jinan Zhou | Jiaxin Li

This paper describes our TemporalTeller system for SemEval Task 1: Unsupervised Lexical Semantic Change Detection. We develop a unified framework for the common semantic change detection pipelines including preprocessing, learning word embeddings, calculating vector distances and determining thresholds. We also propose Gamma Quantile Threshold to distinguish between changed and stable words. Based on our system, we conduct a comprehensive comparison among BERT, Skip-gram, Temporal Referencing and alignment-based methods. Evaluation results show that Skip-gram with Temporal Referencing achieves the best performance of 66.5% classification accuracy and 51.8% Spearman’s Ranking Correlation.

Ferryman at SemEval-2020 Task 3: Bert with TFIDF-Weighting for Predicting the Effect of Context in Word Similarity
Weilong Chen | Xin Yuan | Sai Zhang | Jiehui Wu | Yanru Zhang | Yan Wang

Word similarity is widely used in machine learning applications such as search engines and recommendation. Measuring the changing meaning of the same word between two different sentences is not only a way to handle complex features in word usage (such as sentence syntax and semantics), but also an important method for modeling word polysemy. In this paper, we present the methodology proposed by team Ferryman. Our system is based on the Bidirectional Encoder Representations from Transformers (BERT) model combined with term frequency-inverse document frequency (TF-IDF), applied to the provided dataset, CoSimLex, which covers four languages: English, Croatian, Slovene, and Finnish. Our team Ferryman won first position for the English task and second position for Finnish in subtask 1.

JUSTMasters at SemEval-2020 Task 3: Multilingual Deep Learning Model to Predict the Effect of Context in Word Similarity
Nour Al-khdour | Mutaz Bni Younes | Malak Abdullah | Mohammad AL-Smadi

There is a growing research interest in studying word similarity. Without a doubt, two words that are similar in one context may be considered different in another context. Therefore, this paper investigates the effect of the context in word similarity. The SemEval-2020 workshop has provided a shared task (Task 3: Predicting the (Graded) Effect of Context in Word Similarity). In this task, the organizers provided unlabeled datasets for four languages: English, Croatian, Finnish and Slovenian. Our team, JUSTMasters, participated in this competition in the two subtasks, A and B. Our approach used a weighted average ensembling method over different pretrained embedding techniques for each of the four languages. Our proposed model outperformed the baseline models in both subtasks and achieved the best result for subtask 2 in English and Finnish, with scores of 0.725 and 0.68 respectively. We were ranked sixth for subtask 1, with scores for English, Croatian, Finnish, and Slovenian as follows: 0.738, 0.44, 0.546, 0.512.

Will_Go at SemEval-2020 Task 3: An Accurate Model for Predicting the (Graded) Effect of Context in Word Similarity Based on BERT
Wei Bao | Hongshu Che | Jiandong Zhang

Natural Language Processing (NLP) has been widely used in semantic analysis in recent years. Our paper mainly discusses a methodology to analyze the effect that context has on human perception of similar words, which is the third task of SemEval 2020. We apply several methods for calculating the distance between two embedding vectors generated by Bidirectional Encoder Representations from Transformers (BERT). Our team Will_Go won 1st place in the Finnish language track of subtask 1 and second place in the English track of subtask 1.

SemEval-2020 Task 6: Definition Extraction from Free Text with the DEFT Corpus
Sasha Spala | Nicholas Miller | Franck Dernoncourt | Carl Dockhorn

Research on definition extraction has been conducted for well over a decade, largely with significant constraints on the type of definitions considered. In this work, we present DeftEval, a SemEval shared task in which participants must extract definitions from free text using a term-definition pair corpus that reflects the complex reality of definitions in natural language. Definitions and glosses in free text often appear without explicit indicators, across sentence boundaries, or in an otherwise complex linguistic manner. DeftEval involved 3 distinct subtasks: 1) sentence classification, 2) sequence labeling, and 3) relation extraction.

IIE-NLP-NUT at SemEval-2020 Task 4: Guiding PLM with Prompt Template Reconstruction Strategy for ComVE
Luxi Xing | Yuqiang Xie | Yue Hu | Wei Peng

This paper introduces our systems for the first two subtasks of SemEval Task 4: Commonsense Validation and Explanation. To clarify the intention for judgment and inject contrastive information for selection, we propose the input reconstruction strategy with prompt templates. Specifically, we formalize the subtasks into the multiple-choice question answering format and construct the input with the prompt templates; the final prediction of question answering is then considered as the result of the subtasks. Experimental results show that our approaches achieve significant performance compared with the baseline systems. Our approaches secure the third rank on both official test sets of the first two subtasks with an accuracy of 96.4 and an accuracy of 94.3 respectively.

BUT-FIT at SemEval-2020 Task 4: Multilingual Commonsense
Josef Jon | Martin Fajcik | Martin Docekal | Pavel Smrz

We participated in all three subtasks. In subtasks A and B, our submissions are based on pretrained language representation models (namely ALBERT) and data augmentation. We experimented with solving the task for another language, Czech, by means of multilingual models and a machine-translated dataset, or translated model inputs. We show that with a strong machine translation system, our system can be used in another language with a small accuracy loss. In subtask C, our submission, which is based on a pretrained sequence-to-sequence model (BART), ranked 1st in the BLEU score ranking; however, we show that the correlation between BLEU and human evaluation, in which our submission ended up 4th, is low. We analyse the metrics used in the evaluation and we propose an additional score based on the model from subtask B, which correlates well with our manual ranking, as well as a reranking method based on the same principle. We performed an error and dataset analysis for all subtasks and we present our findings.

Masked Reasoner at SemEval-2020 Task 4: Fine-Tuning RoBERTa for Commonsense Reasoning
Daming Lu

This paper describes the masked reasoner system that participated in SemEval-2020 Task 4: Commonsense Validation and Explanation. The system participated in subtask B. We propose a novel method to fine-tune RoBERTa by masking the most important word in the statement. We believe that the confidence of the system in recovering that word is positively correlated to the score the masked language model gives to the current statement-explanation pair. We evaluate the importance of each word using InferSent and do the masked fine-tuning on RoBERTa. Then we use the fine-tuned model to predict the most plausible explanation. Our system is fast in training and achieved 73.5% accuracy.

UoR at SemEval-2020 Task 4: Pre-trained Sentence Transformer Models for Commonsense Validation and Explanation
Thanet Markchom | Bhuvana Dhruva | Chandresh Pravin | Huizhi Liang

The SemEval Task 4 Commonsense Validation and Explanation Challenge is to validate whether a system can differentiate natural language statements that make sense from those that do not make sense. This work focuses on two subtasks, A and B, i.e., detecting against-common-sense statements and selecting explanations of why they are false from the given options. Intuitively, commonsense validation requires additional knowledge beyond the given statements. Therefore, we propose a system utilising pre-trained sentence transformer models based on BERT, RoBERTa and DistilBERT architectures to embed the statements before classification. According to the results, these embeddings can improve the performance of the typical MLP and LSTM classifiers as downstream models of both subtasks compared to regular tokenised statements. These embedded statements are shown to comprise additional information from external resources which help validate common sense in natural language.

BUT-FIT at SemEval-2020 Task 5: Automatic Detection of Counterfactual Statements with Deep Pre-trained Language Representation Models
Martin Fajcik | Josef Jon | Martin Docekal | Pavel Smrz

This paper describes BUT-FIT’s submission at SemEval-2020 Task 5: Modelling Causal Reasoning in Language: Detecting Counterfactuals. The challenge focused on detecting whether a given statement contains a counterfactual (Subtask 1) and extracting both antecedent and consequent parts of the counterfactual from the text (Subtask 2). We experimented with various state-of-the-art language representation models (LRMs). We found RoBERTa LRM to perform the best in both subtasks. We achieved first place in both exact match and F1 for Subtask 2 and ranked second for Subtask 1.

ACNLP at SemEval-2020 Task 6: A Supervised Approach for Definition Extraction
Fabien Caspani | Pirashanth Ratnamogan | Mathis Linger | Mhamed Hajaiej

We describe our contribution to two of the subtasks of SemEval 2020 Task 6, DeftEval: Extracting term-definition pairs in free text. The system for Subtask 1: Sentence Classification is based on a transformer architecture where we use transfer learning to fine-tune a pretrained model on the downstream task, and the one for Subtask 3: Relation Classification uses a Random Forest classifier with handcrafted dedicated features. Our systems respectively achieve 0.830 and 0.994 F1-scores on the official test set, and we believe that the insights derived from our study are potentially relevant to help advance the research on definition extraction.

CN-HIT-IT.NLP at SemEval-2020 Task 4: Enhanced Language Representation with Multiple Knowledge Triples
Yice Zhang | Jiaxuan Lin | Yang Fan | Peng Jin | Yuanchao Liu | Bingquan Liu

This paper describes our system that participated in SemEval-2020 Task 4: Commonsense Validation and Explanation. For this task, it is obvious that external knowledge, such as a knowledge graph, can help the model understand commonsense in natural language statements. But how to select the right triples for statements remains unsolved, so how to reduce the interference of irrelevant triples on model performance is a research focus. This paper adopts a modified K-BERT as the language encoder, to enhance language representation through triples from knowledge graphs. Experiments show that our method is better than models without external knowledge, and is slightly better than the original K-BERT. We got an accuracy score of 0.97 in subtask A, ranking 1/45, and got an accuracy score of 0.948, ranking 2/35.

CS-NLP Team at SemEval-2020 Task 4: Evaluation of State-of-the-art NLP Deep Learning Architectures on Commonsense Reasoning Task
Sirwe Saeedi | Aliakbar Panahi | Seyran Saeedi | Alvis C Fong

In this paper, we investigate a commonsense inference task that unifies natural language understanding and commonsense reasoning. We describe our attempt at the SemEval-2020 Task 4 competition: the Commonsense Validation and Explanation (ComVE) challenge. We discuss several state-of-the-art deep learning architectures for this challenge. Our system uses prepared labeled textual datasets that were manually curated for three different natural language inference subtasks. The goal of the first subtask is to test whether a model can distinguish between natural language statements that make sense and those that do not make sense. We compare the performance of several language models and fine-tuned classifiers. Then, we propose a method inspired by question answering tasks that treats the classification problem as a multiple-choice question task to boost the performance of our experimental results (96.06%), which is significantly better than the baseline. For the second subtask, which is to select the reason why a statement does not make sense, we stand within the first six teams (93.7%) among 27 participants with very competitive results. Our result for the last subtask, generating a reason against the nonsense statement, shows much potential for future research, as we applied the most powerful generative language model (GPT-2) and obtained a 6.1732 BLEU score, placing among the first four teams.

JBNU at SemEval-2020 Task 4: BERT and UniLM for Commonsense Validation and Explanation
Seung-Hoon Na | Jong-Hyeon Lee

This paper presents our contributions to the SemEval-2020 Task 4 Commonsense Validation and Explanation (ComVE) and includes the experimental results of the two Subtasks B and C of the SemEval-2020 Task 4. Our systems rely on pre-trained language models, i.e., BERT (including its variants) and UniLM, and rank 10th and 7th among 27 and 17 systems on Subtasks B and C, respectively. We analyze the commonsense ability of the existing pretrained language models by testing them on the SemEval-2020 Task 4 ComVE dataset, specifically for Subtasks B and C, the explanation subtasks with multi-choice and sentence generation, respectively.

KaLM at SemEval-2020 Task 4: Knowledge-aware Language Models for Comprehension and Generation
Jiajing Wan | Xinting Huang

This paper presents our strategies in SemEval 2020 Task 4: Commonsense Validation and Explanation. We propose a novel way to search for evidence and choose different large-scale pre-trained models as the backbone for the three subtasks. The results show that our evidence-searching approach improves model performance on the commonsense explanation task. Our team ranks 2nd in subtask C according to human evaluation score.

LMVE at SemEval-2020 Task 4: Commonsense Validation and Explanation Using Pretraining Language Model
Shilei Liu | Yu Guo | BoChao Li | Feiliang Ren

This paper introduces our system for commonsense validation and explanation. For the Sen-Making task, we use a novel architecture based on a pretrained language model to pick out the one of the two given statements that is against common sense. For the Explanation task, we use a hint sentence mechanism to improve the performance greatly. In addition, we propose subtask-level transfer learning to share information between subtasks.

SSN-NLP at SemEval-2020 Task 4: Text Classification and Generation on Common Sense Context Using Neural Networks
Rishivardhan K. | Kayalvizhi S | Thenmozhi D. | Raghav R. | Kshitij Sharma

Common sense validation deals with testing whether a system can differentiate natural language statements that make sense from those that do not make sense. This paper describes our approach to solving this challenge. For common sense validation with multiple choice, we propose a stacking-based approach to classify sentences that are more favourable in terms of common sense to the particular statement. We have used a majority voting classifier methodology amongst three models: Bidirectional Encoder Representations from Transformers (BERT), Micro Text Classification (Micro TC) and XLNet. For sentence generation, we used a Neural Machine Translation (NMT) model to generate explanatory sentences.

UAICS at SemEval-2020 Task 4: Using a Bidirectional Transformer for Task a
Ciprian-Gabriel Cusmuliuc | Lucia-Georgiana Coca | Adrian Iftene

Commonsense Validation and Explanation has been a difficult task for machines since the dawn of computing. Although trivial for humans, it poses a high complexity for machines due to the necessity of inference over a pre-existing knowledge base. In order to try and solve this problem, the SemEval 2020 Task 4: Commonsense Validation and Explanation (ComVE) aims to evaluate systems capable of multiple stages of ComVE. The challenge includes 3 tasks (A, B and C), each with its own requirements. Our team participated only in task A, which required selecting the statement that made the least sense. We chose to use a bidirectional transformer to solve the challenge. This paper presents the details of our method, runs and results.

Warren at SemEval-2020 Task 4: ALBERT and Multi-Task Learning for Commonsense Validation
Yuhang Wu | Hao Wu

This paper describes our system in subtask A of SemEval 2020 Shared Task 4. We propose a reinforcement learning model based on MTL (Multi-Task Learning) to enhance the prediction ability of commonsense validation. The experimental results demonstrate that our system outperforms the single-task text classification model. We combine MTL and the ALBERT pretrained model to achieve an accuracy of 0.904, and our model is ranked 16th on the final leaderboard of the competition among the 45 teams.

ETHAN at SemEval-2020 Task 5: Modelling Causal Reasoning in Language Using Neuro-symbolic Cloud Computing
Len Yabloko

I present ETHAN: Experimental Testing of Hybrid AI Node, implemented entirely on free cloud computing infrastructure. The ultimate goal of this research is to create a modular, reusable hybrid neuro-symbolic architecture for Artificial Intelligence. As a test case I model natural language comprehension of causal relations from an open-domain text corpus, combining a semi-supervised language model (Huggingface Transformers) with constituency and dependency parsers (Allen Institute for Artificial Intelligence).

Ferryman as SemEval-2020 Task 5: Optimized BERT for Detecting Counterfactuals
Weilong Chen | Yan Zhuang | Peng Wang | Feng Hong | Yan Wang | Yanru Zhang

The main purpose of this article is to state the effect of using different methods and models for counterfactual determination and detection of causal knowledge. Nowadays, counterfactual reasoning has been widely used in various fields. In the realm of natural language processing (NLP), counterfactual reasoning has huge potential to improve the correctness of a sentence. In the shared Task 5 of SemEval 2020 on detecting counterfactuals, we pre-process the officially given dataset with case conversion, stem extraction and abbreviation replacement. We use last-5 bidirectional encoder representations from transformers (BERT) and a term frequency-inverse document frequency (TF-IDF) vectorizer for counterfactual detection. Meanwhile, multi-sample dropout and cross validation are used to improve versatility and prevent problems such as poor generalization caused by overfitting. Finally, our team Ferryman ranked 8th in sub-task 1 of this competition.

Lee at SemEval-2020 Task 5: ALBERT Model Based on the Maximum Ensemble Strategy and Different Data Sampling Methods for Detecting Counterfactual Statements
Junyi Li | Yuhang Wu | Bin Wang | Haiyan Ding

This article describes the system submitted to SemEval 2020 Task 5: Modelling Causal Reasoning in Language: Detecting Counterfactuals. In this task, we only participate in subtask A, which is detecting counterfactual statements. To solve this subtask, first, because of the problem of data imbalance, we use undersampling and oversampling methods to process the data set. Second, we use the ALBERT model and the maximum ensemble method based on the ALBERT model. Our methods achieved an F1 score of 0.85 in subtask A.
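
The abstract does not say which sampling variants were used. The sketch below assumes plain random oversampling (duplicating minority examples) and random undersampling (dropping majority examples); the toy sentences and function names are hypothetical.

```python
import random


def oversample(majority, minority, seed=0):
    """Duplicate random minority examples until both classes have equal size."""
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra


def undersample(majority, minority, seed=0):
    """Keep a random subset of the majority class the size of the minority class."""
    rng = random.Random(seed)
    return rng.sample(majority, len(minority)) + minority


# Hypothetical imbalanced data: (text, label) pairs, label 1 = counterfactual.
negatives = [(f"plain statement {i}", 0) for i in range(8)]
positives = [("if it had rained, we would have stayed home", 1),
             ("had she known, she would have called", 1)]

balanced_up = oversample(negatives, positives)
balanced_down = undersample(negatives, positives)
print(len(balanced_up), len(balanced_down))
```

Oversampling keeps every majority example at the cost of repeated minority examples, while undersampling discards data; which trade-off suits the task is an empirical question the abstract leaves open.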

NLU-Co at SemEval-2020 Task 5: NLU/SVM Based Model Apply Tocharacterise and Extract Counterfactual Items on Raw Data
Elvis Mboning Tchiaze | Damien Nouvel

In this article, we try to solve the problem of classifying counterfactual statements and extracting antecedents and consequences from raw data, by using on the one hand support vector machines (SVMs) and on the other hand Natural Language Understanding (NLU) infrastructures available on the market for conversational agents. Our experiments allowed us to test different pipelines of two known platforms (Snips NLU and Rasa NLU). The results obtained show that a Rasa NLU pipeline, built with a well-preprocessed dataset and tuned algorithms, can accurately model the structure of a counterfactual event, which facilitates the identification and extraction of its components.

pdf bib
YNU-oxz at SemEval-2020 Task 5: Detecting Counterfactuals Based on Ordered Neurons LSTM and Hierarchical Attention Network
Xiaozhi Ou | Shengyan Liu | Hongling Li

This paper describes the system and results of our team’s participation in SemEval-2020 Task 5: Modelling Causal Reasoning in Language: Detecting Counterfactuals, which aims to simulate counterfactual semantics and reasoning in natural language. This task contains two subtasks: Subtask 1, detecting counterfactual statements, and Subtask 2, detecting antecedent and consequence. We only participated in Subtask 1, aiming to determine whether a given sentence is counterfactual. To solve this task, we proposed a system based on Ordered Neurons LSTM (ON-LSTM) with a Hierarchical Attention Network (HAN) and used a pooling operation for dimensionality reduction. Finally, we used the K-fold approach as the ensemble method. Our model achieved an F1 score of 0.7040 in Subtask 1 (ranked 16/27).

pdf bib
BERTatDE at SemEval-2020 Task 6: Extracting Term-definition Pairs in Free Text Using Pre-trained Model
Huihui Zhang | Feiliang Ren

Definition extraction is an important task in Natural Language Processing, and it is used to identify terms and their definitions. The task contains a sentence classification task (i.e., classifying whether a sentence contains a definition) and a sequence labeling task (i.e., finding the boundaries of terms and definitions). This paper describes our system BERTatDE for the sentence classification task (subtask 1) and the sequence labeling task (subtask 2) in definition extraction (SemEval-2020 Task 6). We use BERT to solve the multi-domain problems, including the uncertainty of term boundaries; that is, different areas have different ways of defining domain-related terms. We use BERT, BiLSTM and attention in subtask 1, where our best result achieved 79.71% F1 and eighteenth place. For subtask 2, we use BERT, BiLSTM and CRF for sequence labeling, and achieve 40.73% macro-averaged F1.

pdf bib
Defx at SemEval-2020 Task 6: Joint Extraction of Concepts and Relations for Definition Extraction
Marc Hübner | Christoph Alt | Robert Schwarzenberg | Leonhard Hennig

Definition Extraction systems are a valuable knowledge source for both humans and algorithms. In this paper we describe our submissions to the DeftEval shared task (SemEval-2020 Task 6), which is evaluated on an English textbook corpus. We provide a detailed explanation of our system for the joint extraction of definition concepts and the relations among them. Furthermore we provide an ablation study of our model variations and describe the results of an error analysis.

pdf bib
UPB at SemEval-2020 Task 6: Pretrained Language Models for Definition Extraction
Andrei-Marius Avram | Dumitru-Clementin Cercel | Costin Chiru

This work presents our contribution in the context of the 6th task of SemEval-2020: Extracting Definitions from Free Text in Textbooks (DeftEval). This competition consists of three subtasks with different levels of granularity: (1) classification of sentences as definitional or non-definitional, (2) labeling of definitional sentences, and (3) relation classification. We use various pretrained language models (i.e., BERT, XLNet, RoBERTa, SciBERT, and ALBERT) to solve each of the three subtasks of the competition. Specifically, for each language model variant, we experiment by both freezing its weights and fine-tuning them. We also explore a multi-task architecture that was trained to jointly predict the outputs for the second and the third subtasks. Our best performing model evaluated on the DeftEval dataset obtains the 32nd place for the first subtask and the 37th place for the second subtask. The code is available for further research at: https://github.com/avramandrei/DeftEval

pdf bib
Buhscitu at SemEval-2020 Task 7: Assessing Humour in Edited News Headlines Using Hand-Crafted Features and Online Knowledge Bases
Kristian Nørgaard Jensen | Nicolaj Filrup Rasmussen | Thai Wang | Marco Placenti | Barbara Plank

This paper describes a system that aims at assessing humour intensity in edited news headlines as part of the 7th task of SemEval-2020 on Humor, Emphasis and Sentiment. Various factors need to be accounted for in order to assess the funniness of an edited headline. We propose an architecture that uses hand-crafted features, knowledge bases and a language model to understand humour, and combines them in a regression model. Our system outperforms two baselines. In general, automatic humour assessment remains a difficult task.

pdf bib
Hasyarasa at SemEval-2020 Task 7: Quantifying Humor as Departure from Expectedness
Ravi Theja Desetty | Ranit Chatterjee | Smita Ghaisas

This paper describes our system submission Hasyarasa for the SemEval-2020 Task 7: Assessing Humor in Edited News Headlines. This task has two subtasks. The goal of Subtask 1 is to predict the mean funniness of the edited headline given the original and the edited headline. In Subtask 2, given two edits on the original headline, the goal is to predict the funnier of the two. We observed that the departure from the expected states/actions of situations/individuals is the cause of humor in the edited headlines. We propose two novel features, Contextual Semantic Distance and Contextual Neighborhood Distance, to estimate this departure and thus capture the contextual absurdity and hence the humor in the edited headlines. We have used these features together with a Bi-LSTM Attention based model and have achieved 0.53310 RMSE for Subtask 1 and 60.19% accuracy for Subtask 2.

pdf bib
YNU-HPCC at SemEval-2020 Task 7: Using an Ensemble BiGRU Model to Evaluate the Humor of Edited News Titles
Joseph Tomasulo | Jin Wang | Xuejie Zhang

This paper describes an ensemble model designed for SemEval-2020 Task 7. The task is based on the Humicroedit dataset, which comprises news titles and one-word substitutions designed to make them humorous. We use BERT, FastText, ELMo, and Word2Vec to encode these titles, then pass them to a bidirectional gated recurrent unit (BiGRU) with attention. Finally, we used XGBoost on the concatenation of the results of the different models to make predictions.

pdf bib
NLP_UIOWA at SemEval-2020 Task 8: You’re Not the Only One Cursed with Knowledge - Multi Branch Model Memotion Analysis
Ingroj Shrestha | Jonathan Rusert

We propose hybrid models (HybridE and HybridW) for meme analysis (SemEval 2020 Task 8), which involves sentiment classification (Subtask A), humor classification (Subtask B), and scale of semantic classes (Subtask C). The hybrid model consists of BLSTM and CNN for text and image processing respectively. HybridE provides equal weight to BLSTM and CNN performance, while HybridW provides weightage based on the performance of BLSTM and CNN on a validation set. The performances (macro F1) of our hybrid model on Subtask A are 0.329 (HybridE), 0.328 (HybridW), on Subtask B are 0.507 (HybridE), 0.512 (HybridW), and on Subtask C are 0.309 (HybridE), 0.311 (HybridW).
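
The difference between equal weighting (HybridE) and performance-based weighting (HybridW) can be sketched as a weighted average of per-class probabilities. This is only an illustration: the abstract does not state how the weights are computed or applied, so the sketch assumes weights proportional to each branch's validation score, and the function names and numbers are hypothetical.

```python
def fuse(text_probs, image_probs, w_text=0.5, w_image=0.5):
    """Weighted average of two per-class probability lists (weights normalised)."""
    total = w_text + w_image
    return [(w_text * t + w_image * i) / total
            for t, i in zip(text_probs, image_probs)]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


# Toy class probabilities for one meme from the text and image branches.
text_probs = [0.2, 0.5, 0.3]
image_probs = [0.7, 0.2, 0.1]

# Equal weighting: both branches count the same.
equal = fuse(text_probs, image_probs)

# Validation-based weighting: hypothetical validation scores of 0.6 (text)
# and 0.3 (image) are used directly as weights.
weighted = fuse(text_probs, image_probs, w_text=0.6, w_image=0.3)

print(argmax(equal), argmax(weighted))
```

In this toy case the two schemes pick different classes, because the weighted variant trusts the text branch more than the image branch.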

pdf bib
CS-Embed at SemEval-2020 Task 9: The Effectiveness of Code-switched Word Embeddings for Sentiment Analysis
Frances Adriana Laureano De Leon | Florimond Guéniat | Harish Tayyar Madabushi

The growing popularity and applications of sentiment analysis of social media posts have naturally led to sentiment analysis of posts written in multiple languages, a practice known as code-switching. While recent research into code-switched posts has focused on the use of multilingual word embeddings, these embeddings were not trained on code-switched data. In this work, we present word embeddings trained on code-switched tweets, specifically those that make use of Spanish and English, known as Spanglish. We explore the embedding space to discover how they capture the meanings of words in both languages. We test the effectiveness of these embeddings by participating in SemEval 2020 Task 9: Sentiment Analysis on Code-Mixed Social Media Text. We utilised them to train a sentiment classifier that achieves an F-1 score of 0.722. This is higher than the baseline for the competition of 0.656, with our team (CodaLab username francesita) ranking 14 out of 29 participating teams, beating the baseline.

pdf bib
FII-UAIC at SemEval-2020 Task 9: Sentiment Analysis for Code-Mixed Social Media Text Using CNN
Lavinia Aparaschivei | Andrei Palihovici | Daniela Gîfu

The Sentiment Analysis for Code-Mixed Social Media Text task at the SemEval 2020 competition focuses on sentiment analysis in code-mixed social media text, specifically on the combination of English with Spanish (Spanglish) and Hindi (Hinglish). In this paper, we present a system able to classify Spanish-English tweets as positive, negative or neutral. First, we built a classifier able to provide the corresponding sentiment labels. Besides the sentiment labels, we provide language labels at the word level. Second, we generate a word-level representation using a Convolutional Neural Network (CNN) architecture. Our solution shows promising results for the Sentimix Spanglish-English task (0.744), where our team, Lavinia_Ap, took 9th place. However, for the Sentimix Hindi-English task (0.324), the results need to be improved.

pdf bib
NLP-CIC at SemEval-2020 Task 9: Analysing Sentiment in Code-switching Language Using a Simple Deep-learning Classifier
Jason Angel | Segun Taofeek Aroyehun | Antonio Tamayo | Alexander Gelbukh

Code-switching is a phenomenon in which two or more languages are used in the same message. Nowadays, it is quite common to find messages with mixed languages on social media. This phenomenon presents a challenge for sentiment analysis. In this paper, we use a standard convolutional neural network model to predict the sentiment of tweets written in a blend of Spanish and English. Our simple approach achieved an F1-score of 0.71 on the test set of the competition. We analyze the capabilities of our best model and perform error analysis to expose important difficulties in classifying sentiment in a code-switching setting.

pdf bib
Palomino-Ochoa at SemEval-2020 Task 9: Robust System Based on Transformer for Code-Mixed Sentiment Classification
Daniel Palomino | José Ochoa-Luna

We present a transfer learning system to perform a mixed Spanish-English sentiment classification task. Our proposal uses the state-of-the-art language model BERT and embeds it within a ULMFiT transfer learning pipeline. This combination allows us to predict the polarity of code-mixed (English-Spanish) tweets. Among 29 submitted systems, our approach (referred to as dplominop) is ranked 4th on the Sentimix Spanglish test set of SemEval 2020 Task 9. Our system yields a weighted-F1 score of 0.755, which can be easily reproduced, as the source code and implementation details are made available.

pdf bib
ULD@NUIG at SemEval-2020 Task 9: Generative Morphemes with an Attention Model for Sentiment Analysis in Code-Mixed Text
Koustava Goswami | Priya Rani | Bharathi Raja Chakravarthi | Theodorus Fransen | John P. McCrae

Code mixing is a common phenomenon in multilingual societies where people switch from one language to another for various reasons. Recent advances in public communication over different social media sites have led to an increase in the frequency of code-mixed usage in written language. In this paper, we present the Generative Morphemes with Attention (GenMA) Model sentiment analysis system contributed to SemEval 2020 Task 9 SentiMix. The system aims to predict the sentiments of the given English-Hindi code-mixed tweets without using word-level language tags, instead inferring this automatically using a morphological model. The system is based on a novel deep neural network (DNN) architecture, which has outperformed the baseline F1-score on the test dataset as well as the validation dataset. Our results can be found under the user name koustava on the Sentimix Hindi English page.

pdf bib
ECNU at SemEval-2020 Task 7: Assessing Humor in Edited News Headlines Using BiLSTM with Attention
Tiantian Zhang | Zhixuan Chen | Man Lan

In this paper we describe our system submitted to SemEval 2020 Task 7: Assessing Humor in Edited News Headlines. We participated in all subtasks, in which the main goal is to predict the mean funniness of the edited headline given the original and the edited headline. Our system involves two similar sub-networks, which generate vector representations for the original and edited headlines respectively. We then subtract the outputs of the two sub-networks to predict the funniness of the edited headline.

pdf bib
ELMo-NB at SemEval-2020 Task 7: Assessing Sense of Humor in EditedNews Headlines Using ELMo and NB
Enas Khwaileh | Muntaha A. Al-As’ad

Our approach is constructed to improve on a couple of aspects: preprocessing with an emphasis on humor sense detection, using embeddings from a state-of-the-art language model (ELMo), and ensembling the results of a machine learning model, Naïve Bayes (NB), with those of deep learning pre-trained models. The ELMo-NB submission scored 0.5642 on the competition leaderboard, where results were measured by Root Mean Squared Error (RMSE).

pdf bib
Ferryman at SemEval-2020 Task 7: Ensemble Model for Assessing Humor in Edited News Headlines
Weilong Chen | Jipeng Li | Chenghao Huang | Wei Bai | Yanru Zhang | Yan Wang

Natural language processing (NLP) has been applied to various fields including text classification and sentiment analysis. In the shared task of assessing the funniness of edited news headlines, which is part of the SemEval 2020 competition, we preprocess the datasets by replacing abbreviations and stemming words, then merge three models, Light Gradient Boosting Machine (LightGBM), Long Short-Term Memory (LSTM), and Bidirectional Encoder Representations from Transformers (BERT), by averaging their outputs to obtain the best performance. Our team Ferryman placed 9th in Sub-task 1 (Regression) of Task 7.
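
Merging models "by taking the average" can be sketched as an unweighted mean of per-headline predictions. The sketch assumes each model has already produced one funniness score per headline; the variable names and values are hypothetical toy data, not the authors' outputs.

```python
def average_ensemble(*model_predictions):
    """Unweighted mean of several models' predictions, item by item."""
    n_models = len(model_predictions)
    return [sum(scores) / n_models for scores in zip(*model_predictions)]


# Toy funniness predictions for three headlines from three models.
lightgbm_pred = [0.8, 1.2, 0.4]
lstm_pred = [1.0, 1.0, 0.6]
bert_pred = [1.2, 1.4, 0.2]

ensemble = average_ensemble(lightgbm_pred, lstm_pred, bert_pred)
print(ensemble)
```

Averaging tends to cancel out errors that the individual models do not share, which is one common motivation for combining models of different families.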

pdf bib
Funny3 at SemEval-2020 Task 7: Humor Detection of Edited Headlines with LSTM and TFIDF Neural Network System
Xuefeng Luo | Kuan Tang

This paper presents a neural network system with which we participate in the first subtask of SemEval-2020 shared Task 7, Assessing the Funniness of Edited News Headlines. Our target is to create a neural network model that can predict the funniness of edited headlines. We build our model using a combination of LSTM and TF-IDF, followed by a feed-forward neural network. The system manages to slightly improve RMSE scores relative to our mean score baseline.

pdf bib
HumorAAC at SemEval-2020 Task 7: Assessing the Funniness of Edited News Headlines through Regression and Trump Mentions
Anna-Katharina Dick | Charlotte Weirich | Alla Kutkina

In this paper we describe our contribution to the SemEval-2020 Humor Assessment task. We essentially use three different features that are passed into a ridge regression to determine a funniness score for an edited news headline: statistical, count-based features, semantic features and contextual information. For deciding which one of two given edited headlines is funnier, we additionally use scoring information and logistic regression. Our work was mostly concentrated on investigating features, rather than improving prediction based on pre-trained language models. The resulting system is task-specific, lightweight and performs above the majority baseline. Our experiments indicate that features related to socio-cultural context, in our case mentions of Donald Trump, generally perform better than context-independent features like headline length.

pdf bib
MLEngineer at SemEval-2020 Task 7: BERT-Flair Based Humor Detection Model (BFHumor)
Fara Shatnawi | Malak Abdullah | Mahmoud Hammad

Task 7, Assessing the Funniness of Edited News Headlines, in the International Workshop SemEval-2020 introduces two sub-tasks to predict the funniness values of edited news headlines from the Reddit website. This paper proposes the BFHumor model of the MLEngineer team, which participates in both sub-tasks of this competition. The BFHumor model is defined as a BERT-Flair based humor detection model that combines different pre-trained models with various Natural Language Processing (NLP) techniques. The Bidirectional Encoder Representations from Transformers (BERT) regressor is considered the primary pre-trained model in our approach, whereas Flair is the main NLP library. It is worth mentioning that the BFHumor model was ranked 4th in sub-task 1 with a root mean square error (RMSE) value of 0.51966, which is 0.02 away from the first-ranked model. The team was also ranked 12th in sub-task 2 with an accuracy of 0.62291, which is 0.05 away from the top-ranked model. Our results indicate that the BFHumor model is one of the top models for detecting humor in text.

pdf bib
UTFPR at SemEval-2020 Task 7: Using Co-occurrence Frequencies to Capture Unexpectedness
Gustavo Henrique Paetzold

We describe the UTFPR system for SemEval-2020’s Task 7: Assessing Humor in Edited News Headlines. Ours is a minimalist unsupervised system that uses word co-occurrence frequencies from large corpora to capture unexpectedness as a means to capture funniness. Our system placed 22nd on the shared task’s Task 2. We found that our approach requires more text than we used to perform reliably, and that unexpectedness alone is not sufficient to gauge funniness for humorous content that targets a diverse audience.

pdf bib
WUY at SemEval-2020 Task 7: Combining BERT and Naive Bayes-SVM for Humor Assessment in Edited News Headlines
Cheng Zhang | Hayato Yamana

This paper describes our participation in SemEval 2020 Task 7 on assessment of humor in edited news headlines, which includes two subtasks: estimating the humor of micro-edited news headlines (subtask A) and predicting the more humorous of two edited headlines (subtask B). To address these tasks, we propose two systems. The first system adopts a regression-based fine-tuned single-sequence bidirectional encoder representations from transformers (BERT) model with easy data augmentation (EDA), called BERT+EDA. The second system adopts a hybrid of a regression-based fine-tuned sequence-pair BERT model and a combined Naive Bayes and support vector machine (SVM) model estimated on term frequency-inverse document frequency (TF-IDF) features, called BERT+NB-SVM. In this case, no additional training datasets were used, and the BERT+NB-SVM model outperformed BERT+EDA. The official root-mean-square error (RMSE) score for subtask A is 0.57369 and ranks 31st out of 48, whereas the best RMSE of BERT+NB-SVM is 0.52429, ranking 7th. For subtask B, we simply use a sequence-pair BERT model, the official accuracy of which is 0.53196 and ranks 25th out of 32.
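
Several abstracts in this listing report root-mean-square error (RMSE) for the funniness regression subtask. As a reminder of what that number measures, here is a minimal sketch of the standard RMSE formula; the gold and predicted values are made-up toy numbers, not task data.

```python
import math


def rmse(predicted, gold):
    """Root-mean-square error between two equal-length lists of scores."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    squared_errors = [(p - g) ** 2 for p, g in zip(predicted, gold)]
    return math.sqrt(sum(squared_errors) / len(gold))


# Toy example: two of the four predictions are off by 0.5.
gold = [0.0, 1.0, 2.0, 3.0]
pred = [0.5, 1.0, 1.5, 3.0]
score = rmse(pred, gold)
print(round(score, 4))
```

Lower is better: a system that predicted every gold score exactly would obtain an RMSE of 0.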

pdf bib
BERT at SemEval-2020 Task 8: Using BERT to Analyse Meme Emotions
Adithya Avvaru | Sanath Vobilisetty

Sentiment analysis is one of the most sought-after research problems among Natural Language Processing (NLP) researchers. The range of problems being addressed by sentiment analysis is increasing. Until now, most research has focused on predicting sentiment, or sentiment categories like sarcasm, humor, offense and motivation, on text data. However, there is very limited research on predicting or analyzing the sentiment of internet memes. We try to address this problem as part of Task 8 of SemEval 2020: Memotion Analysis. We participated in all three tasks under Memotion Analysis. Our system, built using the state-of-the-art Transformer-based pre-trained Bidirectional Encoder Representations from Transformers (BERT), performed better than the baseline models for tasks A and C and performed close to the baseline model for task B. In this paper, we present the data used, the steps we took for data cleaning and preparation, the fine-tuning process for the BERT-based model, and finally the prediction of the sentiment or sentiment categories. We found that sequence models like Long Short-Term Memory (LSTM) and its variants performed below par in predicting the sentiments. We also performed a comparative analysis with other Transformer-based models like DistilBERT and XLNet.

pdf bib
CSECU_KDE_MA at SemEval-2020 Task 8: A Neural Attention Model for Memotion Analysis
Abu Nowshed Chy | Umme Aymun Siddiqua | Masaki Aono

A meme is a pictorial representation of an idea or theme. In an age of proliferating social media platforms, memes are spreading rapidly from person to person and becoming a trending way of expressing opinions. However, due to the multimodal characteristics of meme contents, detecting and analyzing the underlying emotion of a meme is a formidable task. In this paper, we present our approach for detecting the emotion of a meme as defined in SemEval-2020 Task 8. Our team CSECU_KDE_MA employs an attention-based neural network model to tackle the problem. Upon extracting the text contents from a meme using an optical character reader (OCR), we represent it using the distributed representation of words. Next, we perform convolution with multiple kernel sizes to obtain higher-level feature sequences. The feature sequences are then fed into an attentive time-distributed bidirectional LSTM model to learn long-term dependencies effectively. Experimental results show that our proposed neural model obtained competitive performance among the participants’ systems.

pdf bib
Hitachi at SemEval-2020 Task 8: Simple but Effective Modality Ensemble for Meme Emotion Recognition
Terufumi Morishita | Gaku Morio | Shota Horiguchi | Hiroaki Ozaki | Toshinori Miyoshi

Users of social networking services often share their emotions via multi-modal content, usually images paired with text embedded in them. SemEval-2020 task 8, Memotion Analysis, aims at automatically recognizing these emotions of so-called internet memes. In this paper, we propose a simple but effective Modality Ensemble that incorporates visual and textual deep-learning models, which are independently trained, rather than providing a single multi-modal joint network. To this end, we first fine-tune four pre-trained visual models (i.e., Inception-ResNet, PolyNet, SENet, and PNASNet) and four textual models (i.e., BERT, GPT-2, Transformer-XL, and XLNet). Then, we fuse their predictions with ensemble methods to effectively capture cross-modal correlations. The experiments performed on dev-set show that both visual and textual features aided each other, especially in subtask-C, and consequently, our system ranked 2nd on subtask-C.

pdf bib
Memebusters at SemEval-2020 Task 8: Feature Fusion Model for Sentiment Analysis on Memes Using Transfer Learning
Mayukh Sharma | Ilanthenral Kandasamy | W.b. Vasantha

In this paper, we describe our deep learning system used for SemEval 2020 Task 8: Memotion Analysis. We participated in all the subtasks, i.e., Subtask A: Sentiment classification, Subtask B: Humor classification, and Subtask C: Scales of semantic classes. A similar multimodal architecture was used for each subtask. The proposed architecture makes use of transfer learning for image and text feature extraction. The extracted features are then fused together using a stacked bidirectional Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) model with an attention mechanism for final predictions. We also propose a single model for predicting semantic classes (Subtask B) as well as their scales (Subtask C) by branching the final output of the post-LSTM dense layers. Our model was ranked 5th in Subtask B and 8th in Subtask C and performed nicely in Subtask A on the leaderboard. Our system makes use of transfer learning for feature extraction and fusion of image and text features for predictions.

pdf bib
SIS@IIITH at SemEval-2020 Task 8: An Overview of Simple Text Classification Methods for Meme Analysis
Sravani Boinepelli | Manish Shrivastava | Vasudeva Varma

Memes are steadily taking over the feeds of the public on social media. There is always the threat of malicious users on the internet posting offensive content, even through memes. Hence, the automatic detection of offensive images/memes is imperative along with detection of offensive text. However, this is a much more complex task as it involves both visual cues as well as language understanding and cultural/context knowledge. This paper describes our approach to the task of SemEval-2020 Task 8: Memotion Analysis. We chose to participate only in Task A which dealt with Sentiment Classification, which we formulated as a text classification problem. Through our experiments, we explored multiple training models to evaluate the performance of simple text classification algorithms on the raw text obtained after running OCR on meme images. Our submitted model achieved an accuracy of 72.69% and exceeded the existing baseline’s Macro F1 score by 8% on the official test dataset. Apart from describing our official submission, we shall elucidate how different classification models respond to this task.

pdf bib
UoR at SemEval-2020 Task 8: Gaussian Mixture Modelling (GMM) Based Sampling Approach for Multi-modal Memotion Analysis
Zehao Liu | Emmanuel Osei-Brefo | Siyuan Chen | Huizhi Liang

Memes are widely used on social media. They usually contain multi-modal information such as images and texts, serving as valuable data sources to analyse opinions and sentiment orientations of online communities. The provided memes data often face an imbalanced data problem, that is, some classes or labelled sentiment categories significantly outnumber other classes. This often results in difficulty in applying machine learning techniques where balanced labelled input data are required. In this paper, a Gaussian Mixture Model sampling method is proposed to tackle the problem of class imbalance for the memes sentiment classification task. To utilise both text and image data, a multi-modal CNN-LSTM model is proposed to jointly learn latent features for positive, negative and neutral category predictions. The experiments show that the re-sampling model can slightly improve the accuracy on the trial data of sub-task A of Task 8. The multi-modal CNN-LSTM model can achieve macro F1 score 0.329 on the test set.

pdf bib
BAKSA at SemEval-2020 Task 9: Bolstering CNN with Self-Attention for Sentiment Analysis of Code Mixed Text
Ayush Kumar | Harsh Agarwal | Keshav Bansal | Ashutosh Modi

Sentiment Analysis of code-mixed text has diversified applications in opinion mining, ranging from tagging user reviews to identifying social or political sentiments of a sub-population. In this paper, we present an ensemble architecture of a convolutional neural net (CNN) and a self-attention based LSTM for sentiment analysis of code-mixed tweets. While the CNN component helps in the classification of positive and negative tweets, the self-attention based LSTM helps in the classification of neutral tweets, because of its ability to identify the correct sentiment among multiple sentiment-bearing units. We achieved F1 scores of 0.707 (ranked 5th) and 0.725 (ranked 13th) on Hindi-English (Hinglish) and Spanish-English (Spanglish) datasets, respectively. The submissions for Hinglish and Spanglish tasks were made under the usernames ayushk and harsh_6 respectively.

pdf bib
Deep Learning Brasil-NLP at SemEval-2020 Task 9: Sentiment Analysis of Code-Mixed Tweets Using Ensemble of Language Models
Manoel Veríssimo dos Santos Neto | Ayrton Amaral | Nádia Silva | Anderson da Silva Soares

In this paper, we describe a methodology to predict sentiment in code-mixed tweets (Hindi-English). Our team called verissimo.manoel in CodaLab developed an approach based on an ensemble of four models (MultiFiT, BERT, ALBERT, and XLNet). The final classification algorithm was an ensemble of some predictions of all softmax values from these four models. This architecture was used and evaluated in the context of the SemEval 2020 challenge (task 9), and our system got 72.7% on the F1 score.

pdf bib
IUST at SemEval-2020 Task 9: Sentiment Analysis for Code-Mixed Social Media Text Using Deep Neural Networks and Linear Baselines
Soroush Javdan | Taha Shangipour ataei | Behrouz Minaei-Bidgoli

Sentiment Analysis is a well-studied field of Natural Language Processing. However, the rapid growth of social media and the noisy content within it pose significant challenges in addressing this problem with well-established methods and tools. One of these challenges is code-mixing, which means using different languages to convey thoughts in social media texts. Our group, with the name of IUST (username: TAHA), participated in the SemEval-2020 shared task 9 on Sentiment Analysis for Code-Mixed Social Media Text, and we have attempted to develop a system to predict the sentiment of a given code-mixed tweet. We used different preprocessing techniques and proposed to use different methods that vary from NBSVM to more complicated deep neural network models. Our best performing method obtains an F1 score of 0.751 for the Spanish-English sub-task and 0.706 for the Hindi-English sub-task.

pdf bib
MeisterMorxrc at SemEval-2020 Task 9: Fine-Tune Bert and Multitask Learning for Sentiment Analysis of Code-Mixed Tweets
Qi Wu | Peng Wang | Chenghao Huang

Natural language processing (NLP) has been applied to various fields, including text classification and sentiment analysis. In the shared task on sentiment analysis of code-mixed tweets, which is part of the SemEval-2020 competition, we preprocess the datasets by, among other steps, replacing emoji and deleting uncommon characters, and then fine-tune Bidirectional Encoder Representations from Transformers (BERT) to obtain the best performance. After exhausting top-3 submissions, our team MeisterMorxrc achieves an averaged F1 score of 0.730 in this task; our CodaLab username is MeisterMorxrc.

pdf bib
WESSA at SemEval-2020 Task 9: Code-Mixed Sentiment Analysis Using Transformers
Ahmed Sultan | Mahmoud Salim | Amina Gaber | Islam El Hosary

In this paper, we describe our system submitted for SemEval 2020 Task 9, Sentiment Analysis for Code-Mixed Social Media Text, alongside other experiments. Our best-performing system is a transfer learning-based model that fine-tunes XLM-RoBERTa, a transformer-based multilingual masked language model, on monolingual English and Spanish data and Spanish-English code-mixed data. Our system outperforms the official task baseline by achieving a 70.1% average F1-score on the official leaderboard using the test set. For later submissions, our system manages to achieve a 75.9% average F1-score on the test set using CodaLab username ahmed0sultan.

pdf bib
Zyy1510 Team at SemEval-2020 Task 9: Sentiment Analysis for Code-Mixed Social Media Text with Sub-word Level Representations
Yueying Zhu | Xiaobing Zhou | Hongling Li | Kunjie Dong

This paper reports the zyy1510 team’s work in the International Workshop on Semantic Evaluation (SemEval-2020) shared task on Sentiment Analysis for Code-Mixed (Hindi-English, English-Spanish) Social Media Text. The purpose of this task is to determine the polarity of the text, assigning it one of three labels: positive, negative, or neutral. To achieve this goal, we propose an ensemble model of word n-gram-based Multinomial Naive Bayes (MNB) and sub-word level representations in an LSTM (Sub-word LSTM) to identify the sentiment of Hindi-English and English-Spanish code-mixed data. This ensemble model combines the rich sequential patterns and the intermediate features after convolution from the LSTM model with the polarity of keywords from the MNB model to obtain the final sentiment score. We tested our system on the Hindi-English and English-Spanish code-mixed social media datasets released for the task. Our model achieves an F1 score of 0.647 in the Hindi-English task and 0.682 in the English-Spanish task.

pdf bib
SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020)
Marcos Zampieri | Preslav Nakov | Sara Rosenthal | Pepa Atanasova | Georgi Karadzhov | Hamdy Mubarak | Leon Derczynski | Zeses Pitenis | Çağrı Çöltekin

We present the results and the main findings of SemEval-2020 Task 12 on Multilingual Offensive Language Identification in Social Media (OffensEval-2020). The task included three subtasks corresponding to the hierarchical taxonomy of the OLID schema from OffensEval-2019, and it was offered in five languages: Arabic, Danish, English, Greek, and Turkish. OffensEval-2020 was one of the most popular tasks at SemEval-2020, attracting a large number of participants across all subtasks and languages: a total of 528 teams signed up to participate in the task, 145 teams submitted official runs on the test data, and 70 teams submitted system description papers.

pdf bib
Galileo at SemEval-2020 Task 12: Multi-lingual Learning for Offensive Language Identification Using Pre-trained Language Models
Shuohuan Wang | Jiaxiang Liu | Xuan Ouyang | Yu Sun

This paper describes Galileo’s performance in SemEval-2020 Task 12 on detecting and categorizing offensive language in social media. For offensive language identification, we proposed a multi-lingual method using the pre-trained language models ERNIE and XLM-R. For offensive language categorization, we proposed a knowledge distillation method trained on soft labels generated by several supervised models. Our team participated in all three sub-tasks. In Sub-task A (Offensive Language Identification), we ranked first in terms of average F1 score across all languages. We are also the only team that ranked among the top three in every language. We also took first place in Sub-task B (Automatic Categorization of Offense Types) and Sub-task C (Offense Target Identification).
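
As a rough illustration of training on soft labels, the sketch below averages the class probabilities of several teacher models and scores a student against that target with cross-entropy. The averaging rule, the function names, and the numbers are assumptions made for illustration; the abstract does not specify how the soft labels were produced or which loss was used.

```python
import math

def soft_labels(teacher_probs):
    """Average the teachers' class probabilities into one soft target (assumed rule)."""
    n = len(teacher_probs)
    n_classes = len(teacher_probs[0])
    return [sum(p[c] for p in teacher_probs) / n for c in range(n_classes)]

def soft_cross_entropy(target, student):
    """Cross-entropy of the student distribution against a soft target."""
    return -sum(t * math.log(s) for t, s in zip(target, student) if t > 0)

# Hypothetical teacher outputs for one example, two classes
teachers = [[0.8, 0.2], [0.6, 0.4]]
target = soft_labels(teachers)

loss_matching = soft_cross_entropy(target, target)      # student agrees with the teachers
loss_uniform = soft_cross_entropy(target, [0.5, 0.5])   # uninformed student
print(round(loss_matching, 3), round(loss_uniform, 3))
```

The loss is smallest when the student reproduces the teachers' distribution, so the student is pushed toward the teachers' uncertainty as well as their top label.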

pdf bib
Aschern at SemEval-2020 Task 11: It Takes Three to Tango: RoBERTa, CRF, and Transfer Learning
Anton Chernyavskiy | Dmitry Ilvovsky | Preslav Nakov

We describe our system for SemEval-2020 Task 11 on Detection of Propaganda Techniques in News Articles. We developed ensemble models using RoBERTa-based neural architectures, additional CRF layers, transfer learning between the two subtasks, and advanced post-processing to handle the multi-label nature of the task, the consistency between nested spans, repetitions, and labels from similar spans in training. We achieved sizable improvements over baseline fine-tuned RoBERTa models, and the official evaluation ranked our system 3rd (almost tied with the 2nd) out of 36 teams on the span identification subtask with an F1 score of 0.491, and 2nd (almost tied with the 1st) out of 31 teams on the technique classification subtask with an F1 score of 0.62.

pdf bib
AdelaideCyC at SemEval-2020 Task 12: Ensemble of Classifiers for Offensive Language Detection in Social Media
Mahen Herath | Thushari Atapattu | Hoang Anh Dung | Christoph Treude | Katrina Falkner

This paper describes the systems our team (AdelaideCyC) has developed for SemEval Task 12 (OffensEval 2020) to detect offensive language in social media. The challenge focuses on three subtasks: offensive language identification (subtask A), offense type identification (subtask B), and offense target identification (subtask C). Our team participated in all three subtasks. We developed machine learning and deep learning-based ensembles of models. We achieved F1-scores of 0.906, 0.552, and 0.623 in subtasks A, B, and C, respectively. While our performance scores are promising for subtask A, the results demonstrate that subtasks B and C remain challenging to classify.

pdf bib
GruPaTo at SemEval-2020 Task 12: Retraining mBERT on Social Media and Fine-tuned Offensive Language Models
Davide Colla | Tommaso Caselli | Valerio Basile | Jelena Mitrović | Michael Granitzer

We introduce an approach to multilingual Offensive Language Detection based on the mBERT transformer model. We download extra training data from Twitter in English, Danish, and Turkish, and use it to re-train the model. We then fine-tune the model on the provided training data and, in some configurations, implement a transfer learning approach exploiting the typological relatedness between English and Danish. Our systems obtained good results across the three languages (0.9036 for EN, 0.7619 for DA, and 0.7789 for TR).

pdf bib
GUIR at SemEval-2020 Task 12: Domain-Tuned Contextualized Models for Offensive Language Detection
Sajad Sotudeh | Tong Xiang | Hao-Ren Yao | Sean MacAvaney | Eugene Yang | Nazli Goharian | Ophir Frieder

Offensive language detection is an important and challenging task in natural language processing. We present our submissions to the OffensEval 2020 shared task, which includes three English sub-tasks: identifying the presence of offensive language (Sub-task A), identifying the presence of a target in offensive language (Sub-task B), and identifying the categories of the target (Sub-task C). Our experiments explore using a domain-tuned contextualized language model (namely, BERT) for this task. We also experiment with different components and configurations (e.g., a multi-view SVM) stacked upon BERT models for specific sub-tasks. Our submissions achieve F1 scores of 91.7% in Sub-task A, 66.5% in Sub-task B, and 63.2% in Sub-task C. We perform an ablation study which reveals that domain tuning considerably improves the classification performance. Furthermore, error analysis shows common misclassification errors made by our model and outlines future research directions.

pdf bib
IIITG-ADBU at SemEval-2020 Task 12: Comparison of BERT and BiLSTM in Detecting Offensive Language
Arup Baruah | Kaushik Das | Ferdous Barbhuiya | Kuntal Dey

Task 12 of SemEval 2020 consisted of 3 subtasks, namely offensive language identification (Subtask A), categorization of offense type (Subtask B), and offense target identification (Subtask C). This paper presents the results our classifiers obtained for the English language in the 3 subtasks. The classifiers we used were BERT and BiLSTM. On the test set, our BERT classifier obtained a macro F1 score of 0.90707 for subtask A and 0.65279 for subtask B. The BiLSTM classifier obtained a macro F1 score of 0.57565 for subtask C. The paper also analyzes the errors made by our classifiers. We conjecture that the presence of a few misleading instances in the dataset is affecting the performance of the classifiers. Our analysis also discusses the need for temporal context and world knowledge to determine the offensiveness of a few comments.
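
Macro F1, the score reported here and in several neighbouring abstracts, is the unweighted mean of the per-class F1 scores. The sketch below computes it from scratch; the gold and predicted labels are hypothetical and only serve to make the arithmetic visible.

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    f1_scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)

# Hypothetical gold labels and predictions for six tweets
gold = ["OFF", "NOT", "NOT", "OFF", "NOT", "NOT"]
pred = ["OFF", "NOT", "OFF", "NOT", "NOT", "NOT"]
score = macro_f1(gold, pred)
print(score)  # OFF has F1 0.5, NOT has F1 0.75, so the mean is 0.625
```

Because each class counts equally, a classifier that neglects the rarer OFF class is penalized more than accuracy alone would suggest.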

pdf bib
NUIG at SemEval-2020 Task 12: Pseudo Labelling for Offensive Content Classification
Shardul Suryawanshi | Mihael Arcan | Paul Buitelaar

This work addresses the classification problem defined by sub-task A (English only) of the OffensEval 2020 challenge. We used a semi-supervised approach to classify given tweets into an offensive (OFF) or not-offensive (NOT) class. As the OffensEval 2020 dataset is loosely labelled with confidence scores given by unsupervised models, we used last year’s offensive language identification dataset (OLID) to label the OffensEval 2020 dataset. Our approach uses a pseudo-labelling method to annotate the current dataset. We trained four text classifiers on the OLID dataset, and the classifier with the highest macro-averaged F1-score was used to pseudo-label the OffensEval 2020 dataset. The model that performed best among the four text classifiers on the OLID dataset was then trained on the combined dataset of OLID and pseudo-labelled OffensEval 2020. We evaluated the classifiers on the OLID and OffensEval 2020 datasets using precision and recall, with macro-averaged F1-score as the primary evaluation metric.
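
The pseudo-labelling loop described above can be sketched as follows. The keyword-based `toy_classifier` is a hypothetical stand-in for the best classifier trained on OLID, and the 0.5 threshold and the example texts are assumptions; none of them come from the paper.

```python
def pseudo_label(classifier, unlabeled_texts, threshold=0.5):
    """Assign OFF/NOT to each unlabeled text using the classifier's OFF probability."""
    labeled = []
    for text in unlabeled_texts:
        p_off = classifier(text)
        labeled.append((text, "OFF" if p_off >= threshold else "NOT"))
    return labeled

# Hypothetical stand-in for the best model trained on the labeled OLID data
OFFENSIVE_WORDS = {"idiot", "stupid"}

def toy_classifier(text):
    """Return 1.0 if any token is in the toy word list, else 0.0."""
    tokens = text.lower().split()
    return 1.0 if any(token in OFFENSIVE_WORDS for token in tokens) else 0.0

olid = [("you are an idiot", "OFF"), ("have a nice day", "NOT")]
unlabeled = ["what a stupid idea", "see you tomorrow"]

pseudo = pseudo_label(toy_classifier, unlabeled)
combined = olid + pseudo  # the final model would be retrained on this combined set
print(pseudo)
```

In practice the quality of the combined set depends on the first classifier, since its mistakes become training labels for the second round.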

pdf bib
UHH-LT at SemEval-2020 Task 12: Fine-Tuning of Pre-Trained Transformer Networks for Offensive Language Detection
Gregor Wiedemann | Seid Muhie Yimam | Chris Biemann

Fine-tuning of pre-trained transformer networks such as BERT yields state-of-the-art results for text classification tasks. Typically, fine-tuning is performed on task-specific training datasets in a supervised manner. One can also fine-tune in an unsupervised manner beforehand by further pre-training on the masked language modeling (MLM) task. Here, in-domain data for unsupervised MLM that resembles the actual classification target dataset allows for domain adaptation of the model. In this paper, we compare current pre-trained transformer networks with and without MLM fine-tuning on their performance for offensive language detection. Our MLM fine-tuned RoBERTa-based classifier officially ranks 1st in the SemEval 2020 Shared Task 12 for the English language. Further experiments with the ALBERT model even surpass this result.

pdf bib
EL-BERT at SemEval-2020 Task 10: A Multi-Embedding Ensemble Based Approach for Emphasis Selection in Visual Media
Chandresh Kanani | Sriparna Saha | Pushpak Bhattacharyya

In visual media, text emphasis is the strengthening of words in a text to convey the intent of the author. Text emphasis in visual media is generally done by using different colors, backgrounds, or fonts for the text; it helps in conveying the actual meaning of the message to the readers. Emphasis selection is the task of choosing candidate words for emphasis; it helps in automatically designing posters and other media content with written text. If we consider only the text and do not know the intent, then there can be multiple valid emphasis selections. We propose the use of ensembles for emphasis selection to improve over single emphasis selection models. We show that the use of multi-embedding helps in enhancing the results for base models. To show the efficacy of the proposed approach, we also compare our results with state-of-the-art models.

pdf bib
LAST at SemEval-2020 Task 10: Finding Tokens to Emphasise in Short Written Texts with Precomputed Embedding Models and LightGBM
Yves Bestgen

To select tokens to be emphasised in short texts, a system mainly based on precomputed embedding models, such as BERT and ELMo, and LightGBM is proposed. Its performance is low. Additional analyses suggest that it is poor at predicting the highest emphasis scores, although these are the most important for the challenge, and that it is very sensitive to the specific instances provided during learning.

pdf bib
Randomseed19 at SemEval-2020 Task 10: Emphasis Selection for Written Text in Visual Media
Aleksandr Shatilov | Denis Gordeev | Alexey Rey

This paper describes our approach to emphasis selection for written text in visual media as a solution for SemEval 2020 Task 10. We used an ensemble of several different Transformer-based models and cast the task as a sequence labeling problem with two tags for each token in the text: ‘I’ for ‘emphasized’ and ‘O’ for ‘non-emphasized’.
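
A minimal sketch of the two-tag formulation, assuming each token comes with an emphasis score in [0, 1] and that a fixed threshold decides between 'I' and 'O'. The threshold value, the example sentence, and the scores are hypothetical; the abstract does not say how the training tags were derived.

```python
def to_io_tags(emphasis_scores, threshold=0.5):
    """Map per-token emphasis scores to 'I' (emphasized) or 'O' (non-emphasized)."""
    return ["I" if score >= threshold else "O" for score in emphasis_scores]

tokens = ["Never", "stop", "dreaming"]
scores = [0.67, 0.22, 0.89]  # hypothetical per-token emphasis scores

tags = to_io_tags(scores)
for token, tag in zip(tokens, tags):
    print(token, tag)
```

Once the data is in this shape, any token-level sequence tagger can be trained on it, which is what makes the two-tag casting convenient.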

pdf bib
YNU-HPCC at SemEval-2020 Task 10: Using a Multi-granularity Ordinal Classification of the BiLSTM Model for Emphasis Selection
Dawei Liao | Jin Wang | Xuejie Zhang

In this study, we propose a multi-granularity ordinal classification method to address the problem of emphasis selection. In detail, word embeddings are learned from Embeddings from Language Models (ELMo) to extract feature vector representations. Then, ordinal classifications are implemented at four different granularities to approximate the continuous emphasis values. Comparative experiments were conducted to compare the model with a baseline in which the problem is transformed into a label distribution problem.

pdf bib
JUST at SemEval-2020 Task 11: Detecting Propaganda Techniques Using BERT Pre-trained Model
Ola Altiti | Malak Abdullah | Rasha Obiedat

This paper presents our submission to SemEval-2020 Task 11, Detection of Propaganda Techniques in News Articles. Of the two subtasks in this competition, we participated in the Technique Classification (TC) subtask, which aims to identify the propaganda techniques used in a specific propaganda span. We implemented and used various models to detect propaganda. Our proposed model is based on the BERT uncased pre-trained language model, as it has achieved state-of-the-art performance on multiple NLP benchmarks. Our proposed model achieves an F1-score of 0.55307, outperforming the baseline model provided by the organizers (F1-score of 0.2519), and is 0.07 away from the best-performing team. Compared to other participating systems, our submission is ranked 15th out of 31 participants.

pdf bib
NLFIIT at SemEval-2020 Task 11: Neural Network Architectures for Detection of Propaganda Techniques in News Articles
Matej Martinkovic | Samuel Pecar | Marian Simko

Since propaganda has become a more common technique in news, it is very important to look for ways to detect it automatically. In this paper, we present a neural model architecture submitted to the SemEval-2020 Task 11 competition: Detection of Propaganda Techniques in News Articles. We participated in both subtasks, propaganda span identification and propaganda technique classification. Our model utilizes recurrent Bi-LSTM layers with pre-trained word representations and also takes advantage of a self-attention mechanism. Our model achieved F1 scores of 0.405 for subtask 1 and 0.553 for subtask 2 on the test set, resulting in 17th and 16th place in subtask 1 and subtask 2, respectively.

pdf bib
PsuedoProp at SemEval-2020 Task 11: Propaganda Span Detection Using BERT-CRF and Ensemble Sentence Level Classifier
Aniruddha Chauhan | Harshita Diddee

This paper explains our team’s submission to the Shared Task of Fine-Grained Propaganda Detection, in which we propose a sequential BERT-CRF based Span Identification model where the fine-grained detection is carried out only on the articles that are flagged as containing propaganda by an ensemble SLC model. We propose this setup bearing in mind the practicality of this approach in identifying propaganda spans in the exponentially increasing content base, where the fine-tuned analysis of the entire data repository may not be the optimal choice due to its massive computational resource requirements. We present our analysis of different voting ensembles for the SLC model. Our system ranks 14th on the test set and 22nd on the development set, with F1 scores of 0.41 and 0.39, respectively.

pdf bib
SkoltechNLP at SemEval-2020 Task 11: Exploring Unsupervised Text Augmentation for Propaganda Detection
Daryna Dementieva | Igor Markov | Alexander Panchenko

This paper presents a solution for the Span Identification (SI) task in the Detection of Propaganda Techniques in News Articles competition at SemEval-2020. The goal of the SI task is to identify specific fragments of each article which contain the use of at least one propaganda technique. This is a binary sequence tagging task. We tested several approaches, finally selecting a fine-tuned BERT model as our baseline model. Our main contribution is an investigation of several unsupervised data augmentation techniques, based on distributional semantics, that expand the original small training dataset, as applied to this BERT-based sequence tagger. We explore various expansion strategies and show that they can substantially shift the balance between precision and recall, while maintaining comparable levels of the F1 score.

pdf bib
syrapropa at SemEval-2020 Task 11: BERT-based Models Design for Propagandistic Technique and Span Detection
Jinfen Li | Lu Xiao

This paper describes the BERT-based models proposed for two subtasks in SemEval-2020 Task 11: Detection of Propaganda Techniques in News Articles. We first build the model for Span Identification (SI) based on SpanBERT, and facilitate the detection with a deeper model and a sentence-level representation. We then develop a hybrid model for Technique Classification (TC). The hybrid model is composed of three submodels: two BERT models with different training methods and a feature-based Logistic Regression model. We deal with the imbalanced dataset by adjusting the cost function. On the development set, we rank seventh in the SI subtask (F1-measure of 0.4711) and third in the TC subtask (F1-measure of 0.6783).

pdf bib
Team DiSaster at SemEval-2020 Task 11: Combining BERT and Hand-crafted Features for Identifying Propaganda Techniques in News
Anders Kaas | Viktor Torp Thomsen | Barbara Plank

The identification of communication techniques in news articles such as propaganda is important, as such techniques can influence the opinions of large numbers of people. Most work so far has focused on identification at the news article level. Recently, a new dataset and shared task have been proposed for the identification of propaganda techniques at the finer-grained span level. This paper describes our system submission to the subtask of technique classification (TC) for the SemEval 2020 shared task on detection of propaganda techniques in news articles. We propose a method of combining neural BERT representations with hand-crafted features via stacked generalization. Our model has the added advantage that it combines the power of contextual representations from BERT with simple span-based and article-based global features. We present an ablation study which shows that even though BERT representations are very powerful for this task as well, BERT still benefits from being combined with carefully designed task-specific features.

pdf bib
TTUI at SemEval-2020 Task 11: Propaganda Detection with Transfer Learning and Ensembles
Moonsung Kim | Steven Bethard

In this paper, we describe our approaches and systems for SemEval-2020 Task 11 on propaganda technique detection. We fine-tuned BERT and RoBERTa pre-trained models, then merged them with an average ensemble. We conducted several experiments on input representations for dealing with long texts and preserving context, as well as on the imbalanced class problem. Our system ranked 20th out of 36 teams with 0.398 F1 in the SI task and 14th out of 31 teams with 0.556 F1 in the TC task.

pdf bib
UAIC1860 at SemEval-2020 Task 11: Detection of Propaganda Techniques in News Articles
Vlad Ermurachi | Daniela Gifu

The Detection of Propaganda Techniques in News Articles task at the SemEval 2020 competition focuses on detecting and classifying propaganda, which is pervasive in news articles. In this paper, we present a system that evaluates, at the sentence level, three traditional text representation techniques for these goals: tf*idf, word n-grams, and character n-grams. First, we built a binary classifier that assigns the corresponding label, propaganda or non-propaganda. Second, we built a multilabel multiclass model to identify the propaganda technique applied.

pdf bib
UMSIForeseer at SemEval-2020 Task 11: Propaganda Detection by Fine-Tuning BERT with Resampling and Ensemble Learning
Yunzhe Jiang | Cristina Garbacea | Qiaozhu Mei

We describe our participation in the Techniques Classification (TC) task of the SemEval 2020 Detection of Propaganda Techniques in News Articles shared task, designed to categorize textual fragments into one of the 14 given propaganda techniques. Our solution leverages pre-trained BERT models. We present our model implementations, evaluation results, and analysis of these results. We also investigate the potential of combining language models with resampling and ensemble learning methods to deal with data imbalance and improve performance.

pdf bib
UNTLing at SemEval-2020 Task 11: Detection of Propaganda Techniques in English News Articles
Maia Petee | Alexis Palmer

Our system for the PropEval task explores the ability of semantic features to detect and label propagandistic rhetorical techniques in English news articles. For Subtask 2, labeling identified propagandistic fragments with one of fourteen technique labels, our system attains a micro-averaged F1 of 0.40; in this paper, we take a detailed look at the fourteen labels and how well our semantically-focused model detects each of them. We also propose strategies to fill the gaps.

pdf bib
Amsqr at SemEval-2020 Task 12: Offensive Language Detection Using Neural Networks and Anti-adversarial Features
Alejandro Mosquera

This paper describes a method and system to solve the problem of detecting offensive language in social media using anti-adversarial features. Our submission to the SemEval-2020 Task 12 challenge was generated by a stacked ensemble of neural networks fine-tuned on the OLID dataset and additional external sources. For Task A (English), text normalisation filters were applied at both the graphical and lexical levels. The normalisation step effectively mitigates not only the natural presence of lexical variants but also intentional attempts to bypass moderation by introducing out-of-vocabulary words. Our approach provides strong F1 scores for both the 2020 (0.9134) and 2019 (0.8258) challenges.

pdf bib
Hitachi at SemEval-2020 Task 12: Offensive Language Identification with Noisy Labels Using Statistical Sampling and Post-Processing
Manikandan Ravikiran | Amin Ekant Muljibhai | Toshinori Miyoshi | Hiroaki Ozaki | Yuta Koreeda | Sakata Masayuki

In this paper, we present our participation in SemEval-2020 Task 12 Subtask A (English), which focuses on offensive language identification from noisy labels. To this end, we developed a hybrid system with a BERT classifier trained on tweets selected using a Statistical Sampling Algorithm (SA) and Post-Processed (PP) using an offensive wordlist. Our system achieved 34th position with a macro-averaged F1-score (Macro-F1) of 0.90913 over both offensive and non-offensive classes. We further show comprehensive results and error analysis to assist future research in offensive language identification with noisy labels.

pdf bib
IR3218-UI at SemEval-2020 Task 12: Emoji Effects on Offensive Language Identification
Sandy Kurniawan | Indra Budi | Muhammad Okky Ibrohim

In this paper, we present our approach and the results of our participation in OffensEval 2020. There are three sub-tasks in OffensEval 2020, namely offensive language identification (sub-task A), automatic categorization of offense types (sub-task B), and offense target identification (sub-task C). We participated in sub-task A of English OffensEval 2020. Our approach emphasizes how emoji affect offensive language identification. Our model used an LSTM combined with GloVe pre-trained word vectors to identify offensive language on social media. The best model obtained a macro F1-score of 0.88428.

pdf bib
JCT at SemEval-2020 Task 12: Offensive Language Detection in Tweets Using Preprocessing Methods, Character and Word N-grams
Moshe Uzan | Yaakov HaCohen-Kerner

In this paper, we describe our submissions to the SemEval-2020 contest. We tackled Task 12, Multilingual Offensive Language Identification in Social Media. We developed different models for four languages: Arabic, Danish, Greek, and Turkish. We applied three supervised machine learning methods using various combinations of character and word n-gram features. In addition, we applied various combinations of basic preprocessing methods. Our best submission was a model we built for offensive language identification in Danish using Random Forest. This model was ranked 6th out of 39 submissions. Our result is only 0.0025 lower than that of the team that took 4th place using entirely non-neural methods. Our experiments indicate that character n-gram features are more helpful than word n-gram features. This phenomenon probably occurs because tweets are characterized more by characters than by words: tweets are short and contain various special sequences of characters, e.g., hashtags, shortcuts, slang words, and typos.
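
To make the contrast between the two feature types concrete, the sketch below extracts character and word n-grams from a short string. The function names and the example tweet are illustrative assumptions, not the feature extractor used in the paper.

```python
def char_ngrams(text, n):
    """All contiguous character sequences of length n, including spaces and symbols."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def word_ngrams(text, n):
    """All contiguous sequences of n whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

tweet = "#gr8 day"  # hypothetical tweet with a hashtag and a shortcut

print(char_ngrams(tweet, 3))  # ['#gr', 'gr8', 'r8 ', '8 d', ' da', 'day']
print(word_ngrams(tweet, 2))  # ['#gr8 day']
```

Character n-grams still yield overlapping fragments for misspelled or obfuscated words, whereas a word n-gram only matches when every word is spelled exactly as seen in training.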

pdf bib
Lee at SemEval-2020 Task 12: A BERT Model Based on the Maximum Self-ensemble Strategy for Identifying Offensive Language
Junyi Li | Xiaobing Zhou | Zichen Zhang

This article describes the system submitted to SemEval 2020 Task 12: OffensEval 2020. This task aims to identify and classify offensive language in different languages on social media. We participate only in the English part of subtask A, which aims to identify offensive language in English. To solve this task, we propose a BERT model system based on the transform mechanism, and use the maximum self-ensemble to improve model performance. Our model achieved a macro F1 score of 0.913 (ranked 13/82) in subtask A.

pdf bib
LIIR at SemEval-2020 Task 12: A Cross-Lingual Augmentation Approach for Multilingual Offensive Language Identification
Erfan Ghadery | Marie-Francine Moens

This paper presents our system, entitled ‘LIIR’, for SemEval-2020 Task 12 on Multilingual Offensive Language Identification in Social Media (OffensEval 2). We participated in sub-task A for the English, Danish, Greek, Arabic, and Turkish languages. We adapt and fine-tune the BERT and Multilingual BERT models made available by Google AI for English and non-English languages, respectively. For the English language, we use a combination of two fine-tuned BERT models. For the other languages, we propose a cross-lingual augmentation approach to enrich the training data, and we use Multilingual BERT to obtain sentence representations.

pdf bib
SalamNET at SemEval-2020 Task 12: Deep Learning Approach for Arabic Offensive Language Detection
Fatemah Husain | Jooyeon Lee | Sam Henry | Ozlem Uzuner

This paper describes SalamNET, an Arabic offensive language detection system that was submitted to SemEval 2020 shared task 12: Multilingual Offensive Language Identification in Social Media. Our approach focuses on applying multiple deep learning models and conducting in-depth error analysis of the results to provide system implications for future development considerations. To pursue our goal, Recurrent Neural Network (RNN), Gated Recurrent Unit (GRU), and Long Short-Term Memory (LSTM) models with different design architectures were developed and evaluated. SalamNET, a Bi-directional Gated Recurrent Unit (Bi-GRU) based model, reports a macro-F1 score of 0.83%.

Sonal.kumari at SemEval-2020 Task 12: Social Media Multilingual Offensive Text Identification and Categorization Using Neural Network Models
Sonal Kumari

In this paper, we present our approaches and results for SemEval-2020 Task 12, Multilingual Offensive Language Identification in Social Media (OffensEval 2020). OffensEval 2020 had three subtasks: A) identifying tweets as offensive (OFF) or non-offensive (NOT) for Arabic, Danish, English, Greek, and Turkish; B) detecting whether an offensive tweet is targeted (TIN) or untargeted (UNT) for English; and C) categorizing targeted offensive tweets into three classes, namely individual (IND), group (GRP), or other (OTH), for English. We participate in all three subtasks. In our solution, we first use the pre-trained BERT model for subtasks A, B, and C, and then apply a BiLSTM model with an attention mechanism (Attn-BiLSTM) to the same subtasks. Our results demonstrate that the pre-trained model does not give good results for all languages and is compute- and memory-intensive, whereas the Attn-BiLSTM model is fast and gives good accuracy with fewer resources. The Attn-BiLSTM model gives better accuracy for Arabic and Greek, where the pre-trained model is unable to capture the complete context of these languages due to its smaller vocabulary size.

SSN_NLP_MLRG at SemEval-2020 Task 12: Offensive Language Identification in English, Danish, Greek Using BERT and Machine Learning Approach
A Kalaivani | Thenmozhi D.

Offensive language identification aims to detect hurtful tweets, derogatory comments, and swear words on social media. With the growth of social media communication, offensive language detection has received more attention in recent years; we focus on performing the task for English, Danish, and Greek. We investigate which is more effective: the pre-trained BERT (Bidirectional Encoder Representations from Transformers) model or machine learning approaches. Our investigation shows the difference in performance across the three languages, and the best-performing approach is identified by evaluating the classification algorithms. In the SemEval-2020 shared task, our team SSN_NLP_MLRG submitted systems for three languages: Subtasks A, B, and C in English, Subtask A in Danish, and Subtask A in Greek. Our team obtained F1 scores of 0.90, 0.61, and 0.52 for Subtasks A, B, and C in English, 0.56 for Subtask A in Danish, and 0.67 for Subtask A in Greek, respectively.

TAC at SemEval-2020 Task 12: Ensembling Approach for Multilingual Offensive Language Identification in Social Media
Talha Anwar | Omer Baig

The use of offensive language on social media is becoming more common, and there is a need for a mechanism to detect and control it. This paper deals with offensive language detection in five different languages: English, Arabic, Danish, Greek, and Turkish. We present a nearly identical ensemble pipeline, comprising machine learning and deep learning models, for all five languages. Three machine learning and four deep learning models were used in the ensemble. In the OffensEval-2020 competition, our model achieved F1-scores of 0.85, 0.74, 0.68, 0.81, and 0.9 for the Arabic, Turkish, Danish, Greek, and English language tasks, respectively.

UoB at SemEval-2020 Task 12: Boosting BERT with Corpus Level Information
Wah Meng Lim | Harish Tayyar Madabushi

Pre-trained language model word representations, such as BERT, have been extremely successful in several Natural Language Processing tasks, significantly improving on the state of the art. This can largely be attributed to their ability to better capture semantic information contained within a sentence. Several tasks, however, can benefit from information available at the corpus level, such as Term Frequency-Inverse Document Frequency (TF-IDF). In this work, we test the effectiveness of integrating this information with BERT on the task of identifying abuse on social media and show that doing so does indeed significantly improve performance. We participate in Sub-Task A (abuse detection), wherein we achieve a score within two points of the top-performing team, and in Sub-Task B (target detection), wherein we are ranked 4th of the 44 participating teams.

Proceedings of the Ninth Joint Conference on Lexical and Computational Semantics
Iryna Gurevych | Marianna Apidianaki | Manaal Faruqui

Semantic Structural Decomposition for Neural Machine Translation
Elior Sulem | Omri Abend | Ari Rappoport

Building on recent advances in semantic parsing and text simplification, we investigate the use of semantic splitting of the source sentence as preprocessing for machine translation. We experiment with a Transformer model and evaluate using large-scale crowd-sourcing experiments. Results show a significant increase in fluency on long sentences in an English-to-French setting with a training corpus of 5M sentence pairs, while retaining comparable adequacy. We also perform a manual analysis which explores the tradeoff between adequacy and fluency in the case where all sentence lengths are considered.

On the Systematicity of Probing Contextualized Word Representations: The Case of Hypernymy in BERT
Abhilasha Ravichander | Eduard Hovy | Kaheer Suleman | Adam Trischler | Jackie Chi Kit Cheung

Contextualized word representations have become a driving force in NLP, motivating widespread interest in understanding their capabilities and the mechanisms by which they operate. Particularly intriguing is their ability to identify and encode conceptual abstractions. Past work has probed BERT representations for this competence, finding that BERT can correctly retrieve noun hypernyms in cloze tasks. In this work, we ask the question: do probing studies shed light on systematic knowledge in BERT representations? As a case study, we examine hypernymy knowledge encoded in BERT representations. In particular, we demonstrate through a simple consistency probe that the ability to correctly retrieve hypernyms in cloze tasks, as used in prior work, does not correspond to systematic knowledge in BERT. Our main conclusion is cautionary: even if BERT demonstrates high probing accuracy for a particular competence, it does not necessarily follow that BERT ‘understands’ a concept, and it cannot be expected to systematically generalize across applicable contexts.

PISA: A measure of Preference In Selection of Arguments to model verb argument recoverability
Giulia Cappelli | Alessandro Lenci

Our paper offers a computational model of the semantic recoverability of verb arguments, tested in particular on direct objects and Instruments. Our fully distributional model is intended to improve on older taxonomy-based models, which require a lexicon in addition to the training corpus. We computed the selectional preferences of 99 transitive verbs and 173 Instrument verbs as the mean value of the pairwise cosines between their arguments (a weighted mean between all the arguments, or an unweighted mean with the topmost k arguments). Results show that our model can predict the recoverability of objects and Instruments, providing a similar result to that of taxonomy-based models but at a much cheaper computational cost.
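
The two averaging schemes mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice of pair weight (the product of the two arguments' weights), and the ranking criterion for the topmost k arguments (a per-argument score such as frequency) are all assumptions made here for illustration. Argument vectors are assumed to be non-zero, and k is assumed to be at least 2.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def weighted_preference(vectors, weights):
    """Weighted mean of pairwise cosines over all arguments of a verb.

    Assumption: each pair is weighted by the product of the two
    per-argument weights (the abstract does not specify the scheme).
    """
    numerator = 0.0
    denominator = 0.0
    for i, j in combinations(range(len(vectors)), 2):
        pair_weight = weights[i] * weights[j]
        numerator += pair_weight * cosine(vectors[i], vectors[j])
        denominator += pair_weight
    return numerator / denominator


def top_k_preference(vectors, scores, k):
    """Unweighted mean of pairwise cosines over the k highest-scoring arguments.

    Assumption: 'topmost' is decided by a per-argument score such as frequency.
    """
    ranked = sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)[:k]
    cosines = [cosine(vectors[i], vectors[j]) for i, j in combinations(ranked, 2)]
    return sum(cosines) / len(cosines)


# Toy argument vectors for one verb, with hypothetical frequency weights.
toy_vectors = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
toy_weights = [3.0, 2.0, 1.0]
weighted_score = weighted_preference(toy_vectors, toy_weights)
top2_score = top_k_preference(toy_vectors, toy_weights, k=2)
```

In this toy setting, a verb whose arguments cluster tightly in the vector space receives a score close to 1, while a verb with semantically scattered arguments receives a lower score.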

Learning Negation Scope from Syntactic Structure
Nick McKenna | Mark Steedman

We present a semi-supervised model which learns the semantics of negation purely through analysis of syntactic structure. Linguistic theory posits that the semantics of negation can be understood purely syntactically, though recent research relies on combining a variety of features including part-of-speech tags, word embeddings, and semantic representations to achieve high task performance. Our simplified model returns to syntactic theory and achieves state-of-the-art performance on the task of Negation Scope Detection while demonstrating the tight relationship between the syntax and semantics of negation.