Lexical and Computational Semantics and Semantic Evaluation (formerly Workshop on Sense Evaluation) (2019)


Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)
Rada Mihalcea | Ekaterina Shutova | Lun-Wei Ku | Kilian Evang | Soujanya Poria

SURel: A Gold Standard for Incorporating Meaning Shifts into Term Extraction
Anna Hätty | Dominik Schlechtweg | Sabine Schulte im Walde

We introduce SURel, a novel dataset with human-annotated meaning shifts between general-language and domain-specific contexts. We show that meaning shifts of term candidates cause errors in term extraction, and demonstrate that the SURel annotation reflects these errors. Furthermore, we illustrate that SURel enables us to assess optimisations of term extraction techniques when incorporating meaning shifts.

Second-order contexts from lexical substitutes for few-shot learning of word representations
Qianchu Liu | Diana McCarthy | Anna Korhonen

There is a growing awareness of the need to handle rare and unseen words in word representation modelling. In this paper, we focus on few-shot learning of emerging concepts that fully exploits only a few available contexts. We introduce a substitute-based context representation technique that can be applied on an existing word embedding space. Previous context-based approaches to modelling unseen words only consider bag-of-word first-order contexts, whereas our method aggregates contexts as second-order substitutes that are produced by a sequence-aware sentence completion model. We experimented with three tasks that aim to test the modelling of emerging concepts. We found that these tasks show different emphasis on first and second order contexts, and our substitute-based method achieves superior performance on naturally-occurring contexts from corpora.

Pre-trained Contextualized Character Embeddings Lead to Major Improvements in Time Normalization: a Detailed Analysis
Dongfang Xu | Egoitz Laparra | Steven Bethard

Recent studies have shown that pre-trained contextual word embeddings, which assign the same word different vectors in different contexts, improve performance in many tasks. But while contextual embeddings can also be trained at the character level, the effectiveness of such embeddings has not been studied. We derive character-level contextual embeddings from Flair (Akbik et al., 2018), and apply them to a time normalization task, yielding major performance improvements over the previous state-of-the-art: 51% error reduction in news and 33% in clinical notes. We analyze the sources of these improvements, and find that pre-trained contextual character embeddings are more robust to term variations, infrequent terms, and cross-domain changes. We also quantify the size of context that pre-trained contextual character embeddings take advantage of, and show that such embeddings capture features like part-of-speech and capitalization.

Bot2Vec: Learning Representations of Chatbots
Jonathan Herzig | Tommy Sandbank | Michal Shmueli-Scheuer | David Konopnicki

Chatbots (i.e., bots) are becoming widely used in multiple domains, along with supporting bot programming platforms. These platforms are equipped with novel testing tools aimed at improving the quality of individual chatbots. Doing so requires an understanding of what sort of bots are being built (captured by their underlying conversation graphs) and how well they perform (derived through analysis of conversation logs). In this paper, we propose a new model, Bot2Vec, that embeds bots to a compact representation based on their structure and usage logs. Then, we utilize Bot2Vec representations to improve the quality of two bot analysis tasks. Using conversation data and graphs of more than 90 bots, we show that Bot2Vec representations improve detection performance by more than 16% for both tasks.

A Semantic Cover Approach for Topic Modeling
Rajagopal Venkatesaramani | Doug Downey | Bradley Malin | Yevgeniy Vorobeychik

We introduce a novel topic modeling approach based on constructing a semantic set cover for clusters of similar documents. Specifically, our approach first clusters documents using their Tf-Idf representation, and then covers each cluster with a set of topic words based on semantic similarity, defined in terms of a word embedding. Computing a topic cover amounts to solving a minimum set cover problem. Our evaluation compares our topic modeling approach to Latent Dirichlet Allocation (LDA) on three metrics: 1) qualitative topic match, measured using evaluations by Amazon Mechanical Turk (MTurk) workers, 2) performance on classification tasks using each topic model as a sparse feature representation, and 3) topic coherence. We find that qualitative judgments significantly favor our approach, the method outperforms LDA on topic coherence, and is comparable to LDA on document classification tasks.
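
The minimum set cover step mentioned in this abstract can be illustrated with the standard greedy approximation. This is only a sketch under stated assumptions, not the authors' implementation: the abstract does not say which solver is used, the function name `greedy_set_cover` is hypothetical, and the toy coverage sets stand in for "documents a word is semantically close to", which the paper derives from a word embedding.

```python
from typing import Dict, List, Set


def greedy_set_cover(universe: Set[str],
                     candidates: Dict[str, Set[str]]) -> List[str]:
    """Greedy approximation to minimum set cover (hypothetical helper).

    universe   -- items to cover (here: document ids in one cluster)
    candidates -- maps each candidate topic word to the items it covers
    """
    uncovered = set(universe)
    chosen: List[str] = []
    while uncovered:
        # Pick the word covering the most still-uncovered documents.
        # Iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(candidates),
                   key=lambda w: len(candidates[w] & uncovered))
        gained = candidates[best] & uncovered
        if not gained:
            break  # the remaining documents cannot be covered by any word
        chosen.append(best)
        uncovered -= gained
    return chosen


# Toy data (assumed for illustration): four documents in one cluster and
# the documents each candidate word is taken to be similar to.
toy_cluster = {"d1", "d2", "d3", "d4"}
toy_coverage = {
    "health": {"d1", "d2", "d3"},
    "doctor": {"d1", "d2"},
    "insurance": {"d4"},
    "hospital": {"d3", "d4"},
}
topic_words = greedy_set_cover(toy_cluster, toy_coverage)
print(topic_words)
```

The greedy rule is a common choice because exact minimum set cover is NP-hard; whether the paper uses it or another solver is not stated in the abstract.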

MCScript2.0: A Machine Comprehension Corpus Focused on Script Events and Participants
Simon Ostermann | Michael Roth | Manfred Pinkal

We introduce MCScript2.0, a machine comprehension corpus for the end-to-end evaluation of script knowledge. MCScript2.0 contains approx. 20,000 questions on approx. 3,500 texts, crowdsourced based on a new collection process that results in challenging questions. Half of the questions cannot be answered from the reading texts, but require the use of commonsense and, in particular, script knowledge. We give a thorough analysis of our corpus and show that while the task is not challenging to humans, existing machine comprehension models fail to perform well on the data, even if they make use of a commonsense knowledge base. The dataset is available at http://www.sfb1102.uni-saarland.de/?page_id=2582

Deconstructing multimodality: visual properties and visual context in human semantic processing
Christopher Davis | Luana Bulat | Anita Lilla Vero | Ekaterina Shutova

Multimodal semantic models that extend linguistic representations with additional perceptual input have proved successful in a range of natural language processing (NLP) tasks. Recent research has successfully used neural methods to automatically create visual representations for words. However, these works have extracted visual features from complete images, and have not examined how different kinds of visual information impact performance. In contrast, we construct multimodal models that differentiate between internal visual properties of the objects and their external visual context. We evaluate the models on the task of decoding brain activity associated with the meanings of nouns, demonstrating their advantage over those based on complete images.

Neural User Factor Adaptation for Text Classification: Learning to Generalize Across Author Demographics
Xiaolei Huang | Michael J. Paul

Language use varies across different demographic factors, such as gender, age, and geographic location. However, most existing document classification methods ignore demographic variability. In this study, we examine empirically how text data can vary across four demographic factors: gender, age, country, and region. We propose a multitask neural model to account for demographic variations via adversarial training. In experiments on four English-language social media datasets, we find that classification performance improves when adapting for user factors.

Abstract Graphs and Abstract Paths for Knowledge Graph Completion
Vivi Nastase | Bhushan Kotnis

Knowledge graphs, which provide numerous facts in a machine-friendly format, are incomplete. Information that we induce from such graphs, e.g. entity embeddings, relation representations or patterns, will be affected by the imbalance in the information captured in the graph, which biases representations or causes us to miss potential patterns. To partially compensate for this situation we describe a method for representing knowledge graphs that captures an intensional representation of the original extensional information. This representation is very compact, and it abstracts away from individual links, allowing us to find better path candidates, as shown by the results of link prediction using this information.

Enthymemetic Conditionals
Eimear Maguire

To model conditionals in a way that reflects their acceptability, we must include some means of making judgements about whether antecedent and consequent are meaningfully related or not. Enthymemes are non-logical arguments which do not hold up by themselves, but are acceptable through their relation to a topos, an already-known general principle or pattern for reasoning. This paper uses enthymemes and topoi as a way to model the world-knowledge behind these judgements. In doing so, it provides a reformalisation (in TTR) of enthymemes and topoi as networks rather than functions, and information state update rules for conditionals.

Acquiring Structured Temporal Representation via Crowdsourcing: A Feasibility Study
Yuchen Zhang | Nianwen Xue

Temporal Dependency Trees are a structured temporal representation that represents temporal relations among time expressions and events in a text as a dependency tree structure. Compared to traditional pair-wise temporal relation representations, temporal dependency trees facilitate efficient annotations, higher inter-annotator agreement, and efficient computations. However, annotations on temporal dependency trees so far have only been done by expert annotators, which is costly and time-consuming. In this paper, we introduce a method to crowdsource temporal dependency tree annotations, and show that this representation is intuitive and can be collected with high accuracy and agreement through crowdsourcing. We produce a corpus of temporal dependency trees, and present a baseline temporal dependency parser, trained and evaluated on this new corpus.

Improving Generalization in Coreference Resolution via Adversarial Training
Sanjay Subramanian | Dan Roth

In order for coreference resolution systems to be useful in practice, they must be able to generalize to new text. In this work, we demonstrate that the performance of the state-of-the-art system decreases when the names of PER and GPE named entities in the CoNLL dataset are changed to names that do not occur in the training set. We use the technique of adversarial gradient-based training to retrain the state-of-the-art system and demonstrate that the retrained system achieves higher performance on the CoNLL dataset (both with and without the change of named entities) and the GAP dataset.

Improving Human Needs Categorization of Events with Semantic Classification
Haibo Ding | Ellen Riloff | Zhe Feng

Human Needs categories have been used to characterize the reason why an affective event is positive or negative. For example, "I got the flu" and "I got fired" are both negative (undesirable) events, but getting the flu is a Health problem while getting fired is a Financial problem. Previous work created learning models to assign events to Human Needs categories based on their words and contexts. In this paper, we introduce an intermediate step that assigns words to relevant semantic concepts. We create lightly supervised models that learn to label words with respect to 10 semantic concepts associated with Human Needs categories, and incorporate these labels as features for event categorization. Our results show that recognizing relevant semantic concepts improves both the recall and precision of Human Needs categorization for events.

Automatic Accuracy Prediction for AMR Parsing
Juri Opitz | Anette Frank

Abstract Meaning Representation (AMR) represents sentences as directed, acyclic and rooted graphs, aiming at capturing their meaning in a machine readable format. AMR parsing converts natural language sentences into such graphs. However, evaluating a parser on new data by means of comparison to manually created AMR graphs is very costly. Also, we would like to be able to detect parses of questionable quality, or to prefer results of alternative systems by selecting the ones for which we can assess good quality. We propose AMR accuracy prediction as the task of predicting several metrics of correctness for an automatically generated AMR parse in absence of the corresponding gold parse. We develop a neural end-to-end multi-output regression model and perform three case studies. Firstly, we evaluate the model’s capacity to predict AMR parse accuracies and test whether it can reliably assign high scores to gold parses. Secondly, we perform parse selection based on predicted parse accuracies of candidate parses from alternative systems, with the aim of improving overall results. Finally, we predict system ranks for submissions from two AMR shared tasks on the basis of their predicted parse accuracy averages. All experiments are carried out across two different domains and show that our method is effective.

An Argument-Marker Model for Syntax-Agnostic Proto-Role Labeling
Juri Opitz | Anette Frank

Semantic proto-role labeling (SPRL) is an alternative to semantic role labeling (SRL) that moves beyond a categorical definition of roles, following Dowty’s feature-based view of proto-roles. This theory determines agenthood vs. patienthood based on a participant’s instantiation of more or less typical agent vs. patient properties, such as, for example, volition in an event. To perform SPRL, we develop an ensemble of hierarchical models with self-attention and concurrently learned predicate-argument markers. Our method is competitive with the state of the art, overall outperforming previous work in two formulations of the task (multi-label and multi-variate Likert scale prediction). In contrast to previous work, our results do not depend on gold argument heads derived from supplementary gold tree banks.

Bayesian Inference Semantics: A Modelling System and A Test Suite
Jean-Philippe Bernardy | Rasmus Blanck | Stergios Chatzikyriakidis | Shalom Lappin | Aleksandre Maskharashvili

We present BIS, a Bayesian Inference Semantics, for probabilistic reasoning in natural language. The current system is based on the framework of Bernardy et al. (2018), but departs from it in important respects. BIS makes use of Bayesian learning for inferring a hypothesis from premises. This involves estimating the probability of the hypothesis, given the data supplied by the premises of an argument. It uses a syntactic parser to generate typed syntactic structures that serve as input to a model generation system. Sentences are interpreted compositionally to probabilistic programs, and the corresponding truth values are estimated using sampling methods. BIS successfully deals with various probabilistic semantic phenomena, including frequency adverbs, generalised quantifiers, generics, and vague predicates. It performs well on a number of interesting probabilistic reasoning tasks. It also sustains most classically valid inferences (instantiation, de Morgan’s laws, etc.). To test BIS we have built an experimental test suite with examples of a range of probabilistic and classical inference patterns.

Incivility Detection in Online Comments
Farig Sadeque | Stephen Rains | Yotam Shmargad | Kate Kenski | Kevin Coe | Steven Bethard

Incivility in public discourse has been a major concern in recent times as it can affect the quality and tenacity of the discourse negatively. In this paper, we present neural models that can learn to detect name-calling and vulgarity from a newspaper comment section. We show that in contrast to prior work on detecting toxic language, fine-grained incivilities like name-calling cannot be accurately detected by simple models like logistic regression. We apply the models trained on the newspaper comments data to detect uncivil comments in a Russian troll dataset, and find that despite the change of domain, the model makes accurate predictions.

Generating Animations from Screenplays
Yeyao Zhang | Eleftheria Tsipidi | Sasha Schriber | Mubbasir Kapadia | Markus Gross | Ashutosh Modi

Automatically generating animation from natural language text finds application in a number of areas, e.g. movie script writing, instructional videos, and public safety. However, translating natural language text into animation is a challenging task. Existing text-to-animation systems can handle only very simple sentences, which limits their applications. In this paper, we develop a text-to-animation system which is capable of handling complex sentences. We achieve this by introducing a text simplification step into the process. Building on an existing animation generation system for screenwriting, we create a robust NLP pipeline to extract information from screenplays and map it to the system’s knowledge base. We develop a set of linguistic transformation rules that simplify complex sentences. Information extracted from the simplified sentences is used to generate a rough storyboard and video depicting the text. Our sentence simplification module outperforms existing systems in terms of BLEU and SARI metrics. We further evaluated our system via a user study: 68% of participants believe that our system generates reasonable animation from input screenplays.

Proceedings of the 13th International Workshop on Semantic Evaluation
Jonathan May | Ekaterina Shutova | Aurelie Herbelot | Xiaodan Zhu | Marianna Apidianaki | Saif M. Mohammad

SemEval-2019 Task 1: Cross-lingual Semantic Parsing with UCCA
Daniel Hershcovich | Zohar Aizenbud | Leshem Choshen | Elior Sulem | Ari Rappoport | Omri Abend

We present the SemEval 2019 shared task on Universal Conceptual Cognitive Annotation (UCCA) parsing in English, German and French, and discuss the participating systems and results. UCCA is a cross-linguistically applicable framework for semantic representation, which builds on extensive typological work and supports rapid annotation. UCCA poses a challenge for existing parsing techniques, as it exhibits reentrancy (resulting in DAG structures), discontinuous structures and non-terminal nodes corresponding to complex semantic units. The shared task has yielded improvements over the state-of-the-art baseline in all languages and settings. Full results can be found on the task’s website: https://competitions.codalab.org/competitions/19160.

ANA at SemEval-2019 Task 3: Contextual Emotion detection in Conversations through hierarchical LSTMs and BERT
Chenyang Huang | Amine Trabelsi | Osmar Zaïane

This paper describes the system submitted by ANA Team for the SemEval-2019 Task 3: EmoContext. We propose a novel Hierarchical LSTMs for Contextual Emotion Detection (HRLCE) model. It classifies the emotion of an utterance given its conversational context. The results show that, in this task, our HRLCE outperforms the most recent state-of-the-art text classification framework: BERT. We combine the results generated by BERT and HRLCE to achieve an overall score of 0.7709, which ranked 5th on the final leaderboard of the competition among 165 teams.

SemEval-2019 Task 5: Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter
Valerio Basile | Cristina Bosco | Elisabetta Fersini | Debora Nozza | Viviana Patti | Francisco Manuel Rangel Pardo | Paolo Rosso | Manuela Sanguinetti

The paper describes the organization of the SemEval 2019 Task 5 about the detection of hate speech against immigrants and women in Spanish and English messages extracted from Twitter. The task is organized in two related classification subtasks: a main binary subtask for detecting the presence of hate speech, and a finer-grained one devoted to identifying further features in hateful contents such as the aggressive attitude and the target harassed, to distinguish if the incitement is against an individual rather than a group. HatEval has been one of the most popular tasks in SemEval-2019 with a total of 108 submitted runs for Subtask A and 70 runs for Subtask B, from a total of 74 different teams. Data provided for the task are described by showing how they have been collected and annotated. Moreover, the paper provides an analysis and discussion about the participant systems and the results they achieved in both subtasks.

SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social Media (OffensEval)
Marcos Zampieri | Shervin Malmasi | Preslav Nakov | Sara Rosenthal | Noura Farra | Ritesh Kumar

We present the results and the main findings of SemEval-2019 Task 6 on Identifying and Categorizing Offensive Language in Social Media (OffensEval). The task was based on a new dataset, the Offensive Language Identification Dataset (OLID), which contains over 14,000 English tweets, and it featured three sub-tasks. In sub-task A, systems were asked to discriminate between offensive and non-offensive posts. In sub-task B, systems had to identify the type of offensive content in the post. Finally, in sub-task C, systems had to detect the target of the offensive posts. OffensEval attracted a large number of participants and it was one of the most popular tasks in SemEval-2019. In total, nearly 800 teams signed up to participate in the task and 115 of them submitted results, which are presented and analyzed in this report.

DANGNT@UIT.VNU-HCM at SemEval 2019 Task 1: Graph Transformation System from Stanford Basic Dependencies to Universal Conceptual Cognitive Annotation (UCCA)
Dang Tuan Nguyen | Trung Tran

This paper describes the graph transformation system (GT System) for SemEval 2019 Task 1: Cross-lingual Semantic Parsing with Universal Conceptual Cognitive Annotation (UCCA). The input of GT System is a pair of text and its unannotated xml, which is a layer 0 part of UCCA form. The output of GT System is the corresponding full UCCA xml. Based on the idea of graph illustration and transformation, we perform four main tasks when building GT System. In the first task, we illustrate the graph form of the Stanford dependencies of the input text. In the second task, we transform it into an intermediate graph. In the third task, we transform the intermediate graph into the output graph form. Finally, we create the output UCCA xml. The evaluation results show that our method generates good-quality UCCA xml and makes a meaningful contribution to the semantic representation sub-field in Natural Language Processing.

MaskParse@Deskin at SemEval-2019 Task 1: Cross-lingual UCCA Semantic Parsing using Recursive Masked Sequence Tagging
Gabriel Marzinotto | Johannes Heinecke | Géraldine Damnati

This paper describes our recursive system for SemEval-2019 Task 1: Cross-lingual Semantic Parsing with UCCA. Each recursive step consists of two parts. We first perform semantic parsing using a sequence tagger to estimate the probabilities of the UCCA categories in the sentence. Then, we apply a decoding policy which interprets these probabilities and builds the graph nodes. Parsing is done recursively: we perform a first inference on the sentence to extract the main scenes and links, and then we recursively apply our model on the sentence using masking features that reflect the decisions made in previous steps. The process continues until the terminal nodes are reached. We chose a standard neural tagger and we focus on our recursive parsing strategy and on the cross-lingual transfer problem to develop a robust model for the French language, using only a few training samples.

UC Davis at SemEval-2019 Task 1: DAG Semantic Parsing with Attention-based Decoder
Dian Yu | Kenji Sagae

We present an encoder-decoder model for semantic parsing with UCCA for SemEval 2019 Task 1. The encoder is a Bi-LSTM and the decoder uses recursive self-attention. The proposed model alleviates challenges and feature engineering in traditional transition-based and graph-based parsers. The resulting parser is simple and proved to be effective on the semantic parsing task.

BrainEE at SemEval-2019 Task 3: Ensembling Linear Classifiers for Emotion Prediction
Vachagan Gratian

The paper describes an ensemble of linear perceptrons trained for emotion classification as part of the SemEval-2019 shared-task 3. The model uses a matrix of probabilities to weight the activations of the base-classifiers and makes a final prediction using the sum rule. The base-classifiers are multi-class perceptrons utilizing character and word n-grams, part-of-speech tags and sentiment polarity scores. The results of our experiments indicate that the ensemble outperforms the base-classifiers, but only marginally. In the best scenario our model attains an F-Micro score of 0.672, whereas the base-classifiers attained scores ranging from 0.636 to 0.666.
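
As a rough illustration of the weighted sum rule described in this abstract, the sketch below multiplies each base classifier's per-class activation by a weight and sums across classifiers. It is not the authors' code: the shape of the weight matrix (one weight per classifier and class), the function name `sum_rule_predict`, and all numbers are assumptions made for the example.

```python
from typing import List, Sequence


def sum_rule_predict(activations: Sequence[Sequence[float]],
                     weights: Sequence[Sequence[float]],
                     labels: Sequence[str]) -> str:
    """Combine base classifiers with a weighted sum rule (hypothetical helper).

    activations[k][c] -- score of base classifier k for class c
    weights[k][c]     -- assumed weight (e.g. a probability that classifier k
                         is right when it favours class c)
    """
    totals: List[float] = [0.0] * len(labels)
    for act_row, w_row in zip(activations, weights):
        for c in range(len(labels)):
            totals[c] += w_row[c] * act_row[c]
    # Sum rule: predict the class with the largest summed weighted score.
    best = max(range(len(labels)), key=lambda c: totals[c])
    return labels[best]


labels = ["angry", "happy", "sad", "others"]
# Made-up activations from three base perceptrons for one utterance.
toy_activations = [
    [2.0, 0.5, 0.1, 1.0],
    [0.4, 1.5, 0.2, 1.2],
    [1.0, 0.3, 0.2, 1.1],
]
# Made-up weight matrix, one row per base classifier.
toy_weights = [
    [0.9, 0.5, 0.5, 0.6],
    [0.5, 0.8, 0.5, 0.6],
    [0.7, 0.5, 0.5, 0.6],
]
prediction = sum_rule_predict(toy_activations, toy_weights, labels)
print(prediction)
```

With these toy numbers the summed scores are roughly 2.7 for "angry", 1.6 for "happy", 0.25 for "sad" and 1.98 for "others", so the ensemble picks "angry" even though the second classifier alone would have preferred "happy".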

CAiRE_HKUST at SemEval-2019 Task 3: Hierarchical Attention for Dialogue Emotion Classification
Genta Indra Winata | Andrea Madotto | Zhaojiang Lin | Jamin Shin | Yan Xu | Peng Xu | Pascale Fung

Detecting emotion from dialogue is a challenge that has not yet been extensively surveyed. One could consider the emotion of each dialogue turn to be independent, but in this paper, we introduce a hierarchical approach to classify emotion, hypothesizing that the current emotional state depends on previous latent emotions. We benchmark several feature-based classifiers using pre-trained word and emotion embeddings, state-of-the-art end-to-end neural network models, and Gaussian processes for automatic hyper-parameter search. In our experiments, hierarchical architectures consistently give significant improvements, and our best model achieves a 76.77% F1-score on the test set.

CLaC Lab at SemEval-2019 Task 3: Contextual Emotion Detection Using a Combination of Neural Networks and SVM
Elham Mohammadi | Hessam Amini | Leila Kosseim

This paper describes our system at SemEval 2019, Task 3 (EmoContext), which focused on the contextual detection of emotions in a dataset of 3-round dialogues. For our final system, we used a neural network with pretrained ELMo word embeddings and POS tags as input, GRUs as hidden units, an attention mechanism to capture representations of the dialogues, and an SVM classifier which used the learned network representations to perform the task of multi-class classification. This system yielded a micro-averaged F1 score of 0.7072 for the three emotion classes, improving the baseline by approximately 12%.
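
The score reported here is micro-averaged over the three emotion classes only. A minimal sketch of such a computation follows, assuming that a fourth "others" label exists and is excluded from the counts; the function name `micro_f1` and the toy labels are hypothetical and not taken from the paper or the official scorer.

```python
from typing import Sequence, Set


def micro_f1(gold: Sequence[str], pred: Sequence[str],
             classes: Set[str]) -> float:
    """Micro-averaged F1 restricted to `classes` (hypothetical helper).

    True positives, false positives and false negatives are pooled over
    the listed classes; labels outside `classes` (e.g. "others") only
    contribute when confused with a listed class.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if p in classes and p == g:
            tp += 1
        else:
            if p in classes:
                fp += 1  # predicted an emotion that was not the gold label
            if g in classes:
                fn += 1  # missed a gold emotion
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


emotions = {"happy", "sad", "angry"}
toy_gold = ["happy", "sad", "angry", "others", "others", "happy"]
toy_pred = ["happy", "others", "angry", "sad", "others", "angry"]
score = micro_f1(toy_gold, toy_pred, emotions)
print(score)
```

In the toy data there are 2 true positives, 2 false positives and 2 false negatives, so precision and recall are both 0.5 and the micro-averaged F1 is 0.5.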

CLARK at SemEval-2019 Task 3: Exploring the Role of Context to Identify Emotion in a Short Conversation
Joseph Cummings | Jason Wilson

With text lacking valuable information available in other modalities, context may provide useful information to better detect emotions. In this paper, we do a systematic exploration of the role of context in recognizing emotion in a conversation. We use a Naive Bayes model to show that inferring the mood of the conversation before classifying individual utterances leads to better performance. Additionally, we find that using context while training the model significantly decreases performance. Our approach has the additional benefit that its performance rivals a baseline LSTM model while requiring fewer resources.

CoAStaL at SemEval-2019 Task 3: Affect Classification in Dialogue using Attentive BiLSTMs
Ana Valeria González | Victor Petrén Bach Hansen | Joachim Bingel | Anders Søgaard

This work describes the system presented by the CoAStaL Natural Language Processing group at University of Copenhagen. The main system we present uses the same attention mechanism presented in Yang et al. (2016). Our overall model architecture is also inspired by their hierarchical classification model and adapted to deal with classification in dialogue by encoding information at the turn level. We use different encodings for each turn to create a more expressive representation of dialogue context which is then fed into our classifier. We also define a custom preprocessing step in order to deal with language commonly used in interactions across many social media outlets. Our proposed system achieves a micro F1 score of 0.7340 on the test set and shows significant gains in performance compared to a system using dialogue level encoding.

CX-ST-RNM at SemEval-2019 Task 3: Fusion of Recurrent Neural Networks Based on Contextualized and Static Word Representations for Contextual Emotion Detection
Michał Perełkiewicz

In this paper, I describe a fusion model combining contextualized and static word representations for approaching the EmoContext task in the SemEval 2019 competition. The model is based on two Recurrent Neural Networks: the first one is fed with a state-of-the-art ELMo deep contextualized word representation and the second one is fed with a static Word2Vec embedding augmented with a 10-dimensional affective word feature vector. The proposed model is compared with two baseline models based on a static word representation and a contextualized word representation, separately. My approach officially achieved a micro-averaged F1 score of 0.7278 on the test dataset, ranking 47th out of 165 participants.

E-LSTM at SemEval-2019 Task 3: Semantic and Sentimental Features Retention for Emotion Detection in Text
Harsh Patel

This paper discusses the solution to the problem statement of the SemEval19: EmoContext competition, which is Contextual Emotion Detection in Texts. The paper includes the explanation of an architecture that I created by exploiting the embedding layers of Word2Vec and GloVe, using LSTMs as memory unit cells, which detects the approximate emotion of chats between two people in the English language provided in textual form. The set of emotions on which the model was trained was Happy, Sad, Angry and Others. The paper also includes an analysis of different conventional machine learning algorithms in comparison to E-LSTM.

ELiRF-UPV at SemEval-2019 Task 3: Snapshot Ensemble of Hierarchical Convolutional Neural Networks for Contextual Emotion Detection
José-Ángel González | Lluís-F. Hurtado | Ferran Pla

This paper describes the approach developed by the ELiRF-UPV team at SemEval 2019 Task 3: Contextual Emotion Detection in Text. We have developed a Snapshot Ensemble of 1D Hierarchical Convolutional Neural Networks to extract features from 3-turn conversations in order to perform contextual emotion detection in text. This Snapshot Ensemble is obtained by averaging the models selected by a Genetic Algorithm that optimizes the evaluation measure. The proposed ensemble obtains better results than a single model and it obtains competitive and promising results on Contextual Emotion Detection in Text.
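
The averaging step of such an ensemble can be illustrated with a minimal sketch. It assumes that "averaging the models" means averaging their per-class output probabilities, which the abstract does not state explicitly; the Genetic Algorithm selection is left out, and the function names and toy probabilities are hypothetical.

```python
def average_predictions(snapshot_probs):
    """Average per-class probabilities across snapshots for each example."""
    n_models = len(snapshot_probs)
    n_examples = len(snapshot_probs[0])
    n_classes = len(snapshot_probs[0][0])
    return [
        [sum(m[i][c] for m in snapshot_probs) / n_models for c in range(n_classes)]
        for i in range(n_examples)
    ]


def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)


# Hypothetical outputs of three already-selected snapshots on two examples,
# over the classes (happy, sad, angry, others).
snapshots = [
    [[0.6, 0.1, 0.1, 0.2], [0.1, 0.5, 0.2, 0.2]],
    [[0.4, 0.1, 0.1, 0.4], [0.2, 0.2, 0.5, 0.1]],
    [[0.5, 0.2, 0.1, 0.2], [0.1, 0.6, 0.2, 0.1]],
]
averaged = average_predictions(snapshots)
labels = [argmax(row) for row in averaged]
```

Here the second snapshot alone would label the second example as angry, while the averaged ensemble follows the other two snapshots and labels it sad.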

EPITA-ADAPT at SemEval-2019 Task 3: Detecting emotions in textual conversations using deep learning models combination
Abdessalam Bouchekif | Praveen Joshi | Latifa Bouchekif | Haithem Afli

Messaging platforms like WhatsApp, Facebook Messenger and Twitter have recently gained much popularity owing to their ability to connect users in real time. The content of these textual messages can be a useful resource for text mining to discover and reveal various aspects, including emotions. In this paper we present our submission for SemEval 2019 task ‘EmoContext’. The task consists of classifying a given textual dialogue into one of four emotion classes: Angry, Happy, Sad and Others. Our proposed system is based on the combination of different deep neural network techniques. In particular, we use Recurrent Neural Networks (LSTM, B-LSTM, GRU, B-GRU), Convolutional Neural Network (CNN) and Transfer Learning (TL) methods. Our final system achieves an F1 score of 74.51% on the subtask evaluation dataset.

Figure Eight at SemEval-2019 Task 3: Ensemble of Transfer Learning Methods for Contextual Emotion Detection
Joan Xiao

This paper describes our transfer learning-based approach to contextual emotion detection as part of SemEval-2019 Task 3. We experiment with transfer learning using pre-trained language models (ULMFiT, OpenAI GPT, and BERT) and fine-tune them on this task. We also train a deep learning model from scratch using pre-trained word embeddings and a BiLSTM architecture with an attention mechanism. The ensembled model achieves a competitive result, ranking ninth out of 165 teams. The result reveals that ULMFiT performs best due to its superior fine-tuning techniques. We propose improvements for future work.

LIRMM-Advanse at SemEval-2019 Task 3: Attentive Conversation Modeling for Emotion Detection and Classification
Waleed Ragheb | Jérôme Azé | Sandra Bringay | Maximilien Servajean

This paper addresses the problem of modeling textual conversations and detecting emotions. Our proposed model makes use of 1) deep transfer learning rather than the classical shallow methods of word embedding; 2) self-attention mechanisms to focus on the most important parts of the texts; and 3) turn-based conversational modeling for classifying the emotions. The approach does not rely on any hand-crafted features or lexicons. Our model was evaluated on the data provided by the SemEval-2019 shared task on contextual emotion detection in text. The model shows very competitive results.

MoonGrad at SemEval-2019 Task 3: Ensemble BiRNNs for Contextual Emotion Detection in Dialogues
Chandrakant Bothe | Stefan Wermter

When reading “I don’t want to talk to you any more”, we might interpret this as either an angry or a sad emotion in the absence of context. Often, the utterances are shorter, and given a short utterance like “Me too!”, it is difficult to interpret the emotion without context. The lack of prosodic or visual information makes it a challenging problem to detect such emotions only with text. However, using contextual information in the dialogue is gaining importance to provide a context-aware recognition of linguistic features such as emotion, dialogue act, sentiment, etc. The SemEval 2019 Task 3 EmoContext competition provides a dataset of three-turn dialogues labeled with three emotion classes, i.e. Happy, Sad and Angry, plus Others for dialogues that fit none of these classes. We develop an ensemble of recurrent neural models with character- and word-level features as input to solve this problem. The system performs quite well, achieving a microaveraged F1 score (F1) of 0.7212 for the three emotion classes.
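
The score quoted here is microaveraged over the three emotion classes, so Others contributes only through errors. A minimal sketch of such a metric follows, assuming gold and predicted labels are plain lists of strings; the function name and toy data are illustrative and not taken from the task's official scorer.

```python
def micro_f1(gold, pred, classes=("happy", "sad", "angry")):
    """Micro-averaged F1 over `classes`; other labels count only as errors."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if p in classes and p == g:
            tp += 1
        else:
            if p in classes:
                fp += 1  # predicted an emotion class, but the wrong one
            if g in classes:
                fn += 1  # missed a gold emotion class
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


gold = ["happy", "others", "sad", "angry", "others"]
pred = ["happy", "sad", "sad", "others", "others"]
score = micro_f1(gold, pred)
```

In this toy case there are two true positives, one false positive and one false negative, so precision, recall and F1 are all 2/3. The correctly predicted Others dialogue does not raise the score.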

NELEC at SemEval-2019 Task 3: Think Twice Before Going Deep
Parag Agrawal | Anshuman Suri

Existing Machine Learning techniques yield close to human performance on text-based classification tasks. However, the presence of multi-modal noise in chat data such as emoticons, slang, spelling mistakes, code-mixed data, etc. makes existing deep-learning solutions perform poorly. The inability of deep-learning systems to robustly capture these covariates puts a cap on their performance. We propose NELEC: Neural and Lexical Combiner, a system which elegantly combines textual and deep-learning based methods for sentiment classification. We evaluate our system on the third task of SemEval-2019, ‘Contextual Emotion Detection in Text’. Our system performs significantly better than the baseline, as well as our deep-learning model benchmarks. It achieved a micro-averaged F1 score of 0.7765, ranking 3rd on the test-set leader-board. Our code is available at https://github.com/iamgroot42/nelec

NTUA-ISLab at SemEval-2019 Task 3: Determining emotions in contextual conversations with deep learning
Rolandos Alexandros Potamias | Georgios Siolas

Sentiment analysis (SA) in texts is a well-studied Natural Language Processing task, which is nowadays gaining popularity due to the explosion of social media and the subsequent accumulation of huge amounts of related data. However, capturing emotional states and the sentiment polarity of written excerpts requires knowledge of the events triggering them. Towards this goal, we present a computational end-to-end context-aware SA methodology, which competed in the SemEval-2019/EmoContext task (Task 3). The proposed system is founded on the combination of two neural architectures: a deep recurrent neural network, structured as an attentive Bidirectional LSTM, and a deep dense network (DNN). The system achieved a micro F1 score of 0.745 and ranked 26th out of 165 teams (top 20%) among the official task submissions.

PKUSE at SemEval-2019 Task 3: Emotion Detection with Emotion-Oriented Neural Attention Network
Luyao Ma | Long Zhang | Wei Ye | Wenhui Hu

This paper presents our system for SemEval-2019 Task 3, EmoContext: Contextual Emotion Detection in Text. We propose a deep learning architecture with bidirectional LSTM networks, augmented with an emotion-oriented attention network that is capable of extracting emotion information from an utterance. Experimental results show that our model outperforms its variants and the baseline. Overall, this system achieved a microaveraged F1 score of 75.57%.

Podlab at SemEval-2019 Task 3: The Importance of Being Shallow
Andrew Nguyen | Tobin South | Nigel Bean | Jonathan Tuke | Lewis Mitchell

This paper describes our linear SVM system for emotion classification from conversational dialogue, entered in SemEval2019 Task 3. We used off-the-shelf tools coupled with feature engineering and parameter tuning to create a simple, interpretable, yet high-performing classification model. Our system achieves a micro F1 score of 0.7357, which is 92% of the top score for the competition, demonstrating that shallow classification approaches can perform well when coupled with detailed feature selection and statistical analysis.

SCIA at SemEval-2019 Task 3: Sentiment Analysis in Textual Conversations Using Deep Learning
Zinedine Rebiai | Simon Andersen | Antoine Debrenne | Victor Lafargue

In this paper we present our submission for SemEval-2019 Task 3: EmoContext. The task consisted of classifying a textual dialogue into one of four emotion classes: happy, sad, angry or others. Our approach tried to improve on multiple aspects, namely preprocessing, with an emphasis on spell-checking, and ensembling of four different models: Bi-directional contextual LSTM (BC-LSTM), categorical Bi-LSTM (CAT-LSTM), binary convolutional Bi-LSTM (BIN-LSTM) and Gated Recurrent Unit (GRU). On the leader-board, we submitted two systems that obtained micro F1 scores (F1) of 0.711 and 0.712. After the competition, we merged our two systems with ensembling, which achieved an F1 of 0.7324 on the test dataset.

Sentim at SemEval-2019 Task 3: Convolutional Neural Networks For Sentiment in Conversations
Jacob Anderson

In this work, convolutional neural networks were used to determine the sentiment in a conversational setting. This paper’s contributions include a method for handling input of any size and a method for breaking down the conversation into separate parts for easier processing. Finally, clustering was shown to improve results, and such a model for handling sentiment in conversations was shown to be both fast and accurate.

TDBot at SemEval-2019 Task 3: Context Aware Emotion Detection Using A Conditioned Classification Approach
Sourabh Maity

This system description shows how to use context information when detecting the emotion in a dialogue. Some guidelines on how to handle emojis are also laid out. While developing this system, I realized the importance of pre-processing in conversational text data, and in NLP-related tasks in general; it cannot be overemphasized.

THU-HCSI at SemEval-2019 Task 3: Hierarchical Ensemble Classification of Contextual Emotion in Conversation
Xihao Liang | Ye Ma | Mingxing Xu

In this paper, we describe our hierarchical ensemble system designed for SemEval-2019 Task 3, EmoContext. In our system, three sets of classifiers are trained for different sub-targets, and the predicted labels of these base classifiers are combined through three steps of voting to make the final prediction. Effective details for developing base classifiers are highlighted.

TokyoTech_NLP at SemEval-2019 Task 3: Emotion-related Symbols in Emotion Detection
Zhishen Yang | Sam Vijlbrief | Naoaki Okazaki

This paper presents our contextual emotion detection system for the SemEval2019 shared Task 3, EmoContext: Contextual Emotion Detection in Text. This system combines an emotion detection neural network method (Poria et al., 2017), emoji2vec embeddings (Eisner et al., 2016), word2vec embeddings (Mikolov et al., 2013), and our proposed emoticon and emoji preprocessing method. The experimental results demonstrate the usefulness of our emoticon and emoji preprocessing method, and show that representations of emoticons and emoji contribute to the model’s emotion detection.

UAIC at SemEval-2019 Task 3: Extracting Much from Little
Cristian Simionescu | Ingrid Stoleru | Diana Lucaci | Gheorghe Balan | Iulian Bute | Adrian Iftene

In this paper, we present a system description for implementing a sentiment analysis agent capable of interpreting the state of an interlocutor engaged in short three-message conversations. We present the results and observations of our work and discuss which parts could be further improved in the future.

ABARUAH at SemEval-2019 Task 5: Bi-directional LSTM for Hate Speech Detection
Arup Baruah | Ferdous Barbhuiya | Kuntal Dey

In this paper, we present the results obtained using bi-directional long short-term memory (BiLSTM) with and without attention and Logistic Regression (LR) models for SemEval-2019 Task 5, titled HatEval: Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter. This paper presents the results obtained for Subtask A for the English language. The results of the BiLSTM and LR models are compared for two different types of preprocessing: one with no stemming performed and no stopwords removed, the other with stemming performed and stopwords removed. The BiLSTM model without attention performed the best for the first test, while the LR model with character n-grams performed the best for the second test. The BiLSTM model obtained an F1 score of 0.51 on the test set and obtained an official ranking of 8/71.

CIC at SemEval-2019 Task 5: Simple Yet Very Efficient Approach to Hate Speech Detection, Aggressive Behavior Detection, and Target Classification in Twitter
Iqra Ameer | Muhammad Hammad Fahim Siddiqui | Grigori Sidorov | Alexander Gelbukh

In recent years, the use of social media has increased incredibly. Social media offers Internet users a friendly platform to express their views and opinions. Along with these valuable communication opportunities, it also enables harmful uses such as hate speech. Online automatic hate speech detection in various aspects is a significant scientific problem. This paper presents the Instituto Politécnico Nacional (Mexico) approach for the SemEval 2019 Task 5 [HatEval 2019] (Basile et al., 2019) competition for Multilingual Detection of Hate Speech on Twitter. The goal of this paper is to detect (A) hate speech against immigrants and women, and (B) aggressive behavior and target classification, both for English and Spanish. In the proposed approach, we used a bag of words model with preprocessing (stemming and stop words removal). We submitted two different systems, named (i) CIC-1 and (ii) CIC-2, for the HatEval 2019 shared task. We used TF values in the first system and TF-IDF for the second system. The first system, CIC-1, ranked 2nd in subtask B for both English and Spanish, with EMR scores of 0.568 for English and 0.675 for Spanish. The second system, CIC-2, ranked 4th in subtask A and 1st in subtask B for Spanish, with a macro-F1 score of 0.727 and an EMR score of 0.705, respectively.
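
The difference between the TF and TF-IDF representations used by the two systems can be sketched as follows. The sketch assumes documents are already tokenized, stemmed and stripped of stop words, and it uses idf = log(N / df), which is only one of several common variants and not necessarily the authors' weighting; all names and toy tweets are hypothetical.

```python
import math
from collections import Counter


def tf_vectors(docs):
    """Raw term-frequency bag-of-words vectors over a shared, sorted vocabulary."""
    vocab = sorted({tok for doc in docs for tok in doc})
    counts = [Counter(doc) for doc in docs]
    return vocab, [[c[t] for t in vocab] for c in counts]


def tfidf_vectors(docs):
    """TF-IDF vectors with idf = log(N / df) (an assumed, common variant)."""
    vocab, tf = tf_vectors(docs)
    n_docs = len(docs)
    df = [sum(1 for row in tf if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    return vocab, [[row[j] * idf[j] for j in range(len(vocab))] for row in tf]


# Toy, already preprocessed documents.
docs = [["go", "home", "now"], ["go", "away"], ["welcome", "home"]]
vocab, tf = tf_vectors(docs)
_, tfidf = tfidf_vectors(docs)
```

With TF, every token in the first document has weight 1. With TF-IDF, "now" (which occurs in one document) is weighted log(3), while "go" and "home" (which occur in two) are weighted log(3/2), so rarer terms count for more.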

CiTIUS-COLE at SemEval-2019 Task 5: Combining Linguistic Features to Identify Hate Speech Against Immigrants and Women on Multilingual Tweets
Sattam Almatarneh | Pablo Gamallo | Francisco J. Ribadas Pena

This article describes the strategy submitted by the CiTIUS-COLE team to SemEval 2019 Task 5, a binary classification task in which the system predicts whether a tweet in English or in Spanish is hateful against women or immigrants. The proposed strategy relies on combining linguistic features to improve the classifier’s performance. More precisely, the method combines textual and lexical features, embedding words with the bag of words in Term Frequency-Inverse Document Frequency (TF-IDF) representation. The system performance reaches about 81% F1 when it is applied to the training dataset, but its F1 drops to 36% on the official test dataset for English and 64% for Spanish concerning the hate speech class.

Grunn2019 at SemEval-2019 Task 5: Shared Task on Multilingual Detection of Hate
Mike Zhang | Roy David | Leon Graumans | Gerben Timmerman

Hate speech occurs more often than ever and polarizes society. To help counter this polarization, SemEval 2019 organizes a shared task called the Multilingual Detection of Hate. The first task (A) is to decide whether a given tweet contains hate against immigrants or women, in a multilingual perspective, for English and Spanish. In the second task (B), the system is also asked to classify hateful tweets as aggressive or not aggressive, and to identify the target harassed as individual or generic. We evaluate multiple models, and finally combine them in an ensemble setting. This ensemble setting is built of five and three submodels for the English and Spanish task, respectively. In the current setup, the bigger ensemble for English tweets performs mediocrely, while the slightly smaller ensemble works well for detecting hate speech in Spanish tweets. Our results on the test set for English show 0.378 macro F1 on task A and 0.553 macro F1 on task B. For Spanish the results are significantly higher, 0.701 macro F1 on task A and 0.734 macro F1 for task B.
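
Macro F1, the measure quoted in these results, gives each class equal weight regardless of how often it occurs. A minimal sketch for binary labels follows; the function names and toy labels are illustrative and this is not the official task scorer.

```python
def f1_for_class(gold, pred, cls):
    """F1 of a single class, treating `cls` as the positive label."""
    tp = sum(1 for g, p in zip(gold, pred) if g == cls and p == cls)
    fp = sum(1 for g, p in zip(gold, pred) if g != cls and p == cls)
    fn = sum(1 for g, p in zip(gold, pred) if g == cls and p != cls)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(gold) | set(pred))
    return sum(f1_for_class(gold, pred, c) for c in classes) / len(classes)


# Toy labels: 1 = hateful, 0 = not hateful.
gold = [1, 1, 0, 0, 0, 0]
pred = [1, 0, 0, 0, 0, 1]
score = macro_f1(gold, pred)
```

In this toy case the hateful class has F1 = 0.5 and the non-hateful class has F1 = 0.75, so the macro average is 0.625. Accuracy on the same data would be 4/6, which hides the weaker performance on the minority class.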

HATERecognizer at SemEval-2019 Task 5: Using Features and Neural Networks to Face Hate Recognition
Victor Nina-Alcocer

This paper presents a detailed description of our participation in Task 5 of SemEval-2019. This task consists of classifying English and Spanish tweets that contain hate towards women or immigrants. We carried out several experiments; for a finer-grained study of the task, we analyzed different features and designed neural network architectures. Additionally, to address the lack of hate content in tweets, we include data augmentation as a technique to increase hate content in our datasets.

GL at SemEval-2019 Task 5: Identifying hateful tweets with a deep learning approach.
Gretel Liz De la Peña

This paper describes the system we developed for SemEval 2019 on Multilingual detection of hate speech against immigrants and women in Twitter (HatEval-Task 5). We use an approach based on an Attention-based Long Short-Term Memory Recurrent Neural Network. In particular, we build a Bidirectional LSTM to extract information from the word embeddings over the sentence, then apply attention over the hidden states to estimate the importance of each word and finally feed this context vector to another LSTM model to get a representation. Then, the output obtained with this model is used to get the prediction of each of the sub-tasks.
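
The attention step described above, which weighs each word's hidden state and sums them into a context vector, can be sketched in plain Python. The scoring function (a dot product with a query vector standing in for learned parameters) is an assumption made for illustration, since the abstract does not specify it; the BiLSTM and the second LSTM are omitted.

```python
import math


def attention_context(hidden_states, query):
    """Softmax-weighted sum of hidden states; scores are dot products with `query`."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]  # one importance weight per word
    dim = len(hidden_states[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return weights, context


# Hypothetical 2-dimensional hidden states for a three-word sentence.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [0.5, 0.5]  # stands in for learned attention parameters
weights, context = attention_context(hidden, query)
```

The weights sum to one, and the third word receives the largest weight because its hidden state has the highest score against the query.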

INF-HatEval at SemEval-2019 Task 5: Convolutional Neural Networks for Hate Speech Detection Against Women and Immigrants on Twitter
Alison Ribeiro | Nádia Silva

In this paper, we describe our approach to detect hate speech against women and immigrants on Twitter in a multilingual context, English and Spanish. This challenge was proposed by SemEval-2019 Task 5, where participants should develop models for hate speech detection, a two-class classification where systems have to predict whether a tweet in English or in Spanish with a given target (women or immigrants) is hateful or not hateful (Task A), and whether the hate speech is directed at a specific person or a group of individuals (Task B). For this, we implemented a Convolutional Neural Network (CNN) using pre-trained word embeddings (GloVe and FastText) with 300 dimensions. In Task A, our proposed model obtained F1-scores of 0.488 and 0.696 for English and Spanish, respectively. For Task B, the CNN obtained 0.297 and 0.430 EMR for English and Spanish, respectively.
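
The EMR figures for Task B can be read as follows, assuming EMR stands for Exact Match Ratio, that is, the share of tweets for which every label is predicted correctly. The sketch below uses a hypothetical tuple layout for the per-tweet labels and is not the official scorer.

```python
def exact_match_ratio(gold, pred):
    """Fraction of examples whose full label tuple is predicted exactly."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    matches = sum(1 for g, p in zip(gold, pred) if tuple(g) == tuple(p))
    return matches / len(gold)


# Hypothetical (hateful, targets_individual, aggressive) labels per tweet.
gold = [(1, 1, 0), (0, 0, 0), (1, 0, 1), (1, 1, 1)]
pred = [(1, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1)]
emr = exact_match_ratio(gold, pred)
```

Two of the four toy tweets match on all labels, giving an EMR of 0.5, even though 10 of the 12 individual labels are correct. This strictness is one reason EMR values tend to be lower than per-label scores.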

LT3 at SemEval-2019 Task 5: Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter (hatEval)
Nina Bauwelinck | Gilles Jacobs | Véronique Hoste | Els Lefever

This paper describes our contribution to the SemEval-2019 Task 5 on the detection of hate speech against immigrants and women in Twitter (hatEval). We considered a supervised classification-based approach to detect hate speech in English tweets, which combines a variety of standard lexical and syntactic features with specific features for capturing offensive language. Our experimental results show good classification performance on the training data, but a considerable drop in recall on the held-out test set.

MineriaUNAM at SemEval-2019 Task 5: Detecting Hate Speech in Twitter using Multiple Features in a Combinatorial Framework
Luis Enrique Argota Vega | Jorge Carlos Reyes-Magaña | Helena Gómez-Adorno | Gemma Bel-Enguix

This paper presents our approach to Task 5 of SemEval-2019, which aims at detecting hate speech against immigrants and women in Twitter. The task consists of two sub-tasks, in Spanish and English: (A) detection of hate speech and (B) classification of hateful tweets as aggressive or not, and identification of the target harassed as individual or group. We used linguistically motivated features and several types of n-grams (words, characters, functional words, punctuation symbols, POS, among others). For task A, we trained a Support Vector Machine using a combinatorial framework, whereas for task B we followed a multi-labeled approach using the Random Forest classifier. Our approach achieved the highest F1-score in sub-task A for the Spanish language.

STUFIIT at SemEval-2019 Task 5: Multilingual Hate Speech Detection on Twitter with MUSE and ELMo Embeddings
Michal Bojkovský | Matúš Pikuliak

We present a number of models used for hate speech detection for SemEval 2019 Task 5: HatEval. We evaluate the viability of multilingual learning for this task. We also experiment with adversarial learning as a means of creating a multilingual model. Ultimately, our multilingual models had worse results than their monolingual counterparts. We find that the choice of word representations (word embeddings) is crucial for deep learning, as a simple switch between MUSE and ELMo embeddings has shown a 3-4% increase in accuracy. This also shows the importance of context when dealing with online content.

The binary trio at SemEval-2019 Task 5: Multitarget Hate Speech Detection in Tweets
Patricia Chiril | Farah Benamara Zitoune | Véronique Moriceau | Abhishek Kumar

The massive growth of user-generated web content through blogs, online forums and, most notably, social media networks has led to a wide spread of hateful or abusive messages, which have to be moderated. This paper proposes a supervised approach to hate speech detection towards immigrants and women in English tweets. Several models have been developed, ranging from feature-engineering approaches to neural ones.

The Titans at SemEval-2019 Task 5: Detection of hate speech against immigrants and women in Twitter
Avishek Garain | Arpan Basu

This system paper describes the system submitted to SemEval-2019 Task 5, Task B, for the English language, where we had to primarily detect hate speech and then detect aggressive behaviour and its target audience in Twitter. There were two specific target audiences, immigrants and women. The language of the tweets was English. We were required to first detect whether a tweet contains hate speech. Thereafter we were required to find whether the tweet was showing aggressive behaviour, and then we had to find whether the targeted audience was an individual or a group of people.

TuEval at SemEval-2019 Task 5: LSTM Approach to Hate Speech Detection in English and Spanish
Mihai Manolescu | Denise Löfflad | Adham Nasser Mohamed Saber | Masoumeh Moradipour Tari

The detection of hate speech, especially in online platforms and forums, is quickly becoming a hot topic as anti-hate speech legislation begins to be applied to public discourse online. The HatEval shared task was created with this in mind; participants were expected to develop a model capable of determining whether or not input (in this case, Twitter datasets in English and Spanish) could be considered hate speech (designated as Task A), whether it was aggressive, and whether the tweet was targeting an individual or speaking generally (Task B). We approached this task by creating an LSTM model with an embedding layer. We found that our model performed considerably better on English language input when compared to Spanish language input. In English, we achieved an F1-Score of 0.466 for Task A and 0.462 for Task B; in Spanish, we achieved scores of 0.617 and 0.612 on Task A and Task B, respectively.

Tw-StAR at SemEval-2019 Task 5: N-gram embeddings for Hate Speech Detection in Multilingual Tweets
Hala Mulki | Chedi Bechikh Ali | Hatem Haddad | Ismail Babaoğlu

In this paper, we describe our contribution to SemEval-2019: subtask A of Task 5, Multilingual detection of hate speech against immigrants and women in Twitter (HatEval). We developed two hate speech detection model variants through the Tw-StAR framework. While the first model adopted one-hot encoded n-grams to train an NB classifier, the second generated and learned n-gram embeddings within a feedforward neural network. For both models, specific terms, selected via MWT patterns, were tagged in the input data. With two feature types employed, we could investigate the ability of n-gram embeddings to rival one-hot n-grams. Our results showed that in English, n-gram embeddings outperformed one-hot n-grams. However, representing Spanish tweets by one-hot n-grams yielded a slightly better performance compared to that of n-gram embeddings. The official ranking indicated that Tw-StAR ranked 9th for English and 20th for Spanish.
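
A one-hot n-gram representation of the kind contrasted here with n-gram embeddings can be sketched as binary presence vectors over word n-grams. The choice of unigrams plus bigrams, the function names and the toy tweets are assumptions made for illustration; the MWT-based term tagging and the classifiers are left out.

```python
def word_ngrams(tokens, n):
    """All contiguous word n-grams of length n, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def one_hot_ngram_features(docs, n_values=(1, 2)):
    """Binary presence vectors over every n-gram seen in `docs`."""
    doc_ngrams = [
        {gram for n in n_values for gram in word_ngrams(doc, n)} for doc in docs
    ]
    vocab = sorted(set().union(*doc_ngrams))
    index = {gram: i for i, gram in enumerate(vocab)}
    vectors = []
    for grams in doc_ngrams:
        row = [0] * len(vocab)
        for gram in grams:
            row[index[gram]] = 1  # 1 if the n-gram occurs in the document
        vectors.append(row)
    return vocab, vectors


# Toy tokenized tweets.
docs = [["go", "back", "home"], ["welcome", "home"]]
vocab, vectors = one_hot_ngram_features(docs)
```

Each dimension corresponds to exactly one n-gram, so the two toy tweets share only the "home" dimension. An embedding-based representation would instead map n-grams to dense vectors in which related n-grams can lie close together.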

UA at SemEval-2019 Task 5: Setting A Strong Linear Baseline for Hate Speech Detection
Carlos Perelló | David Tomás | Alberto Garcia-Garcia | Jose Garcia-Rodriguez | Jose Camacho-Collados

This paper describes the system developed at the University of Alicante (UA) for SemEval 2019 Task 5: Shared Task on Multilingual Detection of Hate. The purpose of this work is to build a strong baseline for hate speech detection, using a traditional machine learning approach with standard textual features, which could serve in the near future as a reference to compare with deep learning systems. We participated in both task A (Hate Speech Detection against Immigrants and Women) and task B (Aggressive behavior and Target Classification). Despite its simplicity, our system obtained a remarkable F1-score of 72.5 (sixth highest) and an accuracy of 73.6 (second highest) in Spanish (task A), outperforming more complex neural models from a total of 40 participant systems.

UNBNLP at SemEval-2019 Task 5 and 6: Using Language Models to Detect Hate Speech and Offensive Language
Ali Hakimi Parizi | Milton King | Paul Cook

In this paper we apply a range of approaches to language modeling including word-level n-gram and neural language models, and character-level neural language models to the problem of detecting hate speech and offensive language. Our findings indicate that language models are able to capture knowledge of whether text is hateful or offensive. However, our findings also indicate that more conventional approaches to text classification often perform similarly or better.
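
One common way to use language models as classifiers, which may differ from the authors' setup, is to train one model per class and assign a text to the class whose model gives it the highest probability. The sketch below uses add-one smoothed unigram models purely for illustration; the class name, labels and toy training data are hypothetical.

```python
import math
from collections import Counter


class UnigramLM:
    """Add-one smoothed unigram language model over a fixed vocabulary."""

    def __init__(self, token_lists, vocab):
        self.counts = Counter(t for toks in token_lists for t in toks)
        self.total = sum(self.counts.values())
        self.vocab_size = len(vocab)

    def log_prob(self, tokens):
        """Sum of smoothed log-probabilities of the tokens."""
        return sum(
            math.log((self.counts[t] + 1) / (self.total + self.vocab_size))
            for t in tokens
        )


def classify(tokens, models):
    """Return the label whose language model scores the tokens highest."""
    return max(models, key=lambda label: models[label].log_prob(tokens))


# Toy training data, one list of tokenized texts per class.
train = {
    "offensive": [["you", "are", "stupid"], ["stupid", "idiot"]],
    "not_offensive": [["you", "are", "kind"], ["have", "a", "nice", "day"]],
}
vocab = {t for docs in train.values() for toks in docs for t in toks}
models = {label: UnigramLM(docs, vocab) for label, docs in train.items()}
label = classify(["you", "are", "stupid"], models)
```

The same scheme works with n-gram or neural language models in place of the unigram model, as long as each one can score a text under its own class.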

UTFPR at SemEval-2019 Task 5: Hate Speech Identification with Recurrent Neural Networks
Gustavo Henrique Paetzold | Marcos Zampieri | Shervin Malmasi

In this paper we revisit the problem of automatically identifying hate speech in posts from social media. We approach the task using a system based on minimalistic compositional Recurrent Neural Networks (RNN). We tested our approach on the SemEval-2019 Task 5: Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter (HatEval) shared task dataset. The dataset made available by the HatEval organizers contained English and Spanish posts retrieved from Twitter annotated with respect to the presence of hateful content and its target. In this paper we present the results obtained by our system in comparison to the other entries in the shared task. Our system achieved competitive performance, ranking 7th in sub-task A out of 62 systems in the English track.

YNU NLP at SemEval-2019 Task 5: Attention and Capsule Ensemble for Identifying Hate Speech
Bin Wang | Haiyan Ding

This paper describes the system submitted to SemEval 2019 Task 5: Multilingual detection of hate speech against immigrants and women in Twitter (hatEval). Its main purpose is to conduct hate speech detection on Twitter, which mainly includes two specific targets, immigrants and women. We participate in both subtask A and subtask B for English. In order to address this task, we develop an ensemble of an attention-LSTM model based on HAN and a BiGRU-capsule model. Both models use fastText pre-trained embeddings, and we use this model in both subtasks. In comparison to other participating teams, our system is ranked 16th in Sub-task A for English, and 12th in Sub-task B for English.

BNU-HKBU UIC NLP Team 2 at SemEval-2019 Task 6: Detecting Offensive Language Using BERT model
Zhenghao Wu | Hao Zheng | Jianming Wang | Weifeng Su | Jefferson Fong

In this study we deal with the problem of identifying and categorizing offensive language in social media. Our group, BNU-HKBU UIC NLP Team2, uses supervised classification along with multiple versions of the data generated by different ways of pre-processing. We then use the state-of-the-art model Bidirectional Encoder Representations from Transformers, or BERT (Devlin et al., 2018), to capture linguistic, syntactic and semantic features. Long range dependencies between each part of a sentence can be captured by BERT’s bidirectional encoder representations. Our results show 85.12% accuracy and 80.57% F1 score in Subtask A (offensive language identification), 87.92% accuracy and 50% F1 score in Subtask B (categorization of offense types), and 69.95% accuracy and 50.47% F1 score in Subtask C (offense target identification). Analysis of the results shows that distinguishing between targeted and untargeted offensive language is not a simple task. More work needs to be done on the unbalanced data problem in Subtasks B and C. Some future work is also discussed.

ConvAI at SemEval-2019 Task 6: Offensive Language Identification and Categorization with Perspective and BERT
John Pavlopoulos | Nithum Thain | Lucas Dixon | Ion Androutsopoulos

This paper presents the application of two strong baseline systems for toxicity detection and evaluates their performance in identifying and categorizing offensive language in social media. PERSPECTIVE is an API that serves multiple machine learning models for the improvement of conversations online, as well as a toxicity detection system, trained on a wide variety of comments from platforms across the Internet. BERT is a recently popular language representation model, fine-tuned per task and achieving state-of-the-art performance in multiple NLP tasks. PERSPECTIVE performed better than BERT in detecting toxicity, but BERT was much better in categorizing the offensive type. Both baselines were ranked surprisingly high in the SEMEVAL-2019 OFFENSEVAL competition, PERSPECTIVE in detecting an offensive post (12th) and BERT in categorizing it (11th). The main contribution of this paper is the assessment of two strong baselines for the identification (PERSPECTIVE) and the categorization (BERT) of offensive language with little or no additional training data.

DeepAnalyzer at SemEval-2019 Task 6: A deep learning-based ensemble method for identifying offensive tweets
Gretel Liz De la Peña | Paolo Rosso

This paper describes the system we developed for SemEval 2019 on Identifying and Categorizing Offensive Language in Social Media (OffensEval-Task 6). The task focuses on offensive language in tweets. It is organized into three sub-tasks: offensive language identification, automatic categorization of offense types, and offense target identification. The approach for the first subtask is a deep learning-based ensemble method which uses a Bidirectional LSTM Recurrent Neural Network and a Convolutional Neural Network. Additionally, we use the information from part-of-speech tagging of tweets for target identification and combine previous results for categorization of offense types.

Duluth at SemEval-2019 Task 6: Lexical Approaches to Identify and Categorize Offensive Tweets
Ted Pedersen

This paper describes the Duluth systems that participated in SemEval-2019 Task 6, Identifying and Categorizing Offensive Language in Social Media (OffensEval). For the most part these systems took traditional Machine Learning approaches that built classifiers from lexical features found in manually labeled training data. However, our most successful system for classifying a tweet as offensive (or not) was a rule-based blacklist approach, and we also experimented with combining the training data from two different but related SemEval tasks. Our best systems in each of the three OffensEval tasks placed in the middle of the comparative evaluation, ranking 57th of 103 in task A, 39th of 75 in task B, and 44th of 65 in task C.

Ghmerti at SemEval-2019 Task 6: A Deep Word- and Character-based Approach to Offensive Language Identification
Ehsan Doostmohammadi | Hossein Sameti | Ali Saffar

This paper presents the models submitted by the Ghmerti team for subtasks A and B of the OffensEval shared task at SemEval 2019. OffensEval addresses the problem of identifying and categorizing offensive language in social media in three subtasks: whether or not a piece of content is offensive (subtask A), whether it is targeted (subtask B), and whether the target is an individual, a group, or another entity (subtask C). The proposed approach includes a character-level Convolutional Neural Network, a word-level Recurrent Neural Network, and some preprocessing. The performance achieved by the proposed model is 77.93% macro-averaged F1-score.
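
Many abstracts in this listing report macro-averaged F1. As a minimal sketch of how that metric is usually computed (the unweighted mean of per-class F1 scores), the following pure-Python function may help. The label names and the toy data are illustrative assumptions, not taken from any of the papers.

```python
from collections import Counter

def macro_f1(gold, pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gets a false positive
            fn[g] += 1  # gold class gets a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy example with hypothetical labels.
gold = ["OFF", "NOT", "NOT", "OFF", "NOT"]
pred = ["OFF", "NOT", "OFF", "NOT", "NOT"]
score = macro_f1(gold, pred)
```

Because every class contributes equally, a system that ignores a rare class is penalised much more under macro F1 than under accuracy, which is one reason it suits the imbalanced subtasks described here.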

HAD-Tübingen at SemEval-2019 Task 6: Deep Learning Analysis of Offensive Language on Twitter: Identification and Categorization
Himanshu Bansal | Daniel Nagel | Anita Soloveva

This paper describes the submissions of our team, HAD-Tübingen, for the SemEval 2019-Task 6: OffensEval: Identifying and Categorizing Offensive Language in Social Media. We participated in all the three sub-tasks: Sub-task A-Offensive language identification

HHU at SemEval-2019 Task 6: Context Does Matter - Tackling Offensive Language Identification and Categorization with ELMo
Alexander Oberstrass | Julia Romberg | Anke Stoll | Stefan Conrad

We present our results for OffensEval: Identifying and Categorizing Offensive Language in Social Media (SemEval 2019-Task 6). Our results show that context embeddings are important features for the three different sub-tasks in connection with classical machine learning and with deep learning. Our best model reached place 3 of 75 in sub-task B with a macro F_1 of 0.719. Our approaches for sub-task A and C perform less well but could also deliver promising results.

INGEOTEC at SemEval-2019 Task 5 and Task 6: A Genetic Programming Approach for Text Classification
Mario Graff | Sabino Miranda-Jiménez | Eric Tellez | Daniela Alejandra Ochoa

This paper describes our participation in HatEval and OffensEval challenges for English and Spanish languages. We used several approaches, B4MSA, FastText, and EvoMSA. Best results were achieved with EvoMSA, which is a multilingual and domain-independent architecture that combines the prediction of different knowledge sources to solve text classification problems.

JTML at SemEval-2019 Task 6: Offensive Tweets Identification using Convolutional Neural Networks
Johnny Torres | Carmen Vaca

In this paper, we propose the use of a Convolutional Neural Network (CNN) to identify offensive tweets, as well as the type and target of the offense. We use an end-to-end model (i.e., no preprocessing) and fine-tune pre-trained embeddings (FastText) during training for learning words’ representation. We compare the proposed CNN model to a baseline model, such as Linear Regression, and several neural models. The results show that CNN outperforms other models, and stands as a simple but strong baseline in comparison to other systems submitted to the Shared Task.

LaSTUS/TALN at SemEval-2019 Task 6: Identification and Categorization of Offensive Language in Social Media with Attention-based Bi-LSTM model
Lutfiye Seda Mut Altin | Àlex Bravo Serrano | Horacio Saggion

We present a bidirectional Long Short-Term Memory network for identifying offensive language in Twitter. Our system has been developed in the context of SemEval 2019 Task 6, which comprises three different sub-tasks, namely A: Offensive Language Detection, B: Categorization of Offensive Language, C: Offensive Language Target Identification. We used pre-trained word embeddings for tweet data, including information about emojis and hashtags. Our approach achieves good performance in the three sub-tasks.

LTL-UDE at SemEval-2019 Task 6: BERT and Two-Vote Classification for Categorizing Offensiveness
Piush Aggarwal | Tobias Horsmann | Michael Wojatzki | Torsten Zesch

We present results for Subtask A and C of SemEval 2019 Shared Task 6. In Subtask A, we experiment with an embedding representation of postings and use BERT to categorize postings. Our best result reaches the 10th place (out of 103). In Subtask C, we applied a two-vote classification approach with minority fallback, which is placed on the 19th rank (out of 65).

MIDAS at SemEval-2019 Task 6: Identifying Offensive Posts and Targeted Offense from Twitter
Debanjan Mahata | Haimin Zhang | Karan Uppal | Yaman Kumar | Rajiv Ratn Shah | Simra Shahid | Laiba Mehnaz | Sarthak Anand

In this paper we present our approach and the system description for Sub Task A and Sub Task B of SemEval 2019 Task 6: Identifying and Categorizing Offensive Language in Social Media. Sub Task A involves identifying if a given tweet is offensive and Sub Task B involves detecting if an offensive tweet is targeted towards someone (a group or an individual). Our model for Sub Task A is based on an ensemble of a Convolutional Neural Network and a Bidirectional LSTM, whereas for Sub Task B we rely on a set of heuristics derived from the training data. We provide a detailed analysis of the results obtained using the trained models. Our team ranked 5th out of 103 participants in Sub Task A, achieving a macro F1 score of 0.807, and ranked 8th out of 75 participants in Sub Task B, achieving a macro F1 of 0.695.

Nikolov-Radivchev at SemEval-2019 Task 6: Offensive Tweet Classification with BERT and Ensembles
Alex Nikolov | Victor Radivchev

This paper examines different approaches and models towards offensive tweet classification which were used as a part of the OffensEval 2019 competition. It reviews Tweet preprocessing, techniques for overcoming unbalanced class distribution in the provided test data, and comparison of multiple attempted machine learning models.

NIT_Agartala_NLP_Team at SemEval-2019 Task 6: An Ensemble Approach to Identifying and Categorizing Offensive Language in Twitter Social Media Corpora
Steve Durairaj Swamy | Anupam Jamatia | Björn Gambäck | Amitava Das

The paper describes the systems submitted to OffensEval (SemEval 2019, Task 6) on ‘Identifying and Categorizing Offensive Language in Social Media’ by the ‘NIT_Agartala_NLP_Team’. A Twitter annotated dataset of 13,240 English tweets was provided by the task organizers to train the individual models, with the best results obtained using an ensemble model composed of six different classifiers. The ensemble model produced macro-averaged F1-scores of 0.7434, 0.7078 and 0.4853 on Subtasks A, B, and C, respectively. The paper highlights the overall low predictive nature of various linguistic features and surface level count features, as well as the limitations of a traditional machine learning approach when compared to a Deep Learning counterpart.

NLP@UIOWA at SemEval-2019 Task 6: Classifying the Crass using Multi-windowed CNNs
Jonathan Rusert | Padmini Srinivasan

This paper proposes a system for OffensEval (SemEval 2019 Task 6), which calls for a system to classify offensive language into several categories. Our system is a text-based CNN, which learns only from the provided training data. Our system achieves 80-90% accuracy for the binary classification problems (offensive vs not offensive and targeted vs untargeted) and 63% accuracy for trinary classification (group vs individual vs other).

nlpUP at SemEval-2019 Task 6: A Deep Neural Language Model for Offensive Language Detection
Jelena Mitrović | Bastian Birkeneder | Michael Granitzer

This paper presents our submission for the SemEval shared task 6, sub-task A on the identification of offensive language. Our proposed model, C-BiGRU, combines a Convolutional Neural Network (CNN) with a bidirectional Recurrent Neural Network (RNN). We utilize word2vec to capture the semantic similarities between words. This composition allows us to extract long-term dependencies in tweets and distinguish between offensive and non-offensive tweets. In addition, we evaluate our approach on a different dataset and show that our model is capable of detecting online aggressiveness in both English and German tweets. Our model achieved a macro F1-score of 79.40% on the SemEval dataset.

TECHSSN at SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Tweets using Deep Neural Networks
Angel Suseelan | Rajalakshmi S | Logesh B | Harshini S | Geetika B | Dyaneswaran S | S Milton Rajendram | Mirnalinee T T

Task 6 of SemEval 2019 involves identifying and categorizing offensive language in social media. The systems developed by the TECHSSN team use multi-level classification techniques. We have developed two systems. In the first system, the first level of classification is done by a multi-branch 2D CNN classifier with Google’s pre-trained Word2Vec embedding, and the second level of classification by a string matching technique supported by a dictionary of offensive and bad words. The second system uses a multi-branch 1D CNN classifier with a GloVe pre-trained embedding layer for the first level of classification and string matching for the second level of classification. Input data with a probability of less than 0.70 in the first level are passed on to the second level. The misclassified examples are classified correctly in the second level.
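
The two-level scheme described above (a confident first-level classifier, with low-confidence inputs handed to dictionary-based string matching) can be sketched roughly as follows. This is only an illustration under stated assumptions: `toy_first_level` is a hypothetical stand-in for the CNN, the lexicon and label names are invented, and only the 0.70 threshold comes from the abstract.

```python
def cascade_predict(text, first_level, lexicon, threshold=0.70):
    """Return the first-level label if it is confident enough,
    otherwise fall back to dictionary-based string matching."""
    label, prob = first_level(text)
    if prob >= threshold:
        return label
    tokens = text.lower().split()
    return "OFF" if any(tok in lexicon for tok in tokens) else "NOT"

def toy_first_level(text):
    # Hypothetical stand-in for the CNN: returns (label, confidence).
    canned = {
        "have a nice day": ("NOT", 0.95),
        "you are a jerk": ("NOT", 0.55),
    }
    return canned.get(text, ("NOT", 0.50))

lexicon = {"jerk"}  # illustrative dictionary of offensive words
confident = cascade_predict("have a nice day", toy_first_level, lexicon)
fallback = cascade_predict("you are a jerk", toy_first_level, lexicon)
```

In this toy run the second input is below the threshold, so the lexicon lookup overrides the first-level label.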

The Titans at SemEval-2019 Task 6: Offensive Language Identification, Categorization and Target Identification
Avishek Garain | Arpan Basu

This system paper is a description of the system submitted to SemEval-2019 Task 6, where we had to detect offensive language in Twitter. There were two specific target audiences, immigrants and women. The language of the tweets was English. We were required to first detect whether a tweet contains offensive content, and then we had to find out whether the tweet was targeted against some individual, group or other entity. Finally we were required to classify the targeted audience.

TüKaSt at SemEval-2019 Task 6: Something Old, Something Neu(ral): Traditional and Neural Approaches to Offensive Text Classification
Madeeswaran Kannan | Lukas Stein

We describe our system (TüKaSt) submitted for Task 6: Offensive Language Classification, at SemEval 2019. We developed multiple SVM classifier models that used sentence-level dense vector representations of tweets enriched with sentiment information and term-weighting. Our best results achieved F1 scores of 0.734, 0.660 and 0.465 in the first, second and third sub-tasks respectively. We also describe a neural network model that was developed in parallel but not used during evaluation due to time constraints.

TUVD team at SemEval-2019 Task 6: Offense Target Identification
Elena Shushkevich | John Cardiff | Paolo Rosso

This article presents our approach for detecting a target of offensive messages in Twitter, including Individual, Group and Others classes. The model we have created is an ensemble of simpler models, including Logistic Regression, Naive Bayes, Support Vector Machine and the interpolation between Logistic Regression and Naive Bayes with 0.25 coefficient of interpolation. The model allows us to achieve 0.547 macro F1-score.

UBC-NLP at SemEval-2019 Task 6: Ensemble Learning of Offensive Content With Enhanced Training Data
Arun Rajendran | Chiyu Zhang | Muhammad Abdul-Mageed

We examine learning offensive content on Twitter with limited, imbalanced data. For this purpose, we investigate the utility of using various data enhancement methods with a host of classical ensemble classifiers. Among the 75 participating teams in SemEval-2019 sub-task B, our system ranks 6th (with 0.706 macro F1-score). For sub-task C, among the 65 participating teams, our system ranks 9th (with 0.587 macro F1-score).

UHH-LT at SemEval-2019 Task 6: Supervised vs. Unsupervised Transfer Learning for Offensive Language Detection
Gregor Wiedemann | Eugen Ruppert | Chris Biemann

We present a neural network based approach to transfer learning for offensive language detection. For our system, we compare two types of knowledge transfer: supervised and unsupervised pre-training. Supervised pre-training of our bidirectional GRU-3-CNN architecture is performed as multi-task learning, training five different tasks in parallel. The selected tasks are supervised classification problems from public NLP resources with some overlap with offensive language, such as sentiment detection, emoji classification, and aggressive language classification. Unsupervised transfer learning is performed with a thematic clustering of 40M unlabeled tweets via LDA. Based on this dataset, pre-training is performed by predicting the main topic of a tweet. Results indicate that unsupervised transfer from large datasets performs slightly better than supervised training on small ‘near target category’ datasets. In the SemEval Task, our system ranks 14th out of 103 participants.

UM-IU@LING at SemEval-2019 Task 6: Identifying Offensive Tweets Using BERT and SVMs
Jian Zhu | Zuoyu Tian | Sandra Kübler

This paper describes the UM-IU@LING system for SemEval 2019 Task 6: OffensEval. We take a mixed approach to identify and categorize hate speech in social media. In subtask A, we fine-tuned a BERT-based classifier to detect abusive content in tweets, achieving a macro F1 score of 0.8136 on the test data, thus reaching the 3rd rank out of 103 submissions. In subtasks B and C, we used a linear SVM with selected character n-gram features. For subtask C, our system could identify the target of abuse with a macro F1 score of 0.5243, ranking it 27th out of 65 submissions.
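
Character n-gram features of the kind mentioned above are typically built by sliding windows of several lengths over the raw text and counting the resulting substrings. A minimal sketch, assuming an n-gram range of 2 to 4 (the range and the feature selection actually used are not stated in the abstract):

```python
from collections import Counter

def char_ngrams(text, n_min=2, n_max=4):
    """Count all character n-grams with n_min <= n <= n_max."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts

features = char_ngrams("abab", n_min=2, n_max=3)
```

The resulting counts could then be weighted (for example with TF-IDF) and passed to a linear classifier; that step is left out here.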

YNUWB at SemEval-2019 Task 6: K-max pooling CNN with average meta-embedding for identifying offensive language
Bin Wang | Xiaobing Zhou | Xuejie Zhang

This paper describes the system submitted to SemEval 2019 Task 6: OffensEval 2019. The task aims to identify and categorize offensive language in social media; we only participate in Sub-task A, which aims to identify offensive language. In order to address this task, we propose a system based on a K-max pooling convolutional neural network model, and use an argument for averaging as a valid meta-embedding technique to get a meta-embedding. Finally, we also use a cyclic learning rate policy to improve model performance. Our model achieves a Macro F1-score of 0.802 (ranked 9/103) in Sub-task A.
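
For readers unfamiliar with K-max pooling: instead of keeping only the single largest activation of a feature map, it keeps the k largest values in their original order. A minimal pure-Python sketch over one feature map (the values and k are illustrative, and this is not the authors' implementation):

```python
def k_max_pooling(values, k):
    """Keep the k largest values, preserving their original order."""
    if k >= len(values):
        return list(values)
    # Indices of the k largest activations (ties keep the earlier index).
    top = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in sorted(top)]

feature_map = [0.1, 0.9, 0.3, 0.7, 0.2]
pooled = k_max_pooling(feature_map, 3)
```

Preserving order is what distinguishes this from simply sorting the activations: the relative position of the strongest features is retained.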

SemEval-2019 Task 7: RumourEval, Determining Rumour Veracity and Support for Rumours
Genevieve Gorrell | Elena Kochkina | Maria Liakata | Ahmet Aker | Arkaitz Zubiaga | Kalina Bontcheva | Leon Derczynski

Since the first RumourEval shared task in 2017, interest in automated claim validation has greatly increased, as the danger of fake news has become a mainstream concern. However, automated support for rumour verification remains in its infancy. It is therefore important that a shared task in this area continues to provide a focus for effort, which is likely to increase. Rumour verification is characterised by the need to consider evolving conversations and news updates to reach a verdict on a rumour’s veracity. As in RumourEval 2017, we provided a dataset of dubious posts and ensuing conversations in social media, annotated both for stance and veracity. The social media rumours stem from a variety of breaking news stories and the dataset is expanded to include Reddit as well as new Twitter posts. There were two concrete tasks, rumour stance prediction and rumour verification, which we present in detail along with results achieved by participants. We received 22 system submissions (a 70% increase from RumourEval 2017), many of which used state-of-the-art methodology to tackle the challenges involved.

eventAI at SemEval-2019 Task 7: Rumor Detection on Social Media by Exploiting Content, User Credibility and Propagation Information
Quanzhi Li | Qiong Zhang | Luo Si

This paper describes our system for SemEval 2019 RumorEval: Determining rumor veracity and support for rumors (SemEval 2019 Task 7). This track has two tasks: Task A is to determine a user’s stance towards the source rumor, and Task B is to detect the veracity of the rumor (true, false or unverified). For stance classification, a neural network model with language features is utilized. For rumor verification, our approach exploits information from different dimensions: rumor content, source credibility, user credibility, user stance, event propagation path, etc. We use an ensemble approach in both tasks, which includes neural network models as well as traditional classification algorithms. Our system ranked 1st in the rumor verification task by both the macro F1 measure and the RMSE measure.

AUTOHOME-ORCA at SemEval-2019 Task 8: Application of BERT for Fact-Checking in Community Forums
Zhengwei Lv | Duoxing Liu | Haifeng Sun | Xiao Liang | Tao Lei | Zhizhong Shi | Feng Zhu | Lei Yang

Fact checking is an important task for maintaining high quality posts and improving user experience in Community Question Answering forums. Therefore, SemEval-2019 Task 8 aims to identify factual questions (subtask A) and to detect true factual information in the corresponding answers (subtask B). In order to address this task, we propose a system based on the BERT model with meta information of questions. For subtask A, the outputs of a fine-tuned BERT classification model are combined with a question-length feature to boost the performance. For subtask B, the predictions of several variants of the BERT model encoding the meta information are combined to create an ensemble model. Our system achieved competitive results with an accuracy of 0.82 in subtask A and 0.83 in subtask B. The experimental results validate the effectiveness of our system.

AiFu at SemEval-2019 Task 10: A Symbolic and Sub-symbolic Integrated System for SAT Math Question Answering
Yifan Liu | Keyu Ding | Yi Zhou

AiFu won first place in the SemEval-2019 Task 10 (Math Question Answering) competition. This paper describes how it works technically and reports and analyzes some essential experimental results.

Clark Kent at SemEval-2019 Task 4: Stylometric Insights into Hyperpartisan News Detection
Viresh Gupta | Baani Leen Kaur Jolly | Ramneek Kaur | Tanmoy Chakraborty

In this paper, we present a news bias prediction system, which we developed as part of a SemEval 2019 task. We developed an XGBoost-based system which uses character- and word-level n-gram features represented using TF-IDF and a count-vector-based correlation matrix, and predicts whether an input news article is hyperpartisan. Our model was able to achieve a precision of 68.3% on the test set provided by the contest organizers. We also ran our model on the BuzzFeed corpus and found that XGBoost with simple character-level n-gram embeddings performs well, with an accuracy of around 96%.

Dick-Preston and Morbo at SemEval-2019 Task 4: Transfer Learning for Hyperpartisan News Detection
Tim Isbister | Fredrik Johansson

In a world of information operations, influence campaigns, and fake news, classification of news articles as following hyperpartisan argumentation or not is becoming increasingly important. We present a deep learning-based approach in which a pre-trained language model has been fine-tuned on domain-specific data and used for classification of news articles, as part of the SemEval-2019 task on hyperpartisan news detection. The suggested approach yields accuracy and F1-scores around 0.8 which places the best performing classifier among the top-5 systems in the competition.

Harvey Mudd College at SemEval-2019 Task 4: The Clint Buchanan Hyperpartisan News Detector
Mehdi Drissi | Pedro Sandoval Segura | Vivaswat Ojha | Julie Medero

We investigate the recently developed Bidirectional Encoder Representations from Transformers (BERT) model (Devlin et al., 2018) for the hyperpartisan news detection task. Using a subset of hand-labeled articles from SemEval as a validation set, we test the performance of different parameters for BERT models. We find that accuracy from two different BERT models using different proportions of the articles is consistently high, with our best-performing model on the validation set achieving 85% accuracy and the best-performing model on the test set achieving 77%. We further determined that our model exhibits strong consistency, labeling independent slices of the same article identically. Finally, we find that randomizing the order of word pieces dramatically reduces validation accuracy (to approximately 60%), but that shuffling groups of four or more word pieces maintains an accuracy of about 80%, indicating the model mainly gains value from local context.
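
The last experiment in this abstract (shuffling groups of word pieces rather than individual pieces) can be illustrated with a short sketch. The exact procedure used in the paper is not given here, so the chunking, the seed handling and the example tokens below are assumptions made for illustration only.

```python
import random

def shuffle_in_groups(pieces, group_size, seed=0):
    """Shuffle the order of fixed-size groups of word pieces while
    keeping the order of pieces inside each group unchanged."""
    groups = [pieces[i:i + group_size]
              for i in range(0, len(pieces), group_size)]
    rng = random.Random(seed)  # seeded for reproducibility
    rng.shuffle(groups)
    return [piece for group in groups for piece in group]

pieces = ["the", "sen", "##ator", "said", "that", "tax", "##es", "rose"]
shuffled = shuffle_in_groups(pieces, group_size=4)
```

With `group_size=1` this reduces to a full shuffle of individual pieces; larger groups preserve progressively more local context.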

Harvey Mudd College at SemEval-2019 Task 4: The D.X. Beaumont Hyperpartisan News Detector
Evan Amason | Jake Palanker | Mary Clare Shen | Julie Medero

We use the 600 hand-labelled articles from SemEval Task 4 to hand-tune a classifier with 3000 features for the Hyperpartisan News Detection task. Our final system uses features based on bag-of-words (BoW), analysis of the article title, language complexity, and simple sentiment analysis in a naive Bayes classifier. We trained our final system on the 600,000 articles labelled by publisher. Our final system has an accuracy of 0.653 on the hand-labeled test set. The most effective features are the Automated Readability Index and the presence of certain words in the title. This suggests that hyperpartisan writing uses a distinct writing style, especially in the title.
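
The Automated Readability Index mentioned above is a standard formula: 4.71 × (characters per word) + 0.5 × (words per sentence) − 21.43. A minimal sketch follows; the regular-expression tokenisation is a simplifying assumption and is not taken from the paper.

```python
import re

def automated_readability_index(text):
    """ARI = 4.71 * chars/words + 0.5 * words/sentences - 21.43."""
    # Naive sentence and word splitting (an assumption for this sketch).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z0-9']+", text)
    if not sentences or not words:
        return 0.0
    # Count only letters and digits as characters.
    chars = sum(len(re.sub(r"[^A-Za-z0-9]", "", w)) for w in words)
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43

ari = automated_readability_index("The cat sat. The dog ran.")
```

Short words and short sentences give a low (here negative) score; longer words and sentences raise it.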

Spider-Jerusalem at SemEval-2019 Task 4: Hyperpartisan News Detection
Amal Alabdulkarim | Tariq Alhindi

This paper describes our system for detecting hyperpartisan news articles, which was submitted for the shared task in SemEval 2019 on Hyperpartisan News Detection. We developed a Support Vector Machine (SVM) model that uses TF-IDF of tokens, Language Inquiry and Word Count (LIWC) features, and structural features such as number of paragraphs and hyperlink count in an article. The model was trained on 645 articles from two classes: mainstream and hyperpartisan. Our system was ranked seventeenth out of forty-two participating teams in the binary classification task with an accuracy score of 0.742 on the blind test set (the accuracy of the top ranked system was 0.822). We provide a detailed description of our preprocessing steps, discussion of our experiments using different combinations of features, and analysis of our results and prediction errors.

Steve Martin at SemEval-2019 Task 4: Ensemble Learning Model for Detecting Hyperpartisan News
Youngjun Joo | Inchon Hwang

This paper describes our submission to task 4 in SemEval 2019, i.e., hyperpartisan news detection. Our model aims at detecting hyperpartisan news by incorporating the style-based features and the content-based features. We extract a broad number of feature sets and use as our learning algorithms the GBDT and the n-gram CNN model. Finally, we apply the weighted average for effective learning between the two models. Our model achieves an accuracy of 0.745 on the test set in subtask A.
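
Combining two models by a weighted average, as described above, usually means mixing their predicted probabilities before thresholding. A minimal sketch, in which the weight, the threshold and the input probabilities are all hypothetical values rather than figures from the paper:

```python
def weighted_average(p_gbdt, p_cnn, w_gbdt=0.5):
    """Mix two hyperpartisan probabilities; w_gbdt is the GBDT weight."""
    return w_gbdt * p_gbdt + (1.0 - w_gbdt) * p_cnn

def classify(p_gbdt, p_cnn, w_gbdt=0.5, threshold=0.5):
    """Label an article hyperpartisan if the mixed probability
    reaches the threshold."""
    return weighted_average(p_gbdt, p_cnn, w_gbdt) >= threshold

mixed = weighted_average(0.9, 0.3, w_gbdt=0.6)
is_hyperpartisan = classify(0.9, 0.3, w_gbdt=0.6)
```

In practice the weight would be tuned on held-out data; here it is fixed by hand purely for illustration.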

Team Fernando-Pessa at SemEval-2019 Task 4: Back to Basics in Hyperpartisan News Detection
André Cruz | Gil Rocha | Rui Sousa-Silva | Henrique Lopes Cardoso

This paper describes our submission to the SemEval 2019 Hyperpartisan News Detection task. Our system aims for a linguistics-based document classification from a minimal set of interpretable features, while maintaining good performance. To this end, we follow a feature-based approach and perform several experiments with different machine learning classifiers. Additionally, we explore feature importances and distributions among the two classes. On the main task, our model achieved an accuracy of 71.7%, which was improved after the task’s end to 72.9%. We also participated in the meta-learning sub-task, for classifying documents with the binary classifications of all submitted systems as input, achieving an accuracy of 89.9%.

Team Jack Ryder at SemEval-2019 Task 4: Using BERT Representations for Detecting Hyperpartisan News
Daniel Shaprin | Giovanni Da San Martino | Alberto Barrón-Cedeño | Preslav Nakov

We describe the system submitted by the Jack Ryder team to SemEval-2019 Task 4 on Hyperpartisan News Detection. The task asked participants to predict whether a given article is hyperpartisan, i.e., extreme-left or extreme-right. We proposed an approach based on BERT with fine-tuning, which was ranked 7th out of 28 teams on the distantly supervised dataset, where all articles from a hyperpartisan/non-hyperpartisan news outlet are considered to be hyperpartisan/non-hyperpartisan. On a manually annotated test dataset, where human annotators double-checked the labels, we were ranked 29th out of 42 teams.

Team Kit Kittredge at SemEval-2019 Task 4: LSTM Voting System
Rebekah Cramerus | Tatjana Scheffler

This paper describes the approach of team Kit Kittredge to SemEval-2019 Task 4: Hyperpartisan News Detection. The goal was binary classification of news articles into the categories of biased or unbiased. We had two software submissions: one a simple bag-of-words model, and the second an LSTM (Long Short-Term Memory) neural network, which was trained on a subset of the original dataset selected by a voting system of other LSTMs. This method did not prove much more successful than the baseline, however, due to the models’ tendency to learn publisher-specific traits instead of general bias.

pdf bib
Team Peter Brinkmann at SemEval-2019 Task 4: Detecting Biased News Articles Using Convolutional Neural Networks
Michael Färber | Agon Qurdina | Lule Ahmedi

In this paper, we present an approach for classifying news articles as biased (i.e., hyperpartisan) or unbiased, based on a convolutional neural network. We experiment with various embedding methods (pretrained and trained on the training dataset) and variations of the convolutional neural network architecture and compare the results. When evaluating our best performing approach on the actual test data set of the SemEval 2019 Task 4, we obtained relatively low precision and accuracy values, while gaining the highest recall rate among all 42 participating teams.

pdf bib
Team Peter-Parker at SemEval-2019 Task 4: BERT-Based Method in Hyperpartisan News Detection
Zhiyuan Ning | Yuanzhen Lin | Ruichao Zhong

This paper describes team peter-parker’s participation in the Hyperpartisan News Detection task (SemEval-2019 Task 4), which requires classifying whether a given news article is biased or not. We implemented the article parsing tool in Java and used a BERT-based model for the bias prediction. Furthermore, we present experimental results with analysis.

pdf bib
Team Xenophilius Lovegood at SemEval-2019 Task 4: Hyperpartisanship Classification using Convolutional Neural Networks
Albin Zehe | Lena Hettinger | Stefan Ernst | Christian Hauptmann | Andreas Hotho

This paper describes our system for the SemEval 2019 Task 4 on hyperpartisan news detection. We build on an existing deep learning approach for sentence classification based on a Convolutional Neural Network. Modifying the original model with additional layers to increase its expressiveness and finally building an ensemble of multiple versions of the model, we obtain an accuracy of 67.52% and an F1 score of 73.78% on the main test dataset. We also report on additional experiments incorporating handcrafted features into the CNN and using it as a feature extractor for a linear SVM.

pdf bib
The Sally Smedley Hyperpartisan News Detector at SemEval-2019 Task 4
Kazuaki Hanawa | Shota Sasaki | Hiroki Ouchi | Jun Suzuki | Kentaro Inui

This paper describes our system submitted to the formal run of SemEval-2019 Task 4: Hyperpartisan news detection. Our system is based on a linear classifier using several features, i.e., 1) embedding features based on the pre-trained BERT embeddings, 2) article length features, and 3) embedding features of informative phrases extracted from the by-publisher dataset. Our system achieved 80.9% accuracy on the test set for the formal run and took 3rd place out of 42 teams.

pdf bib
Tintin at SemEval-2019 Task 4: Detecting Hyperpartisan News Article with only Simple Tokens
Yves Bestgen

Tintin, the system proposed by the CECL for the Hyperpartisan News Detection task of SemEval 2019, is exclusively based on the tokens that make up the documents and a standard supervised learning procedure. It obtained very contrasting results: poor on the main task, but much more effective at distinguishing documents published by hyperpartisan media outlets from unbiased ones, as it ranked first. An analysis of the most important features highlighted the positive aspects, but also some potential limitations of the approach.

pdf bib
Vernon-fenwick at SemEval-2019 Task 4: Hyperpartisan News Detection using Lexical and Semantic Features
Vertika Srivastava | Ankita Gupta | Divya Prakash | Sudeep Kumar Sahoo | Rohit R.R | Yeon Hyang Kim

In this paper, we present our submission for SemEval-2019 Task 4: Hyperpartisan News Detection. Hyperpartisan news articles are sharply polarized and extremely biased (one-sided). They show blind beliefs, opinions and unreasonable adherence to a party, idea, faction or person. Through this task, we aim to develop an automated system that can be used to detect hyperpartisan news and serve as a prescreening technique for fake news detection. The proposed system jointly uses a rich set of handcrafted textual and semantic features. Our system achieved 2nd rank on the primary metric (82.0% accuracy) and 1st rank on the secondary metric (82.1% F1-score), among all participating teams. Comparison with the best performing system on the leaderboard shows that our system is behind by only 0.2% absolute difference in accuracy.

pdf bib
BLCU_NLP at SemEval-2019 Task 7: An Inference Chain-based GPT Model for Rumour Evaluation
Ruoyao Yang | Wanying Xie | Chunhua Liu | Dong Yu

Researchers have been paying increasing attention to rumour evaluation, including in SemEval 2019 Task 7, due to the rapid spread of unsubstantiated rumours on social media platforms. However, labelled data for learning rumour veracity is scarce, and labels in rumour stance data are highly disproportionate, making it challenging for a model to perform supervised learning adequately. We propose an inference chain-based system, which fully utilizes conversation structure-based knowledge in the limited data and expands the training data in minority categories to alleviate class imbalance. Our approach obtains a 12.6% improvement upon the baseline system for subtask A, ranks 1st among 21 systems in subtask A, and ranks 4th among 12 systems in subtask B.

pdf bib
CLEARumor at SemEval-2019 Task 7: ConvoLving ELMo Against Rumors
Ipek Baris | Lukas Schmelzeisen | Steffen Staab

This paper describes our submission to SemEval-2019 Task 7: RumourEval: Determining Rumor Veracity and Support for Rumors. We participated in both subtasks. The goal of subtask A is to classify the type of interaction between a rumorous social media post and a reply post as support, query, deny, or comment. The goal of subtask B is to predict the veracity of a given rumor. For subtask A, we implement a CNN-based neural architecture using ELMo embeddings of post text combined with auxiliary features and achieve an F1-score of 44.6%. For subtask B, we employ an MLP neural network leveraging our estimates for subtask A and achieve an F1-score of 30.1% (second place in the competition). We provide results and analysis of our system performance and present ablation experiments.

pdf bib
GWU NLP at SemEval-2019 Task 7: Hybrid Pipeline for Rumour Veracity and Stance Classification on Social Media
Sardar Hamidian | Mona Diab

Social media plays a crucial role as the main news resource for information seekers online. However, the unmoderated nature of social media platforms leads to the emergence and spread of untrustworthy content, which harms individuals or even societies. Most current automated approaches for determining the veracity of a rumor do not generalize to novel emerging topics. This paper describes our hybrid system comprising rules and a machine learning model, which makes use of reply tweets to identify the veracity of the source tweet. The proposed system achieved 0.435 macro-F in stance classification, and 0.262 macro-F and 0.801 RMSE in rumor verification, in Task 7 of SemEval 2019.

pdf bib
SINAI-DL at SemEval-2019 Task 7: Data Augmentation and Temporal Expressions
Miguel A. García-Cumbreras | Salud María Jiménez-Zafra | Arturo Montejo-Ráez | Manuel Carlos Díaz-Galiano | Estela Saquete

This paper describes the participation of the SINAI-DL team at RumourEval (Task 7 in SemEval 2019, subtask A: SDQC). SDQC addresses the challenge of rumour stance classification as an indirect way of identifying potential rumours. Given a tweet with several replies, our system classifies each reply as supporting, denying, questioning or commenting on the underlying rumours. We have applied data augmentation, temporal expression labelling and transfer learning with a four-layer neural classifier. We achieve an accuracy of 0.715 with the official run over reply tweets.

pdf bib
UPV-28-UNITO at SemEval-2019 Task 7: Exploiting Post’s Nesting and Syntax Information for Rumor Stance Classification
Bilal Ghanem | Alessandra Teresa Cignarella | Cristina Bosco | Paolo Rosso | Francisco Manuel Rangel Pardo

In the present paper we describe the UPV-28-UNITO system’s submission to the RumorEval 2019 shared task. The approach we applied to both subtasks of the contest exploits classical machine learning algorithms as well as word embeddings, and it is based on diverse groups of features: stylistic, lexical, emotional, sentiment, meta-structural and Twitter-based. The paper also introduces a novel set of features that take advantage of the syntactic information in texts.

pdf bib
ColumbiaNLP at SemEval-2019 Task 8: The Answer is Language Model Fine-tuning
Tuhin Chakrabarty | Smaranda Muresan

Community Question Answering forums are very popular nowadays, as they represent effective means for communities to share information around particular topics. But the information shared on these forums is often not authentic. This paper presents the ColumbiaNLP submission for the SemEval-2019 Task 8: Fact-Checking in Community Question Answering Forums. We show how fine-tuning a language model on a large unannotated corpus of old threads from the Qatar Living forum helps us to classify question types (factual, opinion, socializing) and to judge the factuality of answers on the shared task labeled data from the same forum. Our system finished 4th and 2nd on Subtask A (question type classification) and B (answer factuality prediction), respectively, based on the official metric of accuracy.

pdf bib
Fermi at SemEval-2019 Task 8: An elementary but effective approach to Question Discernment in Community QA Forums
Bakhtiyar Syed | Vijayasaradhi Indurthi | Manish Shrivastava | Manish Gupta | Vasudeva Varma

Online Community Question Answering Forums (cQA) have gained massive popularity in recent years. The rise in users of such forums has led to an increasing need for automated evaluation of question comprehension and fact evaluation of the answers provided by various participants in the forum. Our team, Fermi, participated in sub-task A of Task 8 at SemEval 2019, which tackles the first problem in the pipeline of factual evaluation in cQA forums, i.e., deciding whether a posed question asks for factual information, asks for an opinion or advice, or is just socializing. This information is highly useful for segregating factual questions from non-factual ones, which helps in organizing the questions into useful categories and trims down the problem space for the next task in the pipeline, fact evaluation among the available answers. Our system uses the embeddings obtained from Universal Sentence Encoder combined with XGBoost for the classification sub-task A. We also evaluate other combinations of embeddings and off-the-shelf machine learning algorithms to demonstrate the efficacy of the various representations and their combinations. Our results on the evaluation test set gave an accuracy of 84% and received the first position in the final standings judged by the organizers.

pdf bib
TueFact at SemEval 2019 Task 8: Fact checking in community question answering forums: context matters
Réka Juhász | Franziska Barbara Linnenschmidt | Teslin Roys

The SemEval 2019 Task 8 on Fact-Checking in community question answering forums aimed to classify questions into categories and verify the correctness of answers given on the QatarLiving public forum. The task was divided into two subtasks: the first classifying the question, the second the answers. The TueFact system described in this paper used different approaches for the two subtasks. Subtask A makes use of word vectors based on a bag-of-word-ngram model using up to trigrams. Predictions are made using multi-class logistic regression. The official SemEval result lists an accuracy of 0.60. Subtask B uses vectorized character n-grams up to trigrams instead. Predictions are made using an LSTM model, which achieved an accuracy of 0.53 on the final SemEval Task 8 evaluation set.
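
As an illustration of the bag-of-word-ngram representation mentioned for Subtask A, the following is a minimal sketch, not the TueFact implementation. The function name `word_ngrams`, the lowercasing, the whitespace tokenization, and the example question are all assumptions made for illustration.

```python
from collections import Counter

def word_ngrams(text, max_n=3):
    """Count all word n-grams of length 1..max_n (hypothetical helper)."""
    # Assumption: lowercase and split on whitespace; the abstract does not
    # describe the tokenization actually used.
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts

# Made-up example question, not taken from the task data.
features = word_ngrams("Is the visa fee refundable")
```

Such counts could then be mapped to a fixed vocabulary and passed to a multi-class logistic regression classifier.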

pdf bib
YNU-HPCC at SemEval-2019 Task 8: Using A LSTM-Attention Model for Fact-Checking in Community Forums
Peng Liu | Jin Wang | Xuejie Zhang

We propose a system that uses a long short-term memory with attention mechanism (LSTM-Attention) model to complete the task. The LSTM-Attention model uses two LSTMs to extract the features of the question and answer pair. Then, each of the features is sequentially composed using the attention mechanism, and the two resulting vectors are concatenated into one. Finally, the concatenated vector is used as input for the MLP, whose output layer uses the softmax function to classify the provided answers into three categories. This model is capable of extracting the features of the question and answer pair well. The results show that the proposed system outperforms the baseline algorithm.
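
To make the final classification step concrete, the sketch below shows a softmax over three output scores. It is only an illustration of the softmax function itself; the score values are made up, and the class indices stand in for the three answer categories, which the abstract does not name.

```python
import math

def softmax(scores):
    """Turn raw output-layer scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for the three categories (indices 0, 1, 2).
probs = softmax([2.0, 0.5, -1.0])
# The predicted category is the index with the highest probability.
predicted = max(range(len(probs)), key=probs.__getitem__)
```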

pdf bib
MIDAS at SemEval-2019 Task 9: Suggestion Mining from Online Reviews using ULMFit
Sarthak Anand | Debanjan Mahata | Kartik Aggarwal | Laiba Mehnaz | Simra Shahid | Haimin Zhang | Yaman Kumar | Rajiv Shah | Karan Uppal

In this paper we present our approach to tackle the Suggestion Mining from Online Reviews and Forums Sub-Task A. Given a review, we are asked to predict whether the review consists of a suggestion or not. Our model is based on Universal Language Model Fine-tuning for Text Classification. We apply various pre-processing techniques before training the language and the classification model. We further provide analysis of the model. Our team ranked 10th out of 34 participants, achieving an F1 score of 0.7011.

pdf bib
NTUA-ISLab at SemEval-2019 Task 9: Mining Suggestions in the wild
Rolandos Alexandros Potamias | Alexandros Neofytou | Georgios Siolas

As online customer forums and product comparison sites increase their societal influence, users are actively expressing their opinions and posting recommendations for their fellow customers online. However, systems capable of recognizing suggestions still lack stability. Suggestion Mining, a novel and challenging field of Natural Language Processing, is increasingly gaining attention, aiming to track user advice on online forums. In this paper, a carefully designed methodology to identify customer-to-company and customer-to-customer suggestions is presented. The methodology implements a rule-based classifier using heuristic, lexical and syntactic patterns. The approach ranked 5th and 1st, achieving F1-scores of 0.749 and 0.858 for SemEval-2019 / Suggestion Mining sub-tasks A and B, respectively. In addition, we were able to improve performance by combining the rule-based classifier with a recurrent convolutional neural network, which achieves an F1-score of 0.79 for subtask A.

pdf bib
SSN-SPARKS at SemEval-2019 Task 9: Mining Suggestions from Online Reviews using Deep Learning Techniques on Augmented Data
Rajalakshmi S | Angel Suseelan | S Milton Rajendram | Mirnalinee T T

This paper describes our work on mining suggestions from online reviews and forums. Opinion mining detects whether comments are positive, negative or neutral, while suggestion mining explores the review content for possible tips or advice. The system developed by the SSN-SPARKS team for SemEval-2019 Task 9 (suggestion mining) uses a rule-based approach for feature selection, the SMOTE technique for data augmentation and a deep learning technique (Convolutional Neural Network) for classification. We have compared the results with a Random Forest classifier (RF) and a MultiLayer Perceptron (MLP) model. Results show that the CNN model performs better than the other models for both subtasks.

pdf bib
Suggestion Miner at SemEval-2019 Task 9: Suggestion Detection in Online Forum using Word Graph
Usman Ahmed | Humera Liaquat | Luqman Ahmed | Syed Jawad Hussain

This paper describes the Suggestion Miner system, which participated in SemEval 2019 Task 9, Subtask A: Suggestion Mining from Online Reviews and Forums. We discuss the results of our system in the development, evaluation and post-evaluation phases. Each class in the dataset is represented as a directed unweighted graph. A comparison is then carried out against each class graph, which results in a vector. This vector is used as features by a machine learning algorithm. The model is evaluated with a hold-out strategy: the organizers randomly split the data into a training set (8500 instances, provided to participants for training their systems) and a test set (833 instances), which is reserved to evaluate the performance of participants’ systems. During the evaluation, our system ranked 31st in the CodaLab results for Subtask A (a binary classification problem). The binary class system achieves an evaluation value of 0.34, precision of 0.87, recall of 0.73 and F-measure of 0.78.

pdf bib
ThisIsCompetition at SemEval-2019 Task 9: BERT is unstable for out-of-domain samples
Cheoneum Park | Juae Kim | Hyeon-gu Lee | Reinald Kim Amplayo | Harksoo Kim | Jungyun Seo | Changki Lee

This paper describes our system, Joint Encoders for Stable Suggestion Inference (JESSI), for the SemEval 2019 Task 9: Suggestion Mining from Online Reviews and Forums. JESSI is a combination of two sentence encoders: (a) one using multiple pre-trained word embeddings learned from log-bilinear regression (GloVe) and translation (CoVe) models, and (b) one on top of word encodings from a pre-trained deep bidirectional transformer (BERT). We include a domain adversarial training module when training for out-of-domain samples. Our experiments show that while BERT performs exceptionally well for in-domain samples, several runs of the model show that it is unstable for out-of-domain samples. The problem is mitigated tremendously by (1) combining BERT with a non-BERT encoder, and (2) using an RNN-based classifier on top of BERT. Our final models obtained second place with 77.78% F-Score on Subtask A (i.e. in-domain) and achieved an F-Score of 79.59% on Subtask B (i.e. out-of-domain), even without using any additional external data.

pdf bib
Yimmon at SemEval-2019 Task 9: Suggestion Mining with Hybrid Augmented Approaches
Yimeng Zhuang

The suggestion mining task aims to extract tips, advice, and recommendations from unstructured text. The task includes many challenges, such as class imbalance, figurative expressions, context dependency, and long and complex sentences. This paper gives a detailed system description of our submission to SemEval 2019 Task 9 Subtask A. We transfer the Self-Attention Network (SAN), a successful model in the machine reading comprehension field, to this task. Our model concentrates on modeling long-term dependency, which is indispensable for parsing long and complex sentences. We also adopt techniques such as contextualized embedding, back-translation, and auxiliary loss to augment the system. Our model achieves a performance of F1=76.3 and ranks 4th among 34 participating systems. A further ablation study shows that the techniques used in our system are beneficial to the performance.

pdf bib
YNU_DYX at SemEval-2019 Task 9: A Stacked BiLSTM for Suggestion Mining Classification
Yunxia Ding | Xiaobing Zhou | Xuejie Zhang

In this paper we describe a deep-learning system that competed in SemEval 2019 Task 9, Subtask A: Suggestion Mining from Online Reviews and Forums. We use Word2Vec to learn the distributed representations from sentences. The system is composed of a Stacked Bidirectional Long Short-Term Memory network (SBiLSTM) that enriches word representations with context from both before and after each position in the sequence. We use an ensemble to improve the effectiveness of our model. Our official submission achieves an F1-score of 0.5659.

pdf bib
Zoho at SemEval-2019 Task 9: Semi-supervised Domain Adaptation using Tri-training for Suggestion Mining
Sai Prasanna | Sri Ananda Seelan

This paper describes our submission for the SemEval-2019 Suggestion Mining task. A simple Convolutional Neural Network (CNN) classifier with contextual word representations from a pre-trained language model was used for sentence classification. The model is trained using tri-training, a semi-supervised bootstrapping mechanism for labelling unseen data. Tri-training proved to be an effective technique to accommodate domain shift for cross-domain suggestion mining (Subtask B) where there is no hand labelled training data. For in-domain evaluation (Subtask A), we use the same technique to augment the training set. Our system ranks thirteenth in Subtask A with an F1-score of 68.07 and third in Subtask B with an F1-score of 81.94.
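
The abstract does not give the details of the tri-training setup, so the following is only a sketch of the agreement rule commonly used in tri-training: an unlabelled example becomes a pseudo-labelled training example for one model when the other two models predict the same label. The function name `pseudo_labels_for` and the toy predictions are hypothetical.

```python
def pseudo_labels_for(target, predictions):
    """Return {example_index: label} for the model at index `target`.

    `predictions` holds three lists of predicted labels, one per model,
    all over the same unlabelled examples (assumed setup).
    """
    # Keep the predictions of the two models other than the target.
    first, second = [p for j, p in enumerate(predictions) if j != target]
    # An example is pseudo-labelled only where those two models agree.
    return {i: a for i, (a, b) in enumerate(zip(first, second)) if a == b}

# Toy predictions of three models on four unlabelled sentences
# (1 = suggestion, 0 = non-suggestion).
preds = [
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
]
new_for_model_0 = pseudo_labels_for(0, preds)
```

In a full loop, each model would be retrained on its labelled data plus these pseudo-labelled examples, and the process repeated until the predictions stop changing.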

pdf bib
UniMelb at SemEval-2019 Task 12: Multi-model combination for toponym resolution
Haonan Li | Minghan Wang | Timothy Baldwin | Martin Tomko | Maria Vasardani

This paper describes our submission to SemEval-2019 Task 12 on toponym resolution over scientific articles. We train separate NER models for toponym detection over text extracted from tables vs. text from the body of the paper, and train another auxiliary model to eliminate misdetected toponyms. For toponym disambiguation, we use an SVM classifier with hand-engineered features. The best setting achieved a strict micro-F1 score of 80.92% and an overlap micro-F1 score of 86.88% in the toponym detection subtask, ranking 2nd out of 8 teams on F1 score. For toponym disambiguation and end-to-end resolution, we officially ranked 2nd and 3rd, respectively.

pdf bib
University of Arizona at SemEval-2019 Task 12: Deep-Affix Named Entity Recognition of Geolocation Entities
Vikas Yadav | Egoitz Laparra | Ti-Tai Wang | Mihai Surdeanu | Steven Bethard

We present the Named Entity Recognition (NER) and disambiguation model used by the University of Arizona team (UArizona) for the SemEval 2019 task 12. We achieved fourth place on tasks 1 and 3. We implemented a deep-affix based LSTM-CRF NER model for task 1, which utilizes only character, word, prefix and suffix information for the identification of geolocation entities. Despite using just the training data provided by task organizers and not using any lexicon features, we achieved 78.85% strict micro F-score on task 1. We used the unsupervised population heuristics for task 3 and achieved 52.99% strict micro-F1 score in this task.