Proceedings of the 2018 EMNLP Workshop SMM4H: The 3rd Social Media Mining for Health Applications Workshop & Shared Task

Graciela Gonzalez-Hernandez, Davy Weissenbacher, Abeed Sarker, Michael Paul (Editors)

Brussels, Belgium
Association for Computational Linguistics
Football and Beer - a Social Media Analysis on Twitter in Context of the FIFA Football World Cup 2018
Roland Roller | Philippe Thomas | Sven Schmeier

In many societies alcohol is a legal, common and socially accepted recreational substance. Alcohol consumption often accompanies social events, as it helps people to increase their sociability and to overcome their inhibitions. On the other hand, we know that increased alcohol consumption can lead to serious health issues, such as cancer, cardiovascular diseases and diseases of the digestive system, to mention a few. This work examines alcohol consumption during the FIFA Football World Cup 2018, particularly the usage of alcohol-related information on Twitter. For this we analyse the tweeting behaviour and show that the tournament strongly increases the interest in beer. Furthermore, we show that countries that had to leave the tournament at an early stage might have done something good for their fans, as the interest in beer decreased again.

Stance-Taking in Topics Extracted from Vaccine-Related Tweets and Discussion Forum Posts
Maria Skeppstedt | Manfred Stede | Andreas Kerren

The occurrence of stance-taking towards vaccination was measured in documents extracted by topic modelling from two different corpora, one discussion forum corpus and one tweet corpus. For some of the topics extracted, their most closely associated documents contained a proportion of vaccine stance-taking texts that exceeded the corpus average by a large margin. These extracted document sets would, therefore, form a useful resource in a process for computer-assisted analysis of argumentation on the subject of vaccination.

Overview of the Third Social Media Mining for Health (SMM4H) Shared Tasks at EMNLP 2018
Davy Weissenbacher | Abeed Sarker | Michael J. Paul | Graciela Gonzalez-Hernandez

The goals of the SMM4H shared tasks are to release annotated social-media-based, health-related datasets to the research community, and to compare the performance of natural language processing and machine learning systems on tasks involving these datasets. The third execution of the SMM4H shared tasks, co-hosted with EMNLP-2018, comprised four subtasks. These subtasks involve annotated user posts from Twitter (tweets) and focus on the (i) automatic classification of tweets mentioning a drug name, (ii) automatic classification of tweets containing reports of first-person medication intake, (iii) automatic classification of tweets presenting self-reports of adverse drug reaction (ADR) detection, and (iv) automatic classification of vaccine behavior mentions in tweets. A total of 14 teams participated and 78 system runs were submitted (23 for task 1, 20 for task 2, 18 for task 3, 17 for task 4).

Detecting Tweets Mentioning Drug Name and Adverse Drug Reaction with Hierarchical Tweet Representation and Multi-Head Self-Attention
Chuhan Wu | Fangzhao Wu | Junxin Liu | Sixing Wu | Yongfeng Huang | Xing Xie

This paper describes our system for the first and third shared tasks of the third Social Media Mining for Health Applications (SMM4H) workshop, which aim to detect tweets mentioning drug names and adverse drug reactions. In our system we propose a neural approach with hierarchical tweet representation and multi-head self-attention (HTR-MSA) for both tasks. Our system achieved first place in both the first and third shared tasks of SMM4H, with F-scores of 91.83% and 52.20% respectively.

Shot Or Not: Comparison of NLP Approaches for Vaccination Behaviour Detection
Aditya Joshi | Xiang Dai | Sarvnaz Karimi | Ross Sparks | Cécile Paris | C Raina MacIntyre

Vaccination behaviour detection deals with predicting whether or not a person received or was about to receive a vaccine. We present our submission for the vaccination behaviour detection shared task at the SMM4H workshop. Our findings are based on three prevalent text classification approaches: rule-based, statistical and deep learning-based. Our final submissions are: (1) an ensemble of statistical classifiers with task-specific features derived using lexicons, language processing tools and word embeddings; and (2) an LSTM classifier with pre-trained language models.

IRISA at SMM4H 2018: Neural Network and Bagging for Tweet Classification
Anne-Lyse Minard | Christian Raymond | Vincent Claveau

This paper describes the systems developed by IRISA to participate in the four tasks of the SMM4H 2018 challenge. For these tweet classification tasks, we adopt a common approach based on recurrent neural networks (BiLSTM). Our main contributions are the use of certain features, the use of Bagging in order to deal with unbalanced datasets, and the automatic selection of difficult examples. These techniques allow us to reach 91.4, 46.5, 47.8 and 85.0 as F1-scores for Tasks 1 to 4.

Drug-Use Identification from Tweets with Word and Character N-Grams
Çağrı Çöltekin | Taraka Rama

This paper describes our systems for the Social Media Mining for Health Applications (SMM4H) shared task. We participated in all four tracks of the shared task using linear models with a combination of character and word n-gram features. We did not use any external data or domain-specific information. The resulting systems achieved above-average scores among the participating systems, with F1-scores of 91.22, 46.8, 42.4, and 85.53 on tasks 1, 2, 3, and 4 respectively.
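
As a rough illustration of what a combination of character and word n-gram features can look like, the following is a minimal sketch using only the Python standard library. The function names, the n-gram ranges, the lowercasing step and the "c:" / "w:" feature prefixes are assumptions made for this example; the abstract does not specify the authors' actual feature extraction or model.

```python
from collections import Counter


def char_ngrams(text, n_min=1, n_max=3):
    """Return all character n-grams of length n_min..n_max (hypothetical helper)."""
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            grams.append(text[i:i + n])
    return grams


def word_ngrams(text, n_min=1, n_max=2):
    """Return all whitespace-token n-grams of length n_min..n_max (hypothetical helper)."""
    tokens = text.split()
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def extract_features(text):
    """Count character and word n-grams in one sparse feature bag.

    The prefixes keep the two feature families apart, so the character
    unigram "a" and the word "a" do not collide.
    """
    lowered = text.lower()  # assumption: case-insensitive features
    features = Counter()
    for gram in char_ngrams(lowered):
        features["c:" + gram] += 1
    for gram in word_ngrams(lowered):
        features["w:" + gram] += 1
    return features


features = extract_features("Took Advil")
```

A bag of counts like this could then be fed to any linear classifier; which linear model and weighting scheme the authors used is not stated in the abstract.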

Deep Learning for Social Media Health Text Classification
Santosh Tokala | Vaibhav Gambhir | Animesh Mukherjee

This paper describes the systems developed for the 1st and 2nd tasks of the 3rd Social Media Mining for Health Applications Shared Task at EMNLP 2018. The first task focuses on the automatic detection of posts mentioning a drug name or dietary supplement, a binary classification. The second task is about distinguishing tweets that present personal medication intake, possible medication intake and non-intake. We performed extensive experiments with various classifiers such as Logistic Regression, Random Forest, SVMs and Gradient Boosted Decision Trees (GBDT), and with deep learning architectures such as Long Short-Term Memory Networks (LSTM), a joint Convolutional Neural Network (CNN) and LSTM architecture, and an attention-based LSTM architecture, at both word and character level. We have also explored using various pre-trained embeddings such as Global Vectors for Word Representation (GloVe) and Word2Vec, as well as task-specific embeddings learned using CNN-LSTM and LSTMs.

Leveraging Web Based Evidence Gathering for Drug Information Identification from Tweets
Rupsa Saha | Abir Naskar | Tirthankar Dasgupta | Lipika Dey

In this paper, we have explored web-based evidence gathering and different linguistic features to automatically extract drug names from tweets and to further classify such tweets as Adverse Drug Events or not. We have evaluated our proposed models on the datasets released for shared Task 1 and Task 3 of the SMM4H workshop respectively. Our evaluation results show that the proposed model achieved good results, with Precision, Recall and F-scores of 78.5%, 88% and 82.9% respectively for Task 1, and 33.2%, 54.7% and 41.3% for Task 3.
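
The relation between the reported precision, recall and F-score values can be checked with a short calculation. This sketch assumes the F-score is the standard balanced F1 (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function and variable names are chosen for illustration.

```python
def f_score(precision, recall):
    """Balanced F1: harmonic mean of precision and recall (assumed definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
task1_f = f_score(78.5, 88.0)
task3_f = f_score(33.2, 54.7)

print(f"Task 1 F-score: {task1_f:.2f}")
print(f"Task 3 F-score: {task3_f:.2f}")
```

Under that assumption the computed values land within about 0.1 of the reported 82.9% and 41.3%; the small gap for Task 1 is plausibly due to rounding of the published precision and recall.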

CLaC at SMM4H Task 1, 2, and 4
Parsa Bagherzadeh | Nadia Sheikh | Sabine Bergler

CLaC Labs participated in Tasks 1, 2, and 4 using the same base architecture for all tasks with various parameter variations. This was our first exploration of this data and the SMM4H Tasks, thus a unified system was useful to compare the behavior of our architecture over the different datasets and how they interact with different linguistic features.