Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task

Arjun Magge, Ari Klein, Antonio Miranda-Escalada, Mohammed Ali Al-garadi, Ilseyar Alimova, Zulfat Miftahutdinov, Eulalia Farre-Maduell, Salvador Lima Lopez, Ivan Flores, Karen O'Connor, Davy Weissenbacher, Elena Tutubalina, Abeed Sarker, Juan M Banda, Martin Krallinger, Graciela Gonzalez-Hernandez (Editors)

Anthology ID:
Mexico City, Mexico
Association for Computational Linguistics
Bib Export formats:

pdf bib
Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task
Arjun Magge | Ari Klein | Antonio Miranda-Escalada | Mohammed Ali Al-garadi | Ilseyar Alimova | Zulfat Miftahutdinov | Eulalia Farre-Maduell | Salvador Lima Lopez | Ivan Flores | Karen O'Connor | Davy Weissenbacher | Elena Tutubalina | Abeed Sarker | Juan M Banda | Martin Krallinger | Graciela Gonzalez-Hernandez

pdf bib
View Distillation with Unlabeled Data for Extracting Adverse Drug Effects from User-Generated Data
Payam Karisani | Jinho D. Choi | Li Xiong

We present an algorithm based on multi-layer transformers for identifying Adverse Drug Reactions (ADR) in social media data. Our model relies on the properties of the problem and the characteristics of contextual word embeddings to extract two views from documents. Then a classifier is trained on each view to label a set of unlabeled documents to be used as an initializer for a new classifier in the other view. Finally, the initialized classifier in each view is further trained using the initial training examples. We evaluated our model in the largest publicly available ADR dataset. The experiments testify that our model significantly outperforms the transformer-based models pretrained on domain-specific data.

pdf bib
The ProfNER shared task on automatic recognition of occupation mentions in social media : systems, evaluation, guidelines, embeddings and corporaProfNER shared task on automatic recognition of occupation mentions in social media: systems, evaluation, guidelines, embeddings and corpora
Antonio Miranda-Escalada | Eulàlia Farré-Maduell | Salvador Lima-López | Luis Gascó | Vicent Briva-Iglesias | Marvin Agüero-Torales | Martin Krallinger

Detection of occupations in texts is relevant for a range of important application scenarios, like competitive intelligence, sociodemographic analysis, legal NLP or health-related occupational data mining. Despite the importance and heterogeneous data types that mention occupations, text mining efforts to recognize them have been limited. This is due to the lack of clear annotation guidelines and high-quality Gold Standard corpora. Social media data can be regarded as a relevant source of information for real-time monitoring of at-risk occupational groups in the context of pandemics like the COVID-19 one, facilitating intervention strategies for occupations in direct contact with infectious agents or affected by mental health issues. To evaluate current NLP methods and to generate resources, we have organized the ProfNER track at SMM4H 2021, providing ProfNER participants with a Gold Standard corpus of manually annotated tweets (human IAA of 0.919) following annotation guidelines available in Spanish and English, an occupation gazetteer, a machine-translated version of tweets, and FastText embeddings. Out of 35 registered teams, 11 submitted a total of 27 runs. Best-performing participants built systems based on recent NLP technologies (e.g. transformers) and achieved 0.93 F-score in Text Classification and 0.839 in Named Entity Recognition. Corpus :

pdf bib
Transformer-based Multi-Task Learning for Adverse Effect Mention Analysis in Tweets
George-Andrei Dima | Dumitru-Clementin Cercel | Mihai Dascalu

This paper presents our contribution to the Social Media Mining for Health Applications Shared Task 2021. We addressed all the three subtasks of Task 1 : Subtask A (classification of tweets containing adverse effects), Subtask B (extraction of text spans containing adverse effects) and Subtask C (adverse effects resolution). We explored various pre-trained transformer-based language models and we focused on a multi-task training architecture. For the first subtask, we also applied adversarial augmentation techniques and we formed model ensembles in order to improve the robustness of the prediction. Our system ranked first at Subtask B with 0.51 F1 score, 0.514 precision and 0.514 recall. For Subtask A we obtained 0.44 F1 score, 0.49 precision and 0.39 recall and for Subtask C we obtained 0.16 F1 score with 0.16 precision and 0.17 recall.

pdf bib
UACH-INAOE at SMM4H : a BERT based approach for classification of COVID-19 Twitter postsUACH-INAOE at SMM4H: a BERT based approach for classification of COVID-19 Twitter posts
Alberto Valdes | Jesus Lopez | Manuel Montes

This work describes the participation of the Universidad Autnoma de Chihuahua-Instituto Nacional de Astrofsica, ptica y Electrnica team at the Social Media Mining for Health Applications (SMM4H) 2021 shared task. Our team participated in task 5 and 6, both focused on the automatic classification of Twitter posts related to COVID-19. Task 5 was oriented on solving a binary classification problem, trying to identify self-reporting tweets of potential cases of COVID-19. Task 6 objective was to classify tweets containing COVID-19 symptoms. For both tasks we used models based on bidirectional encoder representations from transformers (BERT). Our objective was to determine if a model pretrained on a corpus in the domain of interest can outperform one trained on a much larger general domain corpus. Our F1 results were encouraging, 0.77 and 0.95 for task 5 and 6 respectively, having achieved the highest score among all the participants in the latter.

pdf bib
Word Embeddings, Cosine Similarity and Deep Learning for Identification of Professions & Occupations in Health-related Social Media
Sergio Santamaría Carrasco | Roberto Cuervo Rosillo

ProfNER-ST focuses on the recognition of professions and occupations from Twitter using Spanish data. Our participation is based on a combination of word-level embeddings, including pre-trained Spanish BERT, as well as cosine similarity computed over a subset of entities that serve as input for an encoder-decoder architecture with attention mechanism. Finally, our best score achieved an F1-measure of 0.823 in the official test set.

pdf bib
A Joint Training Approach to Tweet Classification and Adverse Effect Extraction and Normalization for SMM4H 2021SMM4H 2021
Mohab Elkaref | Lamiece Hassan

In this work we describe our submissions to the Social Media Mining for Health (SMM4H) 2021 Shared Task. We investigated the effectiveness of a joint training approach to Task 1, specifically classification, extraction and normalization of Adverse Drug Effect (ADE) mentions in English tweets. Our approach performed well on the normalization task, achieving an above average f1 score of 24 %, but less so on classification and extraction, with f1 scores of 22 % and 37 % respectively. Our experiments also showed that a larger dataset with more negative results led to stronger results than a smaller more balanced dataset, even when both datasets have the same positive examples. Finally we also submitted a tuned BERT model for Task 6 : Classification of Covid-19 tweets containing symptoms, which achieved an above average f1 score of 96 %.

pdf bib
Identification of profession & occupation in Health-related Social Media using tweets in SpanishSpanish
Victoria Pachón | Jacinto Mata Vázquez | Juan Luís Domínguez Olmedo

In this paper we present our approach and system description on Task 7a in ProfNer-ST : Identification of profession & occupation in Health related Social Media. Our main contribution is to show the effectiveness of using BETO-Spanish BERT as a model based on transformers pretrained with a Spanish Corpus for classification tasks. In our experiments we compared several architectures based on transformers with others based on classical machine learning algorithms. With this approach, we achieved an F1-score of 0.92 in the evaluation process.

pdf bib
UoB at ProfNER 2021 : Data Augmentation for Classification Using Machine TranslationUoB at ProfNER 2021: Data Augmentation for Classification Using Machine Translation
Frances Adriana Laureano De Leon | Harish Tayyar Madabushi | Mark Lee

This paper describes the participation of the UoB-NLP team in the ProfNER-ST shared subtask 7a. The task was aimed at detecting the mention of professions in social media text. Our team experimented with two methods of improving the performance of pre-trained models : Specifically, we experimented with data augmentation through translation and the merging of multiple language inputs to meet the objective of the task. While the best performing model on the test data consisted of mBERT fine-tuned on augmented data using back-translation, the improvement is minor possibly because multi-lingual pre-trained models such as mBERT already have access to the kind of information provided through back-translation and bilingual data.

pdf bib
PAII-NLP at SMM4H 2021 : Joint Extraction and Normalization of Adverse Drug Effect Mentions in TweetsPAII-NLP at SMM4H 2021: Joint Extraction and Normalization of Adverse Drug Effect Mentions in Tweets
Zongcheng Ji | Tian Xia | Mei Han

This paper describes our system developed for the subtask 1c of the sixth Social Media Mining for Health Applications (SMM4H) shared task in 2021. The aim of the subtask is to recognize the adverse drug effect (ADE) mentions from tweets and normalize the identified mentions to their mapping MedDRA preferred term IDs. Our system is based on a neural transition-based joint model, which is to perform recognition and normalization simultaneously. Our final two submissions outperform the average F1 score by 1-2 %.