Proceedings of the 11th International Workshop on Health Text Mining and Information Analysis

Eben Holderness, Antonio Jimeno Yepes, Alberto Lavelli, Anne-Lyse Minard, James Pustejovsky, Fabio Rinaldi (Editors)

EMNLP | Louhi
Association for Computational Linguistics

Simple Hierarchical Multi-Task Neural End-To-End Entity Linking for Biomedical Text
Maciej Wiatrak | Juha Iso-Sipila

Recognising and linking entities is a crucial first step to many tasks in biomedical text analysis, such as relation extraction and target identification. Traditionally, biomedical entity linking methods rely heavily on heuristic rules and predefined, often domain-specific features. These features try to capture the properties of entities, and complex multi-step architectures are then used to detect and subsequently link entity mentions. We propose a significant simplification to the biomedical entity linking setup that does not rely on any heuristic methods. The system performs all the steps of the entity linking task jointly, in either a single stage or two stages. We explore the use of hierarchical multi-task learning, using mention recognition and entity typing as auxiliary tasks. We show that hierarchical multi-task models consistently outperform single-task models when the trained tasks are homogeneous. We evaluate the performance of our models on the biomedical entity linking benchmarks using the MedMentions and BC5CDR datasets. We achieve state-of-the-art results on the challenging MedMentions dataset, and comparable results on BC5CDR.

Evaluation of Machine Translation Methods applied to Medical Terminologies
Konstantinos Skianis | Yann Briand | Florent Desgrippes

Medical terminology resources and standards play vital roles in clinical data exchange, significantly enabling the interoperability of services within national healthcare information networks. Health and medical science are constantly evolving, which requires terminology editions to advance as well. In this paper, we present our evaluation of the latest machine translation techniques applied to medical terminologies. Experiments have been conducted leveraging selected statistical and neural machine translation methods. The devised procedure is tested on a validated sample of ICD-11 and ICF terminologies translated from English to French, with promising results.

Defining and Learning Refined Temporal Relations in the Clinical Narrative
Kristin Wright-Bettner | Chen Lin | Timothy Miller | Steven Bethard | Dmitriy Dligach | Martha Palmer | James H. Martin | Guergana Savova

We present refinements over existing temporal relation annotations in the Electronic Medical Record clinical narrative. We refined the THYME corpus annotations to more faithfully represent nuanced temporality and nuanced temporal-coreferential relations. The main contributions are in re-defining the CONTAINS and OVERLAP relations into CONTAINS, CONTAINS-SUBEVENT, OVERLAP and NOTED-ON. We demonstrate that these refinements lead to substantial gains in learnability for state-of-the-art transformer models as compared to previously reported results on the original THYME corpus. We thus establish a baseline for the automatic extraction of these refined temporal relations. Although our study is done on clinical narrative, we believe it addresses far-reaching challenges that are corpus- and domain-agnostic.

Context-Aware Automatic Text Simplification of Health Materials in Low-Resource Domains
Tarek Sakakini | Jong Yoon Lee | Aditya Duri | Renato F.L. Azevedo | Victor Sadauskas | Kuangxiao Gu | Suma Bhat | Dan Morrow | James Graumlich | Saqib Walayat | Mark Hasegawa-Johnson | Thomas Huang | Ann Willemsen-Dunlap | Donald Halpin

Healthcare systems have increased patients’ exposure to their own health materials to enhance patients’ health levels, but this has been impeded by patients’ lack of understanding of their health material. We address potential barriers to their comprehension by developing a context-aware text simplification system for health material. Given the scarcity of annotated parallel corpora in healthcare domains, we design our system to be independent of a parallel corpus, complementing the availability of data-driven neural methods when such corpora are available. Our system compensates for the lack of direct supervision using a biomedical lexical database: the Unified Medical Language System (UMLS). Compared to a competitive prior approach that uses a tool for identifying biomedical concepts and a consumer-directed vocabulary list, we empirically show the enhanced accuracy of our system due to improved handling of ambiguous terms. We also show the enhanced accuracy of our system over directly-supervised neural methods in this low-resource setting. Finally, we show the direct impact of our system on laypeople’s comprehension of health material via a human subjects study (n=160).

Identifying Personal Experience Tweets of Medication Effects Using Pre-trained RoBERTa Language Model and Its Updating
Minghao Zhu | Youzhe Song | Ge Jin | Keyuan Jiang

Post-market surveillance, the practice of monitoring the safe use of pharmaceutical drugs, is an important part of pharmacovigilance. Being able to collect personal experience related to pharmaceutical product use could help us gain insight into how the human body reacts to different medications. Twitter, a popular social media service, is being considered as an important alternative data source for collecting personal experience information with medications. Identifying personal experience tweets is a challenging classification task in natural language processing. In this study, we utilized three methods based on Facebook’s Robustly Optimized BERT Pretraining Approach (RoBERTa) to predict personal experience tweets related to medication use: the first combines the pre-trained RoBERTa model with a classifier, the second combines a pre-trained RoBERTa model updated on a corpus of unlabeled tweets with a classifier, and the third combines a RoBERTa model trained from scratch on our unlabeled tweets with a classifier. Our results show that all of these approaches outperform the published methods (Word Embedding + LSTM) in classification performance (p < 0.05), and that updating the pre-trained language model with tweets related to medications could improve the performance even further.