Proceedings of the 3rd Clinical Natural Language Processing Workshop

Anna Rumshisky, Kirk Roberts, Steven Bethard, Tristan Naumann (Editors)

Anthology ID:
ClinicalNLP | EMNLP
Association for Computational Linguistics
Bib Export formats:

pdf bib
Proceedings of the 3rd Clinical Natural Language Processing Workshop
Anna Rumshisky | Kirk Roberts | Steven Bethard | Tristan Naumann

pdf bib
Various Approaches for Predicting Stroke Prognosis using Magnetic Resonance Imaging Text Records
Tak-Sung Heo | Chulho Kim | Jeong-Myeong Choi | Yeong-Seok Jeong | Yu-Seop Kim

Stroke is one of the leading causes of death and disability worldwide. Stroke is treatable, but it is prone to disability after treatment and must be prevented. To grasp the degree of disability caused by stroke, we use magnetic resonance imaging text records to predict stroke and measure the performance according to the document-level and sentence-level representation. As a result of the experiment, the document-level representation shows better performance.

pdf bib
BERT-XML : Large Scale Automated ICD Coding Using BERT PretrainingBERT-XML: Large Scale Automated ICD Coding Using BERT Pretraining
Zachariah Zhang | Jingshu Liu | Narges Razavian

ICD coding is the task of classifying and cod-ing all diagnoses, symptoms and proceduresassociated with a patient’s visit. The process isoften manual, extremely time-consuming andexpensive for hospitals as clinical interactionsare usually recorded in free text medical notes. In this paper, we propose a machine learningmodel, BERT-XML, for large scale automatedICD coding of EHR notes, utilizing recentlydeveloped unsupervised pretraining that haveachieved state of the art performance on a va-riety of NLP tasks. We train a BERT modelfrom scratch on EHR notes, learning with vo-cabulary better suited for EHR tasks and thusoutperform off-the-shelf models. We furtheradapt the BERT architecture for ICD codingwith multi-label attention. We demonstratethe effectiveness of BERT-based models on thelarge scale ICD code classification task usingmillions of EHR notes to predict thousands ofunique codes.

pdf bib
Incorporating Risk Factor Embeddings in Pre-trained Transformers Improves Sentiment Prediction in Psychiatric Discharge Summaries
Xiyu Ding | Mei-Hua Hall | Timothy Miller

Reducing rates of early hospital readmission has been recognized and identified as a key to improve quality of care and reduce costs. There are a number of risk factors that have been hypothesized to be important for understanding re-admission risk, including such factors as problems with substance abuse, ability to maintain work, relations with family. In this work, we develop Roberta-based models to predict the sentiment of sentences describing readmission risk factors in discharge summaries of patients with psychosis. We improve substantially on previous results by a scheme that shares information across risk factors while also allowing the model to learn risk factor-specific information.

pdf bib
BioBERTpt-A Portuguese Neural Language Model for Clinical Named Entity RecognitionBioBERTpt - A Portuguese Neural Language Model for Clinical Named Entity Recognition
Elisa Terumi Rubel Schneider | João Vitor Andrioli de Souza | Julien Knafou | Lucas Emanuel Silva e Oliveira | Jenny Copara | Yohan Bonescki Gumiel | Lucas Ferro Antunes de Oliveira | Emerson Cabrera Paraiso | Douglas Teodoro | Cláudia Maria Cabral Moro Barra

With the growing number of electronic health record data, clinical NLP tasks have become increasingly relevant to unlock valuable information from unstructured clinical text. Although the performance of downstream NLP tasks, such as named-entity recognition (NER), in English corpus has recently improved by contextualised language models, less research is available for clinical texts in low resource languages. Our goal is to assess a deep contextual embedding model for Portuguese, so called BioBERTpt, to support clinical and biomedical NER. We transfer learned information encoded in a multilingual-BERT model to a corpora of clinical narratives and biomedical-scientific papers in Brazilian Portuguese. To evaluate the performance of BioBERTpt, we ran NER experiments on two annotated corpora containing clinical narratives and compared the results with existing BERT models. Our in-domain model outperformed the baseline model in F1-score by 2.72 %, achieving higher performance in 11 out of 13 assessed entities. We demonstrate that enriching contextual embedding models with domain literature can play an important role in improving performance for specific NLP tasks. The transfer learning process enhanced the Portuguese biomedical NER model by reducing the necessity of labeled data and the demand for retraining a whole new model.

pdf bib
How You Ask Matters : The Effect of Paraphrastic Questions to BERT Performance on a Clinical SQuAD DatasetBERT Performance on a Clinical SQuAD Dataset
Sungrim (Riea) Moon | Jungwei Fan

Reading comprehension style question-answering (QA) based on patient-specific documents represents a growing area in clinical NLP with plentiful applications. Bidirectional Encoder Representations from Transformers (BERT) and its derivatives lead the state-of-the-art accuracy on the task, but most evaluation has treated the data as a pre-mixture without systematically looking into the potential effect of imperfect train / test questions. The current study seeks to address this gap by experimenting with full versus partial train / test data consisting of paraphrastic questions. Our key findings include 1) training with all pooled question variants yielded best accuracy, 2) the accuracy varied widely, from 0.74 to 0.80, when trained with each single question variant, and 3) questions of similar lexical / syntactic structure tended to induce identical answers. The results suggest that how you ask questions matters in BERT-based QA, especially at the training stage.

pdf bib
MeDAL : Medical Abbreviation Disambiguation Dataset for Natural Language Understanding PretrainingMeDAL: Medical Abbreviation Disambiguation Dataset for Natural Language Understanding Pretraining
Zhi Wen | Xing Han Lu | Siva Reddy

One of the biggest challenges that prohibit the use of many current NLP methods in clinical settings is the availability of public datasets. In this work, we present MeDAL, a large medical text dataset curated for abbreviation disambiguation, designed for natural language understanding pre-training in the medical domain. We pre-trained several models of common architectures on this dataset and empirically showed that such pre-training leads to improved performance and convergence speed when fine-tuning on downstream medical tasks.

pdf bib
Extracting Relations between Radiotherapy Treatment Details
Danielle Bitterman | Timothy Miller | David Harris | Chen Lin | Sean Finan | Jeremy Warner | Raymond Mak | Guergana Savova

We present work on extraction of radiotherapy treatment information from the clinical narrative in the electronic medical records. Radiotherapy is a central component of the treatment of most solid cancers. Its details are described in non-standardized fashions using jargon not found in other medical specialties, complicating the already difficult task of manual data extraction. We examine the performance of several state-of-the-art neural methods for relation extraction of radiotherapy treatment details, with a goal of automating detailed information extraction. The neural systems perform at 0.82-0.88 macro-average F1, which approximates or in some cases exceeds the inter-annotator agreement. To the best of our knowledge, this is the first effort to develop models for radiotherapy relation extraction and one of the few efforts for relation extraction to describe cancer treatment in general.

pdf bib
Cancer Registry Information Extraction via Transfer Learning
Yan-Jie Lin | Hong-Jie Dai | You-Chen Zhang | Chung-Yang Wu | Yu-Cheng Chang | Pin-Jou Lu | Chih-Jen Huang | Yu-Tsang Wang | Hui-Min Hsieh | Kun-San Chao | Tsang-Wu Liu | I-Shou Chang | Yi-Hsin Connie Yang | Ti-Hao Wang | Ko-Jiunn Liu | Li-Tzong Chen | Sheau-Fang Yang

A cancer registry is a critical and massive database for which various types of domain knowledge are needed and whose maintenance requires labor-intensive data curation. In order to facilitate the curation process for building a high-quality and integrated cancer registry database, we compiled a cross-hospital corpus and applied neural network methods to develop a natural language processing system for extracting cancer registry variables buried in unstructured pathology reports. The performance of the developed networks was compared with various baselines using standard micro-precision, recall and F-measure. Furthermore, we conducted experiments to study the feasibility of applying transfer learning to rapidly develop a well-performing system for processing reports from different sources that might be presented in different writing styles and formats. The results demonstrate that the transfer learning method enables us to develop a satisfactory system for a new hospital with only a few annotations and suggest more opportunities to reduce the burden of cancer registry curation.

pdf bib
Where’s the Question? A Multi-channel Deep Convolutional Neural Network for Question Identification in Textual Data
George Michalopoulos | Helen Chen | Alexander Wong

In most clinical practice settings, there is no rigorous reviewing of the clinical documentation, resulting in inaccurate information captured in the patient medical records. The gold standard in clinical data capturing is achieved via expert-review, where clinicians can have a dialogue with a domain expert (reviewers) and ask them questions about data entry rules. Automatically identifying real questions in these dialogues could uncover ambiguities or common problems in data capturing in a given clinical setting. In this study, we proposed a novel multi-channel deep convolutional neural network architecture, namely Quest-CNN, for the purpose of separating real questions that expect an answer (information or help) about an issue from sentences that are not questions, as well as from questions referring to an issue mentioned in a nearby sentence (e.g., can you clarify this?), which we will refer as c-questions. We conducted a comprehensive performance comparison analysis of the proposed multi-channel deep convolutional neural network against other deep neural networks. Furthermore, we evaluated the performance of traditional rule-based and learning-based methods for detecting question sentences. The proposed Quest-CNN achieved the best F1 score both on a dataset of data entry-review dialogue in a dialysis care setting, and on a general domain dataset.

pdf bib
An Ensemble Approach for Automatic Structuring of Radiology Reports
Morteza Pourreza Shahri | Amir Tahmasebi | Bingyang Ye | Henghui Zhu | Javed Aslam | Timothy Ferris

Automatic structuring of electronic medical records is of high demand for clinical workflow solutions to facilitate extraction, storage, and querying of patient care information. However, developing a scalable solution is extremely challenging, specifically for radiology reports, as most healthcare institutes use either no template or department / institute specific templates. Moreover, radiologists’ reporting style varies from one to another as sentences are written in a telegraphic format and do not follow general English grammar rules. In this work, we present an ensemble method that consolidates the predictions of three models, capturing various attributes of textual information for automatic labeling of sentences with section labels. These three models are : 1) Focus Sentence model, capturing context of the target sentence ; 2) Surrounding Context model, capturing the neighboring context of the target sentence ; and finally, 3) Formatting / Layout model, aimed at learning report formatting cues. We utilize Bi-directional LSTMs, followed by sentence encoders, to acquire the context. Furthermore, we define several features that incorporate the structure of reports. We compare our proposed approach against multiple baselines and state-of-the-art approaches on a proprietary dataset as well as 100 manually annotated radiology notes from the MIMIC-III dataset, which we are making publicly available. Our proposed approach significantly outperforms other approaches by achieving 97.1 % accuracy.

pdf bib
Advancing Seq2seq with Joint Paraphrase Learning
So Yeon Min | Preethi Raghavan | Peter Szolovits

We address the problem of model generalization for sequence to sequence (seq2seq) architectures. We propose going beyond data augmentation via paraphrase-optimized multi-task learning and observe that it is useful in correctly handling unseen sentential paraphrases as inputs. Our models greatly outperform SOTA seq2seq models for semantic parsing on diverse domains (Overnight-up to 3.2 % and emrQA-7 %) and Nematus, the winning solution for WMT 2017, for Czech to English translation (CzENG 1.6-1.5 BLEU).

pdf bib
On the diminishing return of labeling clinical reports
Jean-Baptiste Lamare | Oloruntobiloba Olatunji | Li Yao

Ample evidence suggests that better machine learning models may be steadily obtained by training on increasingly larger datasets on natural language processing (NLP) problems from non-medical domains. Whether the same holds true for medical NLP has by far not been thoroughly investigated. This work shows that this is indeed not always the case. We reveal the somehow counter-intuitive observation that performant medical NLP models may be obtained with small amount of labeled data, quite the opposite to the common belief, most likely due to the domain specificity of the problem. We show quantitatively the effect of training data size on a fixed test set composed of two of the largest public chest x-ray radiology report datasets on the task of abnormality classification. The trained models not only make use of the training data efficiently, but also outperform the current state-of-the-art rule-based systems by a significant margin.

pdf bib
The Chilean Waiting List Corpus : a new resource for clinical Named Entity Recognition in SpanishChilean Waiting List Corpus: a new resource for clinical Named Entity Recognition in Spanish
Pablo Báez | Fabián Villena | Matías Rojas | Manuel Durán | Jocelyn Dunstan

In this work we describe the Waiting List Corpus consisting of de-identified referrals for several specialty consultations from the waiting list in Chilean public hospitals. A subset of 900 referrals was manually annotated with 9,029 entities, 385 attributes, and 284 pairs of relations with clinical relevance. A trained medical doctor annotated these referrals, and then together with other three researchers, consolidated each of the annotations. The annotated corpus has nested entities, with 32.2 % of entities embedded in other entities. We use this annotated corpus to obtain preliminary results for Named Entity Recognition (NER). The best results were achieved by using a biLSTM-CRF architecture using word embeddings trained over Spanish Wikipedia together with clinical embeddings computed by the group. NER models applied to this corpus can leverage statistics of diseases and pending procedures within this waiting list. This work constitutes the first annotated corpus using clinical narratives from Chile, and one of the few for the Spanish language. The annotated corpus, the clinical word embeddings, and the annotation guidelines are freely released to the research community.

pdf bib
Exploring Text Specific and Blackbox Fairness Algorithms in Multimodal Clinical NLPNLP
John Chen | Ian Berlot-Attwell | Xindi Wang | Safwan Hossain | Frank Rudzicz

Clinical machine learning is increasingly multimodal, collected in both structured tabular formats and unstructured forms such as free text. We propose a novel task of exploring fairness on a multimodal clinical dataset, adopting equalized odds for the downstream medical prediction tasks. To this end, we investigate a modality-agnostic fairness algorithm-equalized odds post processing-and compare it to a text-specific fairness algorithm : debiased clinical word embeddings. Despite the fact that debiased word embeddings do not explicitly address equalized odds of protected groups, we show that a text-specific approach to fairness may simultaneously achieve a good balance of performance classical notions of fairness. Our work opens the door for future work at the critical intersection of clinical NLP and fairness.fairness on a multimodal clinical dataset, adopting equalized odds for the downstream medical prediction tasks. To this end, we investigate a modality-agnostic fairness algorithm - equalized odds post processing - and compare it to a text-specific fairness algorithm: debiased clinical word embeddings. Despite the fact that debiased word embeddings do not explicitly address equalized odds of protected groups, we show that a text-specific approach to fairness may simultaneously achieve a good balance of performance classical notions of fairness. Our work opens the door for future work at the critical intersection of clinical NLP and fairness.