Proceedings of The 5th Workshop on BioNLP Open Shared Tasks

Kim Jin-Dong, Nédellec Claire, Bossy Robert, Deléger Louise (Editors)

Hong Kong, China
Association for Computational Linguistics

PharmaCoNER: Pharmacological Substances, Compounds and proteins Named Entity Recognition track
Aitor Gonzalez-Agirre | Montserrat Marimon | Ander Intxaurrondo | Obdulia Rabal | Marta Villegas | Martin Krallinger

Chemical compounds and drugs are among the biomedical entity types relevant to medicine and the biosciences. The correct detection of these entities is critical for other text mining applications building on them, such as adverse drug-reaction detection, medication-related fake news or drug-target extraction. Although a significant effort was made to detect mentions of drugs/chemicals in English texts, so far only very limited attempts were made to recognize them in medical documents in other languages. Taking into account the growing amount of medical publications and clinical records written in Spanish, we have organized the first shared task on detecting drug and chemical entities in Spanish medical documents. Additionally, we included a clinical concept-indexing sub-track asking teams to return SNOMED-CT identifiers related to drugs/chemicals for a collection of documents. For this task, named PharmaCoNER, we generated annotation guidelines together with a corpus of 1,000 manually annotated clinical case studies. A total of 22 teams participated in sub-track 1 (77 system runs) and 7 teams in sub-track 2 (19 system runs). Top scoring teams used sophisticated deep learning approaches yielding very competitive results with F-measures above 0.91. These results indicate that there is a real interest in promoting biomedical text mining efforts beyond English. We foresee that the PharmaCoNER annotation guidelines, corpus and participant systems will foster the development of new resources for clinical and biomedical text mining systems of Spanish medical data.

IxaMed at PharmacoNER Challenge 2019
Xabier Lahuerta | Iakes Goenaga | Koldo Gojenola | Aitziber Atutxa Salazar | Maite Oronoz

The aim of this paper is to present our approach (IxaMed) in the PharmacoNER 2019 task. The task consists of identifying chemical, drug, and gene/protein mentions from clinical case studies written in Spanish. The evaluation of the task is divided into two scenarios: one corresponding to the detection of named entities and one corresponding to the indexation of named entities that have been previously identified. In order to identify named entities we have made use of a Bi-LSTM with a CRF on top in combination with different types of word embeddings. We have achieved our best result (86.81 F-Score) combining pretrained word embeddings of Wikipedia and Electronic Health Records (50M words) with contextual string embeddings of Wikipedia and Electronic Health Records. For the indexation of the named entities, we have used the Levenshtein distance, obtaining an 85.34 F-Score as our best result.
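As an illustration of distance-based indexation, the following is a minimal sketch that maps a mention to the closest entry of a terminology by Levenshtein distance. The abstract does not say how the distance was applied, so the nearest-entry lookup, the function names, and the toy terminology (with placeholder codes, not real SNOMED-CT identifiers) are all assumptions made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb
            ))
        previous = current
    return previous[-1]


def closest_code(mention: str, terminology: dict) -> tuple:
    """Return (code, term, distance) for the entry nearest to the mention.

    `terminology` maps a term string to its code. This lookup is a
    hypothetical stand-in, not the method described in the paper.
    """
    mention = mention.lower()
    term = min(terminology, key=lambda t: levenshtein(mention, t.lower()))
    return terminology[term], term, levenshtein(mention, term.lower())


# Toy terminology with placeholder codes (assumed for illustration only).
toy_terminology = {"paracetamol": "CODE-A", "ibuprofeno": "CODE-B"}
result = closest_code("paracetamo", toy_terminology)
```

A linear scan like this costs one distance computation per terminology entry, so a real system would likely prune candidates first; that choice is outside what the abstract describes.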

A Deep Learning-Based System for PharmaCoNER
Ying Xiong | Yedan Shen | Yuanhang Huang | Shuai Chen | Buzhou Tang | Xiaolong Wang | Qingcai Chen | Jun Yan | Yi Zhou

The Biological Text Mining Unit at BSC and CNIO organized the first shared task on chemical and drug mention recognition from Spanish medical texts, called PharmaCoNER (Pharmacological Substances, Compounds and proteins Named Entity Recognition track), in 2019. It includes two tracks: one for NER offset and entity classification (track 1) and the other for concept indexing (track 2). We developed a pipeline system based on deep learning methods for this shared task, specifically, a subsystem based on BERT (Bidirectional Encoder Representations from Transformers) for NER offset and entity classification and a subsystem based on Bpool (Bi-LSTM with max/mean pooling) for concept indexing. Evaluation conducted on the shared task data showed that our system achieves a micro-average F1-score of 0.9105 on track 1 and a micro-average F1-score of 0.8391 on track 2.
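The micro-average F1-score reported above pools true positives, false positives and false negatives over all documents before computing precision and recall. The sketch below shows that computation under the assumption that an annotation is a hashable tuple such as (start, end, type) and that only exact matches count; the official evaluation script may differ in its matching details.

```python
def micro_f1(gold_by_doc: dict, pred_by_doc: dict) -> tuple:
    """Micro-averaged precision, recall and F1 over a set of documents.

    Each argument maps a document id to a collection of annotations.
    The tuple layout (start, end, type) is an assumption of this example.
    """
    tp = fp = fn = 0
    for doc_id in set(gold_by_doc) | set(pred_by_doc):
        gold = set(gold_by_doc.get(doc_id, ()))
        pred = set(pred_by_doc.get(doc_id, ()))
        tp += len(gold & pred)   # predicted and correct
        fp += len(pred - gold)   # predicted but not in gold
        fn += len(gold - pred)   # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy data: one boundary error in document d1 (end offset 16 instead of 15).
gold = {"d1": {(0, 5, "A"), (10, 15, "B")}, "d2": {(3, 8, "A")}}
pred = {"d1": {(0, 5, "A"), (10, 16, "B")}, "d2": {(3, 8, "A")}}
precision, recall, f1 = micro_f1(gold, pred)
```

Because counts are pooled, documents with many entities weigh more than documents with few, which is the main difference from a macro average.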

A Neural Pipeline Approach for the PharmaCoNER Shared Task using Contextual Exhaustive Models
Mohammad Golam Sohrab | Minh Thang Pham | Makoto Miwa | Hiroya Takamura

We present a neural pipeline approach that performs named entity recognition (NER) and concept indexing (CI), which links entity mentions to concept unique identifiers (CUIs) in a knowledge base, for the PharmaCoNER shared task on pharmaceutical drugs and chemical entities. We proposed a neural NER model that captures the surrounding semantic information of a given sequence by capturing the forward and backward context of the bidirectional LSTM (Bi-LSTM) output of a target span using a contextual span representation-based exhaustive approach. The NER model enumerates all possible spans as potential entity mentions and classifies them into entity types or no entity with deep neural networks. To represent spans, we compare several different neural network architectures and their ensembling for the NER model. We then perform dictionary matching for CI and, if there is no match, we further compute similarity scores between a mention and CUIs using entity embeddings to assign the CUI with the highest score to the mention. We evaluate our approach on the two sub-tasks in the shared task. Among the five submitted runs, the best run for each sub-task achieved an F-score of 86.76% on Sub-task 1 (NER) and an F-score of 79.97% (strict) on Sub-task 2 (CI).
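The enumeration step of such an exhaustive approach can be sketched as follows. Only the generation of candidate spans is shown; the neural classifier that labels each span is not reproduced here. The `max_len` cap is an assumption added for this example (a common way to bound the number of candidates), not a parameter taken from the paper.

```python
def enumerate_spans(tokens: list, max_len: int) -> list:
    """List every (start, end) token span of at most max_len tokens.

    Spans are half-open: tokens[start:end] is the candidate mention.
    """
    spans = []
    for start in range(len(tokens)):
        last_end = min(start + max_len, len(tokens))
        for end in range(start + 1, last_end + 1):
            spans.append((start, end))
    return spans


# Hypothetical tokenized sentence used only to exercise the function.
tokens = ["tratamiento", "con", "ácido", "acetilsalicílico"]
short_spans = enumerate_spans(tokens, max_len=2)
all_spans = enumerate_spans(tokens, max_len=len(tokens))
```

With no cap, a sentence of n tokens yields n(n+1)/2 candidates, which is why nested mentions are covered naturally but the classifier must reject most spans as "no entity".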

Biomedical Named Entity Recognition with Multilingual BERT
Kai Hakala | Sampo Pyysalo

We present the approach of the Turku NLP group to the PharmaCoNER task on Spanish biomedical named entity recognition. We apply a CRF-based baseline approach and multilingual BERT to the task, achieving an F-score of 88% on the development data and 87% on the test set with BERT. Our approach reflects a straightforward application of a state-of-the-art multilingual model that is not specifically tailored to either the language or the application domain. The source code is available at :

An Overview of the Active Gene Annotation Corpus and the BioNLP OST 2019 AGAC Track Tasks
Yuxing Wang | Kaiyin Zhou | Mina Gachloo | Jingbo Xia

The active gene annotation corpus (AGAC) was developed to support knowledge discovery for drug repurposing. Based on the corpus, the AGAC track of the BioNLP Open Shared Tasks 2019 was organized to facilitate cross-disciplinary collaboration across the BioNLP and Pharmacoinformatics communities for drug repurposing. The AGAC track consists of three subtasks: 1) named entity recognition, 2) thematic relation extraction, and 3) loss of function (LOF) / gain of function (GOF) topic classification. Five teams participated in the AGAC track, and their performance is compared and analyzed. The results revealed substantial room for improvement in the design of the task, which we analyzed in terms of imbalanced data, selective annotation and latent topic annotation.

A Multi-Task Learning Framework for Extracting Bacteria Biotope Information
Qi Zhang | Chao Liu | Ying Chi | Xuansong Xie | Xiansheng Hua

This paper presents a novel transfer multi-task learning method for the Bacteria Biotope rel+ner task at BioNLP-OST 2019. To alleviate the data deficiency problem in domain-specific information extraction, we use BERT (Bidirectional Encoder Representations from Transformers) and pre-train it using masked language modeling and next sentence prediction on both a general corpus and a medical corpus such as PubMed. In the fine-tuning stage, we fine-tune the relation extraction layer and the mention recognition layer that we designed on top of BERT to extract mentions and relations simultaneously. The evaluation results show that our method achieves the best performance on all metrics (including slot error rate, precision and recall) in the Bacteria Biotope rel+ner subtask.

Using Snomed to recognize and index chemical and drug mentions.
Pilar López Úbeda | Manuel Carlos Díaz Galiano | L. Alfonso Urena Lopez | Maite Martin

In this paper we describe a new named entity extraction system. Our work proposes a system for the identification and annotation of drug names in Spanish biomedical texts based on machine learning and deep learning models. Subsequently, a standardized Snomed code is assigned to these drugs. For this purpose, Natural Language Processing tools and techniques have been used, and a dictionary has been built from different sources of information. The results are promising: we obtain an F1 score of 78% on the first sub-track, and in the second task we correctly map 72% of the found entities to Snomed.

Linguistically Informed Relation Extraction and Neural Architectures for Nested Named Entity Recognition in BioNLP-OST 2019
Pankaj Gupta | Usama Yaseen | Hinrich Schütze

Named Entity Recognition (NER) and Relation Extraction (RE) are essential tools in distilling knowledge from biomedical literature. This paper presents our findings from participating in BioNLP Shared Tasks 2019. We addressed Named Entity Recognition including nested entity extraction, Entity Normalization and Relation Extraction. Our proposed approach to Named Entity Recognition can be generalized to different languages, and we have shown its effectiveness for English and Spanish text. We investigated linguistic features, a hybrid loss including ranking and Conditional Random Fields (CRF), a multi-task objective and a token-level ensembling strategy to improve NER. We employed dictionary-based fuzzy and semantic search to perform Entity Normalization. Finally, our RE system employed a Support Vector Machine (SVM) with linguistic features. Our NER submission (team: MIC-CIS) ranked first in the BB-2019 norm+NER task with a slot error rate (SER) of 0.7159 and showed competitive performance on the PharmaCoNER task with an F1-score of 0.8662. Our RE system ranked first in the SeeDev-binary Relation Extraction Task with an F1-score of 0.3738.

An ensemble CNN method for biomedical entity normalization
Pan Deng | Haipeng Chen | Mengyao Huang | Xiaowen Ruan | Liang Xu

Different representations of the same concept can often be seen in scientific reports and publications. Entity normalization (or entity linking) is the task of matching these different representations to their standard concepts. In this paper, we present a two-step ensemble CNN method that normalizes microbiology-related entities in free text to concepts in standard dictionaries. The method is capable of linking entities when only a small microbiology-related biomedical corpus is available for training, and achieved reasonable performance in the online test of the BioNLP-OST19 shared task Bacteria Biotope.

BOUN-ISIK Participation: An Unsupervised Approach for the Named Entity Normalization and Relation Extraction of Bacteria Biotopes
İlknur Karadeniz | Ömer Faruk Tuna | Arzucan Özgür

This paper presents our participation in the Bacteria Biotope Task of the BioNLP Shared Task 2019. Our participation includes two systems for the two subtasks of the Bacteria Biotope Task: the normalization of entities (BB-norm) and the identification of the relations between the entities given a biomedical text (BB-rel). For the normalization of entities, we utilized word embeddings and syntactic re-ranking. For the relation extraction task, pre-defined rules are used. Although both approaches are unsupervised, in the sense that they do not need any labeled data, they achieved promising results. In particular, for the BB-norm task, the results have shown that the proposed method performs as well as deep learning based methods, which require labeled data.

Integration of Deep Learning and Traditional Machine Learning for Knowledge Extraction from Biomedical Literature
Jihang Mao | Wanli Liu

In this paper, we present our participation in the Bacteria Biotope (BB) task at BioNLP-OST 2019. Our system utilizes fine-tuned language representation models and machine learning approaches based on word embeddings and lexical features for entity recognition, normalization and relation extraction. It achieves state-of-the-art performance and is among the top two systems in five of the six subtasks.

CRAFT Shared Tasks 2019 Overview — Integrated Structure, Semantics, and Coreference
William Baumgartner | Michael Bada | Sampo Pyysalo | Manuel R. Ciosici | Negacy Hailu | Harrison Pielke-Lombardo | Michael Regan | Lawrence Hunter

As part of the BioNLP Open Shared Tasks 2019, the CRAFT Shared Tasks 2019 provides a platform to gauge the state of the art for three fundamental language processing tasks (dependency parse construction, coreference resolution, and ontology concept identification) over full-text biomedical articles. The structural annotation task requires the automatic generation of dependency parses for each sentence of an article given only the article text. The coreference resolution task focuses on linking coreferring base noun phrase mentions into chains using the symmetrical and transitive identity relation. The ontology concept annotation task involves the identification of concept mentions within text using the classes of ten distinct ontologies in the biomedical domain, both unmodified and augmented with extension classes. This paper provides an overview of each task, including descriptions of the data provided to participants and the evaluation metrics used, and discusses participant results relative to baseline performances for each of the three tasks.

UZH@CRAFT-ST: a Sequence-labeling Approach to Concept Recognition
Lenz Furrer | Joseph Cornelius | Fabio Rinaldi

As our submission to the CRAFT shared task 2019, we present two neural approaches to concept recognition. We propose two different systems for joint named entity recognition (NER) and normalization (NEN), both of which model the task as a sequence labeling problem. Our first system is a BiLSTM network with two separate outputs for NER and NEN trained from scratch, whereas the second system is an instance of BioBERT fine-tuned on the concept-recognition task. We exploit two strategies for extending concept coverage: ontology pretraining and backoff with a dictionary lookup. Our results show that the backoff strategy effectively tackles the problem of unseen concepts, addressing a major limitation of the chosen design. In the cross-system comparison, BioBERT proves to be a strong basis for creating a concept-recognition system, although some entity types are predicted more accurately by the BiLSTM-based system.

Neural Dependency Parsing of Biomedical Text: TurkuNLP entry in the CRAFT Structural Annotation Task
Thang Minh Ngo | Jenna Kanerva | Filip Ginter | Sampo Pyysalo

We present the approach taken by the TurkuNLP group in the CRAFT Structural Annotation task, a shared task on dependency parsing. Our approach builds primarily on the Turku neural parser, a native dependency parser that ranked among the best in the recent CoNLL tasks on parsing Universal Dependencies. To adapt the parser to the biomedical domain, we considered and evaluated a number of approaches, including the generation of custom word embeddings, combination with other in-domain resources, and the incorporation of information from named entity recognition. We achieved a labeled attachment score of 89.7%, the best result among task participants.
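For readers unfamiliar with the metric, the labeled attachment score (LAS) is the share of tokens whose predicted head and dependency label both match the gold parse. The sketch below assumes each token is represented as a (head index, label) pair and that gold and predicted tokenization are identical; the official task evaluation may handle details such as tokenization mismatches differently.

```python
def attachment_scores(gold: list, predicted: list) -> tuple:
    """Return (UAS, LAS) for two aligned lists of (head, label) pairs.

    UAS counts tokens with the correct head; LAS additionally requires
    the correct dependency label. The pair layout is an assumption.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0, 0.0
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    full_hits = sum(g == p for g, p in zip(gold, predicted))
    return head_hits / len(gold), full_hits / len(gold)


# Toy three-token sentence: the last token has the right head, wrong label.
gold_parse = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred_parse = [(2, "nsubj"), (0, "root"), (2, "nmod")]
uas, las = attachment_scores(gold_parse, pred_parse)
```

In the toy case every head is correct but one label is wrong, so the unlabeled score is 1.0 while the labeled score is two thirds.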