A span-based joint model for extracting entities and relations of bacteria biotopes

Journal

BIOINFORMATICS
Volume 38, Issue 1, Pages 220-227

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btab593

Funding

  1. Natural Science Foundation of Shenzhen City [JCYJ20180306172131515]

Summary: Information about bacteria biotopes (BB) is crucial for microbiological research and applications. The BB task at BioNLP-OST 2019 focuses on extracting microorganism locations and phenotypes from biomedical texts. Our span-based model, which uses a pre-trained BERT model, achieves significantly better performance in entity and relation extraction for BBs than previous methods, with a 21.6% reduction in slot error rate (SER). The model is also effective at recognizing nested entities and can be applied to other related tasks with state-of-the-art performance.
Motivation: Information about bacteria biotopes (BB) is important for fundamental research and applications in microbiology. The BB task at BioNLP-OST 2019 focuses on extracting the locations and phenotypes of microorganisms from PubMed abstracts and full-text excerpts. The subtask BB-rel+ner aims to recognize relevant entities and extract the relations among them. The corresponding corpus has some distinctive features (e.g. nested entities) that are challenging to handle. As a result, previous methods achieved low performance on entity and relation extraction and limited the mutual effect between named entity recognition and relation extraction, leaving much room for improvement.

Results: We propose a span-based model to jointly extract entities and relations from biomedical text about BBs. To alleviate the shortage of annotated data in this domain-specific task, we employ a BERT (Bidirectional Encoder Representations from Transformers) model pre-trained on a domain-specific corpus to encode sentences. Our model considers all spans in a sentence as potential entity mentions and computes relation scores between the most confident entity spans, based on representations of the spans and of the contexts between them. Experiments on the BB-rel+ner 2019 corpus demonstrate that our model performs significantly better than the state-of-the-art method, reducing the slot error rate (SER) for relation extraction by 21.6%. Our model is also effective in recognizing nested entities. Furthermore, the model can be applied to the CHEMPROT corpus for joint extraction of chemical-protein entities and relations, where it achieves state-of-the-art performance.
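The span-based idea described in the abstract (treat every span as a candidate mention, keep the most confident ones, then score pairs of them) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `max_width` and `k` parameters, and the scoring callback are hypothetical, and in the actual model the scores would come from BERT-based span representations rather than a hand-written function.

```python
from itertools import permutations
from typing import Callable, List, Sequence, Tuple

# A span is a (start, end) pair of token indices, with end exclusive.
Span = Tuple[int, int]


def enumerate_spans(tokens: Sequence[str], max_width: int) -> List[Span]:
    """List every contiguous span of at most max_width tokens.

    Overlapping and nested spans are all kept, which is why a span-based
    model can represent nested entity mentions.
    """
    spans = []
    for start in range(len(tokens)):
        last_end = min(start + max_width, len(tokens))
        for end in range(start + 1, last_end + 1):
            spans.append((start, end))
    return spans


def top_spans(spans: List[Span], score: Callable[[Span], float], k: int) -> List[Span]:
    """Keep the k highest-scoring spans (hypothetical pruning step)."""
    return sorted(spans, key=score, reverse=True)[:k]


def candidate_pairs(spans: List[Span]) -> List[Tuple[Span, Span]]:
    """Form ordered (head, tail) pairs of distinct spans for relation scoring."""
    return list(permutations(spans, 2))


if __name__ == "__main__":
    tokens = ["Bacillus", "subtilis", "in", "soil"]
    spans = enumerate_spans(tokens, max_width=2)
    # Toy score that prefers wider spans; a real model would learn this.
    kept = top_spans(spans, score=lambda s: s[1] - s[0], k=2)
    for head, tail in candidate_pairs(kept):
        print(tokens[head[0]:head[1]], "->", tokens[tail[0]:tail[1]])
```

Pruning to the top `k` spans before pairing keeps the number of relation candidates at `k * (k - 1)` instead of growing quadratically with the total number of spans in the sentence.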