A span-based joint model for extracting entities and relations of bacteria biotopes

BIOINFORMATICS (2022)

Journal

BIOINFORMATICS

Volume 38, Issue 1, Pages 220-227

Publisher

OXFORD UNIV PRESS

DOI: 10.1093/bioinformatics/btab593

Funding

Natural Science Foundation of Shenzhen City [JCYJ20180306172131515]

Automated Summary

Information about bacteria biotopes (BB) is crucial for microbiological research and applications. The BB task at BioNLP-OST 2019 focuses on extracting microorganism locations and phenotypes from biomedical texts. The authors' span-based model, which uses a pre-trained BERT encoder, performs significantly better than previous methods on entity and relation extraction for BBs, with a 21.6% reduction in slot error rate (SER) for relation extraction. The model is also effective at recognizing nested entities and achieves state-of-the-art performance when applied to the related CHEMPROT corpus.

Abstract

Motivation: Information about bacteria biotopes (BB) is important for fundamental research and applications in microbiology. The BB task at BioNLP-OST 2019 focuses on the extraction of locations and phenotypes of microorganisms from PubMed abstracts and full-text excerpts. The subtask BB-rel+ner aims to recognize relevant entities and extract the relations between them. The corresponding corpus has some distinctive features (e.g. nested entities) that are challenging to handle. As a result, previous methods achieved low performance on entity and relation extraction and limited the mutual effect between named entity recognition and relation extraction. There is still much room for improvement.

Results: We propose a span-based model to extract entities and relations jointly from biomedical text regarding BBs. To alleviate the shortage of annotated data in this domain-specific task, we employ a BERT (Bidirectional Encoder Representations from Transformers) model pre-trained on a domain-specific corpus to encode sentences. Our model considers all spans in a sentence as potential entity mentions and computes relation scores between the most confident entity spans, based on representations of the spans and of the contexts between them. Experiments on the BB-rel+ner 2019 corpus demonstrate that our model achieves significantly better performance than the state-of-the-art method, with a 21.6% reduction in slot error rate (SER) for extracting relations. Our model is also effective in recognizing nested entities. Furthermore, the model can be applied to the CHEMPROT corpus for joint extraction of chemical-protein entities and relations, achieving state-of-the-art performance.
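The span-based idea in the abstract (treat every span as a candidate mention, keep the most confident ones, then score pairs of the survivors) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the paper scores spans and relations with learned layers over BERT representations, whereas here the scorer is a hypothetical lookup table, and the names `enumerate_spans`, `top_spans`, `candidate_pairs`, `max_width` and `k` are assumptions made for illustration only.

```python
from itertools import permutations
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def enumerate_spans(tokens: List[str], max_width: int) -> List[Span]:
    """Return every contiguous token span that is at most max_width tokens long."""
    spans = []
    for start in range(len(tokens)):
        last_end = min(start + max_width, len(tokens))
        for end in range(start + 1, last_end + 1):
            spans.append((start, end))
    return spans


def top_spans(spans: List[Span], score: Callable[[Span], float], k: int) -> List[Span]:
    """Keep the k highest-scoring spans (the 'most confident' candidates)."""
    return sorted(spans, key=score, reverse=True)[:k]


def candidate_pairs(spans: List[Span]) -> List[Tuple[Span, Span]]:
    """Form every ordered (head, tail) pair of distinct spans for relation scoring."""
    return list(permutations(spans, 2))


# Toy example. The scores are invented stand-ins for a learned span classifier.
tokens = ["Lactobacillus", "in", "human", "gut"]
toy_scores = {(0, 1): 0.9, (2, 4): 0.8, (3, 4): 0.7}


def toy_score(span: Span) -> float:
    return toy_scores.get(span, 0.0)


spans = enumerate_spans(tokens, max_width=2)
kept = top_spans(spans, toy_score, k=3)
for head, tail in candidate_pairs(kept):
    print(tokens[head[0]:head[1]], "->", tokens[tail[0]:tail[1]])
```

Because spans are enumerated independently, overlapping candidates such as "human gut" and "gut" can both survive pruning, which is why a span-based formulation can represent nested entities where token-level tagging cannot.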
