Article

Transferring From Textual Entailment to Biomedical Named Entity Recognition

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (2023)

Journal

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS

Volume 20, Issue 4, Pages 2577-2586

Publisher

IEEE COMPUTER SOC

DOI: 10.1109/TCBB.2023.3236477

Keywords

Biomedical named entity recognition; contrastive learning; textual entailment; transfer learning

Automated Summary

Biomedical Named Entity Recognition (BioNER) aims to identify biomedical entities such as genes, proteins, diseases, and chemical compounds in textual data. However, due to ethical and privacy issues, as well as the specialized nature of biomedical data, BioNER lacks quality labeled data, especially at the token level. This study proposes a gazetteer-based approach to BioNER, where the task is to build a BioNER system from scratch without any token-level annotations. By formulating BioNER as a Textual Entailment problem and using Textual Entailment with Dynamic Contrastive learning (TEDC), this work addresses the noisy labeling issue and transfers knowledge from pre-trained textual entailment models. The dynamic contrastive learning framework improves the model's discrimination ability by contrasting entities and non-entities in the same sentence. Experimental results on real-world biomedical datasets demonstrate that TEDC achieves state-of-the-art performance for gazetteer-based BioNER.

Abstract

Biomedical Named Entity Recognition (BioNER) aims to identify biomedical entities such as genes, proteins, diseases, and chemical compounds in textual data. However, because of ethical and privacy concerns and the highly specialized nature of biomedical data, BioNER suffers from a more severe shortage of quality labeled data than the general domain, especially at the token level. Given this extremely limited labeled data, this work studies gazetteer-based BioNER, which aims to build a BioNER system from scratch: the system must identify the entities in given sentences with zero token-level annotations for training. Previous works usually solve NER or BioNER with sequence labeling models and, when full annotations are unavailable, obtain weakly labeled data from gazetteers. However, such labels are quite noisy, because a label is needed for every token and the entity coverage of gazetteers is limited. Here we propose to formulate BioNER as a Textual Entailment problem and solve it via Textual Entailment with Dynamic Contrastive learning (TEDC). TEDC not only alleviates the noisy labeling issue but also transfers knowledge from pre-trained textual entailment models. Additionally, the dynamic contrastive learning framework contrasts entities and non-entities in the same sentence, improving the model's discrimination ability. Experiments on two real-world biomedical datasets show that TEDC achieves state-of-the-art performance for gazetteer-based BioNER.
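The entailment formulation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the hypothesis template, the span enumeration, or the form of the contrastive loss. The gazetteer contents, the template "X is a disease.", the helper names, and the InfoNCE-style loss below are all assumptions chosen for illustration. Each candidate span becomes a premise/hypothesis pair, the gazetteer supplies a weak span-level label, and the loss contrasts entity spans against non-entity spans from the same sentence.

```python
import math
from typing import List, Tuple

# Hypothetical gazetteer (assumption): a tiny set of lowercased disease names.
GAZETTEER = {"breast cancer", "diabetes"}

def candidate_spans(tokens: List[str], max_len: int = 3) -> List[Tuple[int, int]]:
    """Enumerate (start, end) token spans of up to max_len tokens, end exclusive."""
    return [(i, j)
            for i in range(len(tokens))
            for j in range(i + 1, min(i + max_len, len(tokens)) + 1)]

def entailment_pairs(sentence: str, entity_type: str = "disease"):
    """Turn one sentence into (premise, hypothesis, weak_label) triples.

    The premise is the sentence itself; the hypothesis template is an
    assumption, not taken from the paper. The weak label is True when the
    span appears in the gazetteer.
    """
    tokens = sentence.split()
    pairs = []
    for start, end in candidate_spans(tokens):
        span = " ".join(tokens[start:end])
        hypothesis = f"{span} is a {entity_type}."
        pairs.append((sentence, hypothesis, span.lower() in GAZETTEER))
    return pairs

def contrastive_loss(pos_scores: List[float], neg_scores: List[float],
                     temperature: float = 1.0) -> float:
    """InfoNCE-style loss (an assumed stand-in for the paper's objective).

    Each entity span score is contrasted against all non-entity span scores
    from the same sentence; the loss is averaged over entity spans.
    """
    if not pos_scores:
        return 0.0  # nothing to contrast in a sentence with no weak positives
    neg_sum = sum(math.exp(s / temperature) for s in neg_scores)
    losses = []
    for p in pos_scores:
        pos_term = math.exp(p / temperature)
        losses.append(-math.log(pos_term / (pos_term + neg_sum)))
    return sum(losses) / len(losses)

if __name__ == "__main__":
    for premise, hypothesis, label in entailment_pairs("Metformin treats diabetes"):
        print(label, "|", hypothesis)
```

In a full system the scores passed to `contrastive_loss` would come from a pre-trained textual entailment model applied to each premise/hypothesis pair; here they are plain floats so the sketch stays self-contained.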
