Article

Incorporating entity-level knowledge in pretrained language model for biomedical dense retrieval

Journal

COMPUTERS IN BIOLOGY AND MEDICINE
Volume 166

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.compbiomed.2023.107535

Keywords

Dense retrieval; Natural language processing; Knowledge graph; Semantic matching


Abstract

This paper proposes a method that incorporates external, entity-level knowledge into a PLM-based dense retrieval model to improve biomedical retrieval effectiveness. Experiments show that the proposed method outperforms existing approaches while keeping query latency relatively low.
In recent years, pre-trained language models (PLMs) have dominated natural language processing (NLP), achieving outstanding performance across NLP tasks, including PLM-based dense retrieval. In the biomedical domain, however, the effectiveness of PLM-based dense retrieval models still needs improvement, because the abundance of biomedical entities leads to diverse and ambiguous entity expressions. To narrow this semantic gap, we propose a method that incorporates external knowledge at the entity level into a dense retrieval model to enrich the dense representations of queries and documents. Specifically, we first add self-attention and information interaction modules to the Transformer layers of the BERT architecture to fuse query/document text with entity embeddings from knowledge graphs. We then propose an entity similarity loss that constrains the model to better learn external knowledge from entity embeddings, and a weighted entity concatenation mechanism that balances the influence of entity representations when matching queries and documents. Experiments on two publicly available biomedical retrieval datasets show that the proposed method (called ELK) outperforms state-of-the-art dense retrieval methods. In terms of NDCG, ELK improves the ranking performance of coCondenser by at least 5% on both datasets, and yields further gains over the state-of-the-art EVA method. Although its architecture is more sophisticated, ELK's average query latency remains within the same order of magnitude as that of other efficient methods.
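The abstract describes three components: attention-based fusion of token states with knowledge-graph entity embeddings inside the Transformer layers, an entity similarity loss, and a weighted entity concatenation for query-document matching. The PyTorch sketch below is a minimal illustration of how such components could be wired together, not the authors' ELK implementation; the module names, the mean-pooling of entity states, the cosine form of the similarity loss, and the learned scalar weight are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EntityFusionLayer(nn.Module):
    """Hypothetical fusion module: token states attend to KG entity
    embeddings, mirroring the extra attention/interaction modules the
    abstract describes (shapes and wiring are assumptions)."""
    def __init__(self, hidden: int = 768, heads: int = 8):
        super().__init__()
        self.entity_attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.norm = nn.LayerNorm(hidden)

    def forward(self, token_states, entity_embs):
        # Tokens query the entity embeddings; residual + norm as in BERT.
        fused, _ = self.entity_attn(token_states, entity_embs, entity_embs)
        return self.norm(token_states + fused)

def entity_similarity_loss(model_entity_reprs, kg_entity_embs):
    """One plausible reading of the 'entity similarity loss': pull the
    model's entity representations toward the pretrained KG embeddings."""
    sim = F.cosine_similarity(model_entity_reprs, kg_entity_embs, dim=-1)
    return (1.0 - sim).mean()

class WeightedEntityScorer(nn.Module):
    """Weighted entity concatenation: a learned scalar balances a pooled
    entity vector against the text [CLS] vector before dot-product
    matching of query and document representations."""
    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(0.5))

    def represent(self, cls_vec, entity_vecs):
        ent = entity_vecs.mean(dim=1)                 # pool entity states
        return torch.cat([cls_vec, self.alpha * ent], dim=-1)

    def forward(self, q_cls, q_ents, d_cls, d_ents):
        q = self.represent(q_cls, q_ents)
        d = self.represent(d_cls, d_ents)
        return (q * d).sum(-1)                        # relevance score

if __name__ == "__main__":
    B, L, E, H = 2, 16, 4, 768
    fusion, scorer = EntityFusionLayer(H), WeightedEntityScorer()
    tok, ents = torch.randn(B, L, H), torch.randn(B, E, H)
    fused = fusion(tok, ents)                         # entity-aware token states
    loss = entity_similarity_loss(ents, ents)         # zero here; shape demo only
    score = scorer(fused[:, 0], ents, fused[:, 0], ents)
```

In the full model, the fusion layer would sit inside each (or selected) Transformer blocks of a BERT-style dual encoder, and the similarity loss would be added to the usual contrastive retrieval objective; both choices are sketched here only at the interface level.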
