Article

Incorporating entity-level knowledge in pretrained language model for biomedical dense retrieval

Journal

COMPUTERS IN BIOLOGY AND MEDICINE
Volume 166

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.compbiomed.2023.107535

Keywords

Dense retrieval; Natural language processing; Knowledge graph; Semantic matching

This paper proposes a method to incorporate external knowledge into a dense retrieval model to improve the effectiveness of biomedical retrieval based on pre-trained language models. Experimental results demonstrate that the proposed method outperforms existing methods and has relatively low query latency.
In recent years, pre-trained language models (PLMs) have dominated natural language processing (NLP) and achieved outstanding performance on a variety of NLP tasks, including PLM-based dense retrieval. In the biomedical domain, however, the effectiveness of PLM-based dense retrieval models still needs improvement because of the diversity and ambiguity of expressions arising from the wealth of biomedical entities. To alleviate this semantic gap, we propose a method that incorporates external knowledge at the entity level into a dense retrieval model to enrich the dense representations of queries and documents. Specifically, we first add self-attention and information-interaction modules to the Transformer layers of the BERT architecture to perform fusion and interaction between query/document text and entity embeddings from knowledge graphs. We then propose an entity similarity loss that constrains the model to better learn external knowledge from the entity embeddings, and further propose a weighted entity concatenation mechanism to balance the influence of entity representations when matching queries and documents. Experiments on two publicly available biomedical retrieval datasets show that our method outperforms state-of-the-art dense retrieval methods. In terms of NDCG, the proposed method (called ELK) improves the ranking performance of coCondenser by at least 5% on both datasets and achieves further gains over the state-of-the-art EVA method. Despite its more sophisticated architecture, ELK's average query latency remains within the same order of magnitude as that of other efficient methods.
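The entity similarity loss and the weighted entity concatenation described in the abstract can be sketched in a few lines. This is a minimal illustration only, not the paper's implementation: the function names, the cosine form of the loss, and the single scalar weight `w` are assumptions made for the sketch.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def entity_similarity_loss(learned, kg):
    # One plausible form of the constraint: push the model's learned entity
    # representations toward the pretrained KG entity embeddings by
    # minimizing 1 - mean cosine similarity over matched entity pairs.
    sims = [cosine(l, k) for l, k in zip(learned, kg)]
    return 1.0 - sum(sims) / len(sims)

def match_score(q_text, d_text, q_ent, d_ent, w=0.3):
    # Weighted entity concatenation: append the entity vector, scaled by w,
    # to the text vector before the dot-product match. The entity evidence
    # then contributes a w^2-scaled similarity term to the final score.
    q = np.concatenate([q_text, w * q_ent])
    d = np.concatenate([d_text, w * d_ent])
    return float(np.dot(q, d))
```

With `w = 0`, scoring falls back to text-only dense matching; increasing `w` raises the influence of entity representations, which is the balancing role the abstract attributes to the weighting mechanism.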
