Article

STonKGs: a sophisticated transformer trained on biomedical text and knowledge graphs

Journal

BIOINFORMATICS
Volume 38, Issue 6, Pages 1648-1656

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btac001

Funding

  1. Fraunhofer Cluster of Excellence 'Cognitive Internet Technologies'
  2. Defense Advanced Research Projects Agency (DARPA) Automating Scientific Knowledge Extraction (ASKE) program [HR00111990009]

Abstract

This study proposes STonKGs, a multimodal Transformer-based model that produces richer representations of biological knowledge by combining knowledge graphs with biomedical text. The results demonstrate that STonKGs outperforms baseline models across a range of classification tasks and that it can be adapted to other transfer learning applications.

Motivation: The majority of biomedical knowledge is stored in structured databases or as unstructured text in scientific publications. This vast amount of information has led to numerous machine learning-based biological applications using either text through natural language processing (NLP) or structured data through knowledge graph embedding models. However, representations based on a single modality are inherently limited.

Results: To generate better representations of biological knowledge, we propose STonKGs, a Sophisticated Transformer trained on biomedical text and Knowledge Graphs (KGs). This multimodal Transformer uses combined input sequences of structured information from KGs and unstructured text data from the biomedical literature to learn joint representations in a shared embedding space. First, we pre-trained STonKGs on a knowledge base assembled by the Integrated Network and Dynamical Reasoning Assembler (INDRA), consisting of millions of text-triple pairs extracted from the biomedical literature by multiple NLP systems. Then, we benchmarked STonKGs against three baseline models trained on either one of the modalities (i.e. text or KG) across eight different classification tasks, each corresponding to a different biological application. Our results demonstrate that STonKGs outperforms both baselines, especially on the tasks that are more challenging with respect to the number of classes, improving upon the F1-score of the best baseline by up to 0.084 (i.e. from 0.881 to 0.965). Finally, our pre-trained model as well as the model architecture can be adapted to various other transfer learning applications.
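To make the "combined input sequence" idea concrete, the following minimal Python sketch packs one text-triple pair into a single token sequence with a per-position modality indicator, analogous to BERT's segment IDs. This is not the authors' released code: the entity vocabulary, the ID offset, and the modality-ID layout are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the STonKGs implementation): combine a
# text passage and its associated KG triple into one Transformer input.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def build_combined_sequence(text, triple, kg_vocab):
    """Concatenate WordPiece text token IDs with KG triple element IDs."""
    # Text half: standard BERT encoding, i.e. [CLS] ... [SEP]
    text_ids = tokenizer(text)["input_ids"]
    # KG half: the (head, relation, tail) triple looked up in a separate
    # vocabulary, offset past the text vocabulary so ID spaces don't collide
    offset = tokenizer.vocab_size
    kg_ids = [offset + kg_vocab[element] for element in triple]
    return {
        "input_ids": text_ids + kg_ids,
        # 0 marks text positions, 1 marks KG positions, analogous to
        # BERT's token_type_ids separating sentence A from sentence B
        "token_type_ids": [0] * len(text_ids) + [1] * len(kg_ids),
    }

# Toy text-triple pair; identifiers are placeholders, not real INDRA output
kg_vocab = {"TP53": 0, "increases": 1, "MDM2": 2}
pair = build_combined_sequence(
    "TP53 increases the expression of MDM2.",
    ("TP53", "increases", "MDM2"),
    kg_vocab,
)
print(len(pair["input_ids"]), pair["token_type_ids"])
```

A joint encoder over such sequences can attend across both halves, which is what lets the shared embedding space capture text and KG evidence together rather than fusing two separately trained representations after the fact.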
