Article

Deep Denoising of Raw Biomedical Knowledge Graph From COVID-19 Literature, LitCovid, and Pubtator: Framework Development and Validation

JOURNAL OF MEDICAL INTERNET RESEARCH (2022)

Journal

JOURNAL OF MEDICAL INTERNET RESEARCH

Volume 24, Issue 7

Publisher

JMIR PUBLICATIONS, INC

DOI: 10.2196/38584

Keywords

adversarial generative network; knowledge graph; deep denoising; machine learning; COVID-19; biomedical; neural network; network model; training data

Categories

Health Care Sciences & Services; Medical Informatics

Funding

National Institutes of Health (NIH) National Institute of General Medical Sciences (NIGMS) [K99GM135488]

Abstract

This study investigates the problem of false-positive predictions in the applications derived from biomedical knowledge graphs, where the co-occurrences of entities in literature do not always indicate a true biomedical association. The proposed framework uses deep neural networks to generate a graph that can distinguish unknown associations and remove noise from the raw training graph. The results demonstrate that the method achieves favorable link prediction performance, even with limited labeled data.

Background: Multiple types of biomedical associations of knowledge graphs, including COVID-19-related ones, are constructed based on co-occurring biomedical entities retrieved from recent literature. However, the applications derived from these raw graphs (eg, association predictions among genes, drugs, and diseases) have a high probability of false-positive predictions as co-occurrences in the literature do not always mean there is a true biomedical association between two entities.

Objective: Data quality plays an important role in training deep neural network models; however, most of the current work in this area has been focused on improving a model's performance with the assumption that the preprocessed data are clean. Here, we studied how to remove noise from raw knowledge graphs with limited labeled information.

Methods: The proposed framework used generative-based deep neural networks to generate a graph that can distinguish the unknown associations in the raw training graph. Two generative adversarial network models, NetGAN and Cross-Entropy Low-rank Logits (CELL), were adopted for the edge classification (ie, link prediction), leveraging unlabeled link information based on a real knowledge graph built from LitCovid and Pubtator.

Results: The performance of link prediction, especially in the extreme case of training data versus test data at a ratio of 1:9, demonstrated that the proposed method still achieved favorable results (area under the receiver operating characteristic curve >0.8 for the synthetic data set and 0.7 for the real data set), despite the limited amount of testing data available.

Conclusions: Our preliminary findings showed the proposed framework achieved promising results for removing noise during data preprocessing of the biomedical knowledge graph, potentially improving the performance of downstream applications by providing cleaner data.
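The evaluation setup described in the abstract (scoring edges of a noisy graph, holding out 90% of the labeled edges, and reporting the area under the receiver operating characteristic curve) can be illustrated with a minimal sketch. This is not NetGAN or CELL: a truncated SVD of the adjacency matrix stands in for the generative model, the graph is a toy two-community graph with random cross-community "noise" edges, and all names (`roc_auc`, `low_rank_scores`, `demo`) and parameter values are hypothetical choices made for illustration only.

```python
import numpy as np


def roc_auc(scores, labels):
    """Rank-based AUROC (Mann-Whitney U), using average ranks for ties."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        return float("nan")  # AUROC is undefined with a single class
    order = np.argsort(scores, kind="mergesort")
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    for value in np.unique(scores):  # give tied scores their mean rank
        tied = scores == value
        ranks[tied] = ranks[tied].mean()
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2
    return float(u / (n_pos * n_neg))


def low_rank_scores(adj, rank):
    """Edge scores from a rank-`rank` SVD reconstruction of the adjacency.

    Assumption: a simple stand-in for a learned generative model.
    """
    u, s, vt = np.linalg.svd(adj, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]


def demo(seed=0, n=60, p_true=0.5, p_noise=0.1):
    """Toy run: pick the rank on 10% of labeled edges, test on the other 90%."""
    rng = np.random.default_rng(seed)
    block = np.repeat([0, 1], n // 2)
    same = block[:, None] == block[None, :]
    upper = np.triu(np.ones((n, n), dtype=bool), k=1)
    # Within-community edges play the role of true associations;
    # cross-community edges play the role of spurious co-occurrences.
    edges = upper & (rng.random((n, n)) < np.where(same, p_true, p_noise))
    adj = (edges | edges.T).astype(float)

    rows, cols = np.nonzero(edges)
    labels = same[rows, cols]  # True = real association, False = noise
    perm = rng.permutation(len(rows))
    n_train = len(rows) // 10  # 1:9 split of labeled edges
    train, test = perm[:n_train], perm[n_train:]

    def train_auc(r):
        s = low_rank_scores(adj, r)[rows[train], cols[train]]
        return roc_auc(s, labels[train])

    best_rank = max(range(1, 6), key=train_auc)
    test_scores = low_rank_scores(adj, best_rank)[rows[test], cols[test]]
    return best_rank, roc_auc(test_scores, labels[test])


if __name__ == "__main__":
    rank, auc = demo()
    print(f"selected rank: {rank}, held-out AUROC: {auc:.3f}")
```

Edges with low reconstruction scores would be the candidates for removal in a denoising step. Any value this toy run prints says nothing about the figures reported in the paper.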
