Article

DisGeReExT: a knowledge discovery system for exploration of disease-gene associations through large-scale literature-wide analysis study

Journal

KNOWLEDGE AND INFORMATION SYSTEMS
Volume 65, Issue 8, Pages 3463-3487

Publisher

SPRINGER LONDON LTD
DOI: 10.1007/s10115-023-01862-1

Keywords

Text mining; Gene-disease association; LWAS; Disease-disease similarity; Sentence ranking

Advanced experimental methods such as next-generation sequencing have produced a large number of candidate genetic biomarkers and gene variants that are reported in the scientific literature. This study presents a text mining-based, knowledge-driven framework for extracting gene-disease associations from that literature.
Objective: Advanced experimental methods such as next-generation sequencing (NGS) have produced a large number of potentially indicative genetic biomarkers and disease-related gene variants, which are reported as research outputs in the scientific literature. Elucidating novel biomarkers and therapeutic candidates from this large body of literature requires highly sophisticated text mining-based, knowledge-driven frameworks.
Materials and Methods: This paper presents the DisGeReExT Web server for performing a literature-wide analysis study (LWAS). It extracts both direct (explicit) gene-disease associations, using joint ensemble learning, and indirect (implicit) associations, using concept profiling based on the ABC principle, in order to prioritize and rationalize potentially informative discoveries about the genetic role in diseases. In addition, we ranked the informative sentences using a scoring model and calculated disease-disease similarity using the functional association among shared genes.
Results: From the complete MEDLINE corpus dated September 2020, with 28 million records, DisGeReExT identified a total of 2,237,545 gene-disease associations and 2,851,662 disease-disease similarities.
Discussion: DisGeReExT was able to extract informative sentences related to both diseases and genes at large scale. It was also used to explore the gene-disease associations of two diseases, Alzheimer's disease and liver carcinoma, and identified the top 10 associated genes and the top 10 associated diseases for each.
Conclusion: Overall, we strongly believe that our large-scale automated approach to knowledge discovery of gene-associated diseases from the literature could provide new insights into genetic mechanisms and disease etiology, and can play a pivotal role in translational research, drug discovery, and drug repurposing.
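To make the implicit (ABC principle) step and the shared-gene similarity more concrete, the following is a minimal illustrative sketch, not the DisGeReExT implementation. It assumes the common reading of the ABC principle: if a disease (A) co-occurs with an intermediate concept (B), and B co-occurs with a gene (C), then A and C are a candidate indirect association. The function names, the placeholder identifiers such as `disease_X` and `gene_1`, and the toy data are all hypothetical. The Jaccard index is used here as a simple stand-in for the paper's functional-association measure, which the abstract does not specify.

```python
from collections import defaultdict


def implicit_associations(disease_to_concepts, concept_to_genes, known):
    """ABC-style linking: disease (A) -> intermediate concept (B) -> gene (C).

    Returns {(disease, gene): set of linking concepts} for every pair that
    is not already a known (explicit) association.
    """
    found = defaultdict(set)
    for disease, concepts in disease_to_concepts.items():
        for concept in concepts:
            for gene in concept_to_genes.get(concept, ()):
                if (disease, gene) not in known:
                    found[(disease, gene)].add(concept)
    return dict(found)


def jaccard_similarity(genes_a, genes_b):
    """Shared-gene overlap between two diseases (assumed stand-in measure)."""
    union = genes_a | genes_b
    return len(genes_a & genes_b) / len(union) if union else 0.0


# Hypothetical toy data; none of these names come from the paper.
disease_to_concepts = {"disease_X": {"concept_1", "concept_2"}}
concept_to_genes = {
    "concept_1": {"gene_1", "gene_2"},
    "concept_2": {"gene_2"},
}
known = {("disease_X", "gene_1")}  # already found as an explicit association

candidates = implicit_associations(disease_to_concepts, concept_to_genes, known)
similarity = jaccard_similarity({"gene_1", "gene_2"}, {"gene_2", "gene_3"})
```

In this toy case `gene_2` is proposed for `disease_X` through two intermediate concepts, while `gene_1` is skipped because it is already a known association. A real system would additionally weight each link, for example by co-occurrence frequency, before ranking candidates.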
