Article

DisGeReExT: a knowledge discovery system for exploration of disease-gene associations through large-scale literature-wide analysis study

Journal

Knowledge and Information Systems
Volume 65, Issue 8, Pages 3463-3487

Publisher

Springer London Ltd
DOI: 10.1007/s10115-023-01862-1

Keywords

Text mining; Gene-disease association; LWAS; Disease-disease similarity; Sentence ranking


This study addresses the large number of potential genetic biomarkers and disease-related gene variants identified by advanced experimental methods such as next-generation sequencing. To extract meaningful information about them from the scientific literature, it uses sophisticated text-mining-based, knowledge-driven frameworks.
Objective: Advanced experimental methods such as next-generation sequencing (NGS) have produced a large number of potential genetic biomarkers and disease-related gene variants, reported as research outputs in the scientific literature. To elucidate novel biomarkers and therapeutic candidates from this large body of literature, sophisticated text-mining-based, knowledge-driven frameworks are necessary.

Materials and Methods: This paper presents the DisGeReExT web server, which performs a literature-wide analysis study (LWAS) to extract both direct (explicit) gene-disease associations, using joint ensemble learning, and indirect (implicit) associations, using concept profiling based on the ABC principle. These associations are used to prioritize and rationalize potential discoveries about the genetic role in diseases. In addition, informative sentences are ranked with a scoring model, and disease-disease similarity is calculated from the functional association among shared genes.

Results: From the complete MEDLINE corpus (September 2020, 28 million records), DisGeReExT identified a total of 2,237,545 gene-disease associations and 2,851,662 disease-disease similarities.

Discussion: DisGeReExT extracted informative sentences related to both diseases and genes at large scale. It also explored the gene-disease associations of two diseases, Alzheimer's disease and liver carcinoma, and identified the top 10 associated genes and diseases for each.

Conclusion: Overall, we strongly believe that this large-scale automated approach to knowledge discovery of gene-associated diseases from the literature could provide new insights into genetic mechanisms and disease etiology, and could play a pivotal role in translational research, drug discovery, and drug repurposing.
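
The implicit (ABC-principle) step in Materials and Methods follows the general literature-based discovery idea attributed to Swanson: if A is linked to B and B is linked to C, but A and C are never directly linked, then A-C is a candidate hidden association. The abstract does not say how DisGeReExT weights its concept profiles, so the following is only a minimal Python sketch, assuming explicit pairwise associations are already available. The function name `abc_candidates`, the ranking by number of linking B concepts, and the toy identifiers are all hypothetical.

```python
from collections import defaultdict


def abc_candidates(explicit_pairs, source):
    """Rank implicit A-C links for `source` (A) by the number of linking B concepts.

    Hypothetical sketch: ranking by raw B-count is an assumption, not DisGeReExT's scoring.
    explicit_pairs: iterable of (term, term) associations assumed to have been
    extracted directly from text; they are treated as undirected.
    """
    # Concept profile: each term -> set of terms it is directly associated with.
    profile = defaultdict(set)
    for x, y in explicit_pairs:
        profile[x].add(y)
        profile[y].add(x)

    direct_b = profile.get(source, set())  # B concepts directly linked to A
    linking = defaultdict(set)  # candidate C -> B concepts connecting it to A
    for b in direct_b:
        for c in profile[b]:
            # Skip A itself and any C already linked to A (that link is explicit).
            if c != source and c not in direct_b:
                linking[c].add(b)

    # More intermediate B concepts -> stronger implicit A-C hypothesis;
    # ties are broken alphabetically so the output is deterministic.
    return sorted(linking.items(), key=lambda item: (-len(item[1]), item[0]))


# Toy example with hypothetical identifiers: D1 is a disease, P1/P2 are
# intermediate concepts, G1-G3 are genes.
pairs = [("D1", "P1"), ("D1", "P2"), ("P1", "G1"),
         ("P2", "G1"), ("P2", "G2"), ("D1", "G3")]
candidates = abc_candidates(pairs, "D1")
for term, via in candidates:
    print(term, sorted(via))
# G1 ['P1', 'P2']
# G2 ['P2']
```

G3 is not returned because it is already directly linked to D1, so it is an explicit association rather than an implicit one. Counting linking concepts is the simplest possible ranking; practical systems often weight the B concepts so that very generic intermediates do not dominate.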

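The abstract states only that disease-disease similarity is computed from the functional association among shared genes. One common formulation, shown here as a hedged Python sketch, is a best-match average: each gene of one disease is matched to its most functionally associated gene in the other disease, and these best scores are averaged over both directions. Whether DisGeReExT uses this exact formula is an assumption; the gene-gene scores in `fa` (which in practice might come from a functional gene network), the function names, and all identifiers are hypothetical.

```python
def gene_assoc(g1, g2, fa):
    """Functional association between two genes; identical genes score 1.0.

    fa: hypothetical dict mapping frozenset({gene_a, gene_b}) -> score in [0, 1].
    """
    if g1 == g2:
        return 1.0
    return fa.get(frozenset((g1, g2)), 0.0)


def disease_similarity(genes1, genes2, fa):
    """Best-match-average similarity between two diseases' gene sets (assumed formula)."""
    if not genes1 or not genes2:
        return 0.0
    # Each gene's strongest functional link into the other disease's gene set.
    best1 = [max(gene_assoc(g, h, fa) for h in genes2) for g in genes1]
    best2 = [max(gene_assoc(h, g, fa) for g in genes1) for h in genes2]
    # Average over all genes of both diseases; the measure is symmetric.
    return (sum(best1) + sum(best2)) / (len(genes1) + len(genes2))


# Hypothetical functional association scores and disease gene sets.
fa = {frozenset(("G1", "G3")): 0.8, frozenset(("G2", "G4")): 0.5}
d1 = {"G1", "G2"}
d2 = {"G2", "G3"}  # shares G2 with d1; G3 is strongly linked to G1
d3 = {"G4"}        # shares no gene with d1, but G4 is weakly linked to G2
print(disease_similarity(d1, d2, fa))
print(disease_similarity(d1, d3, fa))
```

For this toy data, d1 and d2 score 0.9 and d1 and d3 score 1/3. Under this formulation a shared gene contributes the maximum score of 1.0, while functional links between different genes still give a non-zero similarity.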