Article

SnorkelPlus: A Novel Approach for Identifying Relationships Among Biomedical Entities Within Abstracts

Journal

COMPUTER JOURNAL

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/comjnl/bxad051

Keywords

disease-gene association; generative-discriminative models; Snorkel; labeling function (LF); biomedical text mining

The SnorkelPlus model is proposed to extract biomedical relations between gene and disease entities from unstructured biomedical text without manual annotation. It achieves an AUROC of 85.60% and an AUPR of 45.73%, outperforming the baseline model, and is used to build a gene-disease relation database from 29 million scientific abstracts.
Identifying relationships between biomedical entities in unstructured biomedical text is a challenging task. We propose SnorkelPlus, a model that flexibly extracts relations between gene and disease entities without manual annotation effort. We achieved three objectives: (i) extracting only gene- and disease-related articles from NCBI's PubMed or PubMed Central databases, (ii) defining reusable labeling functions and (iii) ensuring labeling-function accuracy using generative and discriminative models. We used deep learning methods to obtain labeled training data and achieved an AUROC of 85.60% on the gene-disease corpus generated from PubMed articles. Snorkel achieved an AUPR of 45.73%, which is 2.3% higher than the baseline model. Using SnorkelPlus, we created a gene-disease relation database from approximately 29 million scientific abstracts without using annotated training datasets. Furthermore, we demonstrated the generalizability of the proposed approach on PubMed abstracts enriched with different gene-disease relations. In future work, we plan to design a graph database using Neo4j.
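The idea of reusable labeling functions (LFs) whose noisy votes are combined into training labels can be illustrated with a minimal sketch. It assumes a hypothetical `Candidate` type (one sentence holding one gene and one disease mention), three hypothetical LFs, and a hypothetical `KNOWN_PAIRS` set used for distant supervision. The votes are combined here by a plain majority vote, a deliberately simpler stand-in for the generative label model that Snorkel learns to weight LFs by their estimated accuracies. This is not the SnorkelPlus implementation and does not use the Snorkel library API.

```python
from typing import Callable, List, NamedTuple

# Snorkel-style label convention (assumed here): -1 means the LF abstains.
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1


class Candidate(NamedTuple):
    """Hypothetical candidate: one sentence with one gene and one disease mention."""
    sentence: str
    gene: str
    disease: str


# Hypothetical knowledge-base pairs used for distant supervision (illustrative only).
KNOWN_PAIRS = {("BRCA1", "breast cancer")}


def lf_causal_keyword(c: Candidate) -> int:
    """Vote POSITIVE when the sentence contains an association-style phrase."""
    text = c.sentence.lower()
    cues = ("associated with", "mutations in", "causes")
    return POSITIVE if any(cue in text for cue in cues) else ABSTAIN


def lf_negation(c: Candidate) -> int:
    """Vote NEGATIVE when the sentence explicitly denies an association."""
    text = c.sentence.lower()
    cues = ("no association", "not associated")
    return NEGATIVE if any(cue in text for cue in cues) else ABSTAIN


def lf_known_pair(c: Candidate) -> int:
    """Vote POSITIVE when the (gene, disease) pair appears in the knowledge base."""
    return POSITIVE if (c.gene, c.disease) in KNOWN_PAIRS else ABSTAIN


LFS: List[Callable[[Candidate], int]] = [lf_causal_keyword, lf_negation, lf_known_pair]


def apply_lfs(candidates: List[Candidate]) -> List[List[int]]:
    """Build the label matrix: one row per candidate, one column per LF."""
    return [[lf(c) for lf in LFS] for c in candidates]


def majority_vote(row: List[int]) -> int:
    """Combine one row of LF votes; ties and all-abstain rows stay ABSTAIN."""
    pos = row.count(POSITIVE)
    neg = row.count(NEGATIVE)
    if pos > neg:
        return POSITIVE
    if neg > pos:
        return NEGATIVE
    return ABSTAIN


if __name__ == "__main__":
    candidates = [
        Candidate("Mutations in BRCA1 are associated with breast cancer.", "BRCA1", "breast cancer"),
        Candidate("We found no association between TP53 and asthma.", "TP53", "asthma"),
        Candidate("APOE was measured alongside Alzheimer disease markers.", "APOE", "Alzheimer disease"),
    ]
    matrix = apply_lfs(candidates)
    print(matrix)                                # [[1, -1, 1], [-1, 0, -1], [-1, -1, -1]]
    print([majority_vote(r) for r in matrix])    # [1, 0, -1]
```

In Snorkel proper, the label matrix would instead be passed to a generative model that estimates each LF's accuracy from their agreements and disagreements, and the resulting probabilistic labels would then train a discriminative classifier that can generalize beyond the LFs' explicit cues.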
