4.7 Article

OARD: Open annotations for rare diseases and their phenotypes based on real-world data

Journal

AMERICAN JOURNAL OF HUMAN GENETICS
Volume 109, Issue 9, Pages 1591-1604

Publisher

CELL PRESS
DOI: 10.1016/j.ajhg.2022.08.002

Keywords

-

Funding

  1. National Library of Medicine/National Human Genomic Research Institute grant R01LM012895 and National Center for Advancing Translational Sciences of the National Institutes of Health [OT2TR003434]

This study presents an open annotation resource, derived from real-world data, for annotating phenotypic traits related to rare genetic diseases. By leveraging ontology mapping and natural language processing methods, this resource can automatically extract concepts for rare diseases and their phenotypic traits. Compared to manual annotation, it can identify more disease-phenotype associations and can be shared across different institutions.
Diagnosis of rare genetic diseases often relies on phenotype-driven methods, which hinge on the accuracy and completeness of the rare disease phenotypes in the underlying annotation knowledgebase. Existing knowledgebases are often manually curated, with additional annotations drawn from published case reports. Despite their potential, real-world data such as electronic health records (EHRs) have not been fully exploited to derive rare disease annotations. Here, we present open annotation for rare diseases (OARD), a real-world-data-derived resource with annotations for rare-disease-related phenotypes. This resource is derived from the EHRs of two academic health institutions containing more than 10 million individuals spanning wide age ranges and different disease subgroups. By leveraging ontology mapping and advanced natural-language-processing (NLP) methods, OARD automatically and efficiently extracts concepts for both rare diseases and their phenotypic traits from billing codes and lab tests as well as over 100 million clinical narratives. The rare disease prevalence derived by OARD is highly correlated with that annotated in the original rare disease knowledgebase. By performing association analysis, we identified more than 1 million novel disease-phenotype association pairs that were previously missed by human annotation, and more than 60% were confirmed as true associations via manual review of a list of sampled pairs. Compared to manually curated annotation, OARD is 100% data driven and its pipeline can be shared across different institutions. By supporting privacy-preserving sharing of aggregated summary statistics, such as term frequencies and disease-phenotype associations, it fills an important gap to facilitate data-driven research in the rare disease community.
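
The abstract does not say which statistic OARD uses for its association analysis. As a rough illustration only, the sketch below shows one common way a disease-phenotype association could be scored from aggregated counts alone (no patient-level data): an observed-to-expected co-occurrence ratio and a Pearson chi-square statistic on the 2x2 table. The function name `association_stats`, its parameters, and the example counts are all hypothetical and are not taken from the paper or from any OARD interface.

```python
def association_stats(n_both, n_disease, n_phenotype, n_total):
    """Score one disease-phenotype pair from aggregated counts.

    Hypothetical helper, not OARD's method. Inputs are patient counts:
    n_both      -- patients with both the disease and the phenotype
    n_disease   -- patients with the disease
    n_phenotype -- patients with the phenotype
    n_total     -- all patients in the cohort
    Returns (observed/expected ratio, Pearson chi-square statistic).
    """
    # Cells of the 2x2 contingency table.
    a = n_both                                        # disease+, phenotype+
    b = n_disease - n_both                            # disease+, phenotype-
    c = n_phenotype - n_both                          # disease-, phenotype+
    d = n_total - n_disease - n_phenotype + n_both    # disease-, phenotype-
    if min(a, b, c, d) < 0:
        raise ValueError("inconsistent counts")

    # Co-occurrence count expected if disease and phenotype were independent.
    expected = n_disease * n_phenotype / n_total
    ratio = a / expected if expected > 0 else float("nan")

    # Pearson chi-square for a 2x2 table (no continuity correction).
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n_total * (a * d - b * c) ** 2 / denom if denom > 0 else 0.0
    return ratio, chi2


if __name__ == "__main__":
    # Illustrative, made-up counts; not data from the paper.
    ratio, chi2 = association_stats(
        n_both=30, n_disease=50, n_phenotype=1000, n_total=100_000
    )
    print(f"observed/expected = {ratio:.1f}, chi-square = {chi2:.1f}")
```

Because the function needs only four counts per pair, it fits the kind of aggregated summary statistics (term frequencies and co-occurrence counts) that the abstract describes as shareable across institutions. A real analysis would also need multiple-testing correction and small-count handling, which this sketch omits.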
