Journal
GENOME BIOLOGY
Volume 20, Issue -, Pages -
Publisher
BMC
DOI: 10.1186/s13059-019-1634-2
Keywords
Mendelian diseases; Whole genome sequencing; Rare variant analysis; Non-coding genetic variants; Pathogenicity score
Funding
- French National Research Agency (Agence Nationale de la Recherche, ANR) Investissements d'Avenir program [ANR-10-IAHU-01, ANR-17-RHUS-0002 - C'IL-LICO]
- MSDAvenir fund, Devo-Decode project
Abstract
State-of-the-art methods assessing pathogenic non-coding variants have mostly been characterized on common disease-associated polymorphisms, yet with modest accuracy and strong positional biases. In this study, we curated 737 high-confidence pathogenic non-coding variants associated with monogenic Mendelian diseases. In addition to interspecies conservation, a comprehensive set of recent and ongoing purifying selection signals in humans is explored, accounting for lineage-specific regulatory elements. Supervised learning using gradient tree boosting on such features achieves a high predictive performance and overcomes positional bias. NCBoost performs consistently across diverse learning and independent testing data sets and outperforms other existing reference methods.