Article; Proceedings Paper

Computational identification of deleterious synonymous variants in human genomes using a feature-based approach

Journal

BMC MEDICAL GENOMICS
Volume 12, Issue -, Pages -

Publisher

BMC
DOI: 10.1186/s12920-018-0455-6

Keywords

Synonymous variant; Pathogenicity prediction; Feature selection; Random forest

Funding

  1. National Natural Science Foundation of China [61672037, 11835014, 61873001, 21601001]
  2. Young Wanjiang Scholar Program of Anhui Province
  3. Key Project of Anhui Provincial Education Department [KJ2017ZD01]
  4. Anhui Provincial Outstanding Young Talent Support Plan [gxyqZD2017005]

Background: Although synonymous single nucleotide variants (sSNVs) do not alter protein sequences, they have been shown to play an important role in human disease. Distinguishing pathogenic sSNVs from neutral ones is challenging because pathogenic sSNVs tend to have low prevalence. Although many methods have been developed for predicting the functional impact of single nucleotide variants, only a few have been specifically designed for identifying pathogenic sSNVs.

Results: In this work, we describe a computational model, IDSV (Identification of Deleterious Synonymous Variants), which uses a random forest (RF) to detect deleterious sSNVs in human genomes. We systematically investigate a total of 74 multifaceted features across seven categories: splicing, conservation, codon usage, sequence, pre-mRNA folding energy, translation efficiency, and functional region annotation features. Then, to remove redundant and irrelevant features and improve prediction performance, feature selection is performed using the sequential backward selection method. Based on the 10 optimized features, an RF classifier is developed to identify deleterious sSNVs. Results on benchmark datasets show that IDSV outperforms other state-of-the-art methods in identifying pathogenic sSNVs.

Conclusions: We have developed an efficient feature-based prediction approach (IDSV) for deleterious sSNVs that draws on a wide variety of features. From the full feature set, we identify a compact and useful subset that is important for identifying deleterious sSNVs. Our results indicate that, besides splicing and conservation features, a new translation efficiency feature is also informative for identifying deleterious sSNVs. Although the functional region annotation and sequence features are only weakly informative on their own, they may help discriminate deleterious sSNVs from benign ones when combined with other features.
The data and source code are available at http://bioinfo.ahu.edu.cn:8080/IDSV.
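The sequential backward selection step mentioned in the abstract can be illustrated with a minimal sketch. This is not the IDSV implementation: the feature names, the weights, and the `toy_score` function below are hypothetical stand-ins, and in a setting like the one described the score would more plausibly be the cross-validated performance of a random forest trained on the candidate subset.

```python
from typing import Callable, FrozenSet, Iterable, List


def sequential_backward_selection(
    features: Iterable[str],
    score: Callable[[FrozenSet[str]], float],
    n_keep: int,
) -> List[str]:
    """Greedily drop one feature at a time until n_keep remain.

    At each step, the feature whose removal leaves the highest-scoring
    subset is discarded.
    """
    selected = list(features)
    if not 1 <= n_keep <= len(selected):
        raise ValueError("n_keep must be between 1 and the number of features")
    while len(selected) > n_keep:
        # Score every subset obtained by leaving out exactly one feature.
        best_drop = max(
            selected,
            key=lambda f: score(frozenset(x for x in selected if x != f)),
        )
        selected.remove(best_drop)
    return selected


# Hypothetical per-feature contributions; a stand-in for a real model score.
TOY_WEIGHTS = {
    "splicing": 0.5,
    "conservation": 0.4,
    "translation_efficiency": 0.3,
    "sequence": 0.05,
    "noise": -0.1,
}


def toy_score(subset: FrozenSet[str]) -> float:
    """Assumed additive score; a real pipeline would evaluate a classifier."""
    return sum(TOY_WEIGHTS[f] for f in subset)


kept = sequential_backward_selection(TOY_WEIGHTS, toy_score, n_keep=3)
print(kept)
```

With this toy score the procedure discards the harmful and the weakly informative feature first. A real score is rarely additive, which is why each candidate subset is re-scored at every step rather than ranking features once.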
