Article

Supervised learning-based tagSNP selection for genome-wide disease classifications

Journal

BMC GENOMICS
Volume 9, Issue -, Pages -

Publisher

BMC
DOI: 10.1186/1471-2164-9-S1-S6

Funding

  1. NHLBI NIH HHS [T32 HL066991, T32 HL66991-05] Funding Source: Medline
  2. NICHD NIH HHS [R01 HD3728] Funding Source: Medline
  3. NIDDK NIH HHS [T32-DK07664, T32 DK007664] Funding Source: Medline
  4. NATIONAL HEART, LUNG, AND BLOOD INSTITUTE [T32HL066991] Funding Source: NIH RePORTER
  5. NATIONAL INSTITUTE OF DIABETES AND DIGESTIVE AND KIDNEY DISEASES [T32DK007664] Funding Source: NIH RePORTER

Background: Comprehensive evaluation of common genetic variations through association of single nucleotide polymorphisms (SNPs) with complex human diseases on the genome-wide scale is an active area in human genome research. One of the fundamental questions in a SNP-disease association study is how to find an optimal subset of SNPs with predictive power for disease status. To find that subset while reducing the study burden in time and cost, one can potentially reconcile the information redundancy that arises from associations between SNP markers.

Results: We have developed a feature selection method named Supervised Recursive Feature Addition (SRFA). This method combines supervised learning with statistical measures on the chosen candidate features/SNPs to reconcile redundant information and, in doing so, improve classification performance in association studies. Additionally, we have proposed a Support Vector based Recursive Feature Addition (SVRFA) scheme for SNP-disease association analysis.

Conclusions: We have proposed using SRFA with different statistical learning classifiers, and SVRFA, for both SNP selection and disease classification, and have applied them to two complex disease data sets. In general, our approaches outperform the well-known feature selection method of Support Vector Machine Recursive Feature Elimination and logic regression-based SNP selection for disease classification in genetic association studies. Our study further indicates that both genetic and environmental variables should be taken into account when doing disease predictions and classifications for the most complex human diseases that have gene-environment interactions.
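The abstract does not spell out the SRFA algorithm, so the following is only a generic sketch of the idea it describes: features are added greedily, a supervised score shortlists candidates, and a statistical measure then prefers the least redundant candidate. Everything specific here is an assumption made for illustration and is not taken from the paper: the nearest-centroid classifier, the use of absolute Pearson correlation as the redundancy measure, the `n_candidates` parameter, the function names, and the toy genotype data.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def centroid_accuracy(X, y, features):
    """Training accuracy of a nearest-centroid classifier on the given columns.
    (Illustrative stand-in for whichever classifier is actually used.)"""
    classes = sorted(set(y))
    centroids = {}
    for c in classes:
        rows = [row for row, label in zip(X, y) if label == c]
        centroids[c] = [sum(r[f] for r in rows) / len(rows) for f in features]
    correct = 0
    for row, label in zip(X, y):
        pred = min(
            classes,
            key=lambda c: sum((row[f] - m) ** 2 for f, m in zip(features, centroids[c])),
        )
        correct += pred == label
    return correct / len(y)

def recursive_feature_addition(X, y, n_select, n_candidates=3):
    """Greedy forward selection: shortlist by accuracy, then pick the least redundant."""
    n_features = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_features)]
    selected = []
    while len(selected) < min(n_select, n_features):
        remaining = [j for j in range(n_features) if j not in selected]
        # Supervised step: rank remaining features by accuracy when added.
        ranked = sorted(
            remaining,
            key=lambda j: centroid_accuracy(X, y, selected + [j]),
            reverse=True,
        )
        candidates = ranked[:n_candidates]
        if selected:
            # Statistical step: prefer the candidate least correlated with chosen features.
            best = min(
                candidates,
                key=lambda j: sum(abs(pearson(columns[j], columns[s])) for s in selected),
            )
        else:
            best = candidates[0]
        selected.append(best)
    return selected

# Hypothetical toy data: genotypes coded 0/1/2; column 1 duplicates column 0.
f0 = [0, 0, 0, 1, 2, 2, 2, 2]
f1 = list(f0)
f2 = [0, 1, 0, 1, 1, 2, 1, 2]
f3 = [1, 0, 2, 1, 0, 1, 2, 1]
X = [list(row) for row in zip(f0, f1, f2, f3)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

selected = recursive_feature_addition(X, y, n_select=2, n_candidates=2)
print(selected)
```

In this toy run the duplicated column is shortlisted but passed over at the second step because it is perfectly correlated with the feature already chosen, which is the kind of redundancy the abstract refers to. A real analysis would score candidates on held-out data rather than training accuracy.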
