☆ 4.7 Article

SPADIS: An Algorithm for Selecting Predictive and Diverse SNPs in GWAS

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (2021)

期刊

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS

卷 18, 期 3, 页码 1208-1216

出版社

IEEE COMPUTER SOC

DOI: 10.1109/TCBB.2019.2935437

关键词

Greedy algorithms; Genomics; Bioinformatics; Optimization; Prediction algorithms; Phenotype prediction; GWAS; SNP selection; SNP-SNP networks; Hi-C; submodular function

类别

Biochemical Research Methods Computer Science, Interdisciplinary Applications Mathematics, Interdisciplinary Applications Statistics & Probability

资金

TUBITAK [116E148]

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

智能总结 New
摘要

This study introduces a novel method called SPADIS, which favors selecting remotely located SNPs to account for their complementary effects in explaining a phenotype. Compared to the state-of-the-art method SConES, SPADIS shows better phenotype prediction performance, consistently improving across multiple networks and settings, identifying more candidate genes, and running faster.

Phenotypic heritability of complex traits and diseases is seldom explained by individual genetic variants identified in genome-wide association studies (GWAS). Many methods have been developed to select a subset of variant loci, which are associated with or predictive of the phenotype. Selecting connected SNPs on SNP-SNP networks have been proven successful in finding biologically interpretable and predictive SNPs. However, we argue that the connectedness constraint favors selecting redundant features that affect similar biological processes and therefore does not necessarily yield better predictive performance. In this paper, we propose a novel method called SPADIS that favors the selection of remotely located SNPs in order to account for their complementary effects in explaining a phenotype. SPADIS selects a diverse set of loci on a SNP-SNP network. This is achieved by maximizing a submodular set function with a greedy algorithm that ensures a constant factor approximation to the optimal solution. We compare SPADIS to the state-of-the-art method SConES, on a dataset of Arabidopsis Thaliana with continuous flowering time phenotypes. SPADIS has better average phenotype prediction performance in 15 out of 17 phenotypes when the same number of SNPs are selected and provides consistent improvements across multiple networks and settings on average. Moreover, it identifies more candidate genes and runs faster.

SPADIS: An Algorithm for Selecting Predictive and Diverse SNPs in GWAS

期刊

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS

出版社

IEEE COMPUTER SOC

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

SPADIS: An Algorithm for Selecting Predictive and Diverse SNPs in GWAS

期刊

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS

出版社

IEEE COMPUTER SOC

关键词

类别

资金

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文