A Bayesian approach to identify genes and gene-level SNP aggregates in a genetic analysis of cancer data

STATISTICS AND ITS INTERFACE (2015)

Journal

STATISTICS AND ITS INTERFACE

Volume 8, Issue 2, Pages 137-151

Publisher

INT PRESS BOSTON, INC

DOI: 10.4310/SII.2015.v8.n2.a2

Keywords

Bayesian variable selection; Hardy-Weinberg equilibrium law; Linear models; Linkage disequilibrium; Markov random field; SNP data

Funding

NIH/NHLBI [P01-HL082798]
NSF/DMS [1007871]
NCI [1R03 CA141998, 5K07 CA123109, P30 CA016672]
Direct For Mathematical & Physical Scien
Division Of Mathematical Sciences [1007871] Funding Source: National Science Foundation

Abstract

Complex diseases, such as cancer, arise from complex etiologies involving multiple single-nucleotide polymorphisms (SNPs), each contributing a small amount to the overall risk of disease. Thus, many researchers have gone beyond single-SNP analysis methods, focusing instead on groups of SNPs, for example by analysing haplotypes. More recently, pathway-based methods have been proposed that use prior biological knowledge on gene function to achieve a more powerful analysis of genome-wide association studies (GWAS) data. In this paper we propose a novel Bayesian modeling framework to identify molecular biomarkers for disease prediction. Our method combines pathway-based approaches with multiple-SNP analyses of a specified region of interest. The model's development is motivated by SNP data from a lung cancer study. In our approach we define gene-level scores based on SNP allele frequencies and use a linear modeling setting to study the association of the scores with the observed phenotype. The basic idea behind the definition of gene-level scores is to weight the SNPs within a gene according to their rarity, based on the genotype frequencies expected under the Hardy-Weinberg equilibrium law. This results in scores that give more importance to unusually low frequencies, i.e. to SNPs that might indicate peculiar genetic differences between subjects belonging to different groups. An additional feature of our approach is that we incorporate information on SNP-to-SNP associations into the model. In particular, we use network priors that model the linkage disequilibrium between SNPs. For posterior inference, we design a stochastic search method that identifies significant biomarkers (genes and SNPs) for disease prediction. We assess performance on simulated data and compare results to existing approaches. We then show the ability of the proposed methodology to detect relevant genes and associated SNPs in a lung cancer dataset.
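
The abstract does not give the exact score formula, so the following is only an illustrative sketch of the general idea of weighting SNPs by rarity under Hardy-Weinberg equilibrium, not the authors' definition. It assumes genotypes are coded as minor-allele counts (0, 1, 2), estimates each SNP's minor allele frequency from the sample, and uses the negative log of the expected genotype frequency as a weight, so that rarer genotypes contribute more. The function names `hwe_genotype_freqs` and `gene_scores`, the log weighting, and the toy data are all hypothetical choices made for this example.

```python
import numpy as np

def hwe_genotype_freqs(p):
    """Expected frequencies of genotypes with 0, 1, 2 minor alleles
    under Hardy-Weinberg equilibrium, given minor allele frequency p."""
    q = 1.0 - p
    return np.array([q * q, 2.0 * p * q, p * p])

def gene_scores(genotypes):
    """Illustrative gene-level score for each subject (an assumption,
    not the paper's formula).

    genotypes: (n_subjects, n_snps) integer array of minor-allele
    counts in {0, 1, 2} for the SNPs belonging to one gene.
    """
    genotypes = np.asarray(genotypes, dtype=int)
    n_subjects, n_snps = genotypes.shape
    scores = np.zeros(n_subjects)
    for j in range(n_snps):
        # Sample estimate of the minor allele frequency for SNP j.
        p = genotypes[:, j].mean() / 2.0
        # Keep p away from 0 and 1 so the logarithm stays finite.
        p = np.clip(p, 1e-6, 1.0 - 1e-6)
        expected = hwe_genotype_freqs(p)
        # Rarer genotypes (small expected frequency) get larger weights.
        weights = -np.log(expected)
        # Look up each subject's weight by their observed genotype.
        scores += weights[genotypes[:, j]]
    return scores

# Toy data: 4 subjects, 2 SNPs in one hypothetical gene.
toy = np.array([[0, 0],
                [0, 1],
                [2, 0],
                [0, 0]])
toy_scores = gene_scores(toy)
print(toy_scores)
```

In this toy example the subject carrying two copies of a minor allele receives the largest score, which matches the abstract's description of scores that emphasise unusually low frequencies. The resulting per-gene scores could then serve as covariates in a linear model for the phenotype; the paper's network priors and stochastic search are not represented here.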
