Article

A Bayesian approach to identify genes and gene-level SNP aggregates in a genetic analysis of cancer data

Journal

STATISTICS AND ITS INTERFACE
Volume 8, Issue 2, Pages 137-151

Publisher

INT PRESS BOSTON, INC
DOI: 10.4310/SII.2015.v8.n2.a2

Keywords

Bayesian variable selection; Hardy-Weinberg equilibrium law; Linear models; Linkage disequilibrium; Markov random field; SNP data

Funding

  1. NIH/NHLBI [P01-HL082798]
  2. NSF/DMS [1007871]
  3. NCI [1R03 CA141998, 5K07 CA123109, P30 CA016672]
  4. Direct For Mathematical & Physical Scien
  5. Division Of Mathematical Sciences [1007871] Funding Source: National Science Foundation

Complex diseases, such as cancer, arise from complex etiologies consisting of multiple single-nucleotide polymorphisms (SNPs), each contributing a small amount to the overall risk of disease. Thus, many researchers have gone beyond single-SNP analysis methods, focusing instead on groups of SNPs, for example by analyzing haplotypes. More recently, pathway-based methods have been proposed that use prior biological knowledge on gene function to achieve a more powerful analysis of genome-wide association study (GWAS) data. In this paper we propose a novel Bayesian modeling framework to identify molecular biomarkers for disease prediction. Our method combines pathway-based approaches with multiple-SNP analyses of a specified region of interest. The model's development is motivated by SNP data from a lung cancer study. In our approach we define gene-level scores based on SNP allele frequencies and use a linear modeling setting to study the scores' association with the observed phenotype. The basic idea behind the definition of gene-level scores is to weight the SNPs within the gene according to their rarity, based on genotype frequencies expected under the Hardy-Weinberg equilibrium law. This results in scores that give more importance to unusually low frequencies, i.e., to SNPs that might indicate peculiar genetic differences between subjects belonging to different groups. An additional feature of our approach is that we incorporate information on SNP-to-SNP associations into the model. In particular, we use network priors that model the linkage disequilibrium between SNPs. For posterior inference, we design a stochastic search method that identifies significant biomarkers (genes and SNPs) for disease prediction. We assess performance on simulated data and compare results to existing approaches. We then show the ability of the proposed methodology to detect relevant genes and associated SNPs in a lung cancer dataset.
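
To make the idea of rarity-weighted gene-level scores concrete, here is a minimal Python sketch. The abstract does not state the score formula, so the weighting used here (the negative log of the Hardy-Weinberg expected frequency of each subject's observed genotype, summed over the SNPs in a gene) is an assumption chosen for illustration and may differ from the paper's definition. The function names `hwe_genotype_freqs` and `gene_scores` and the toy genotype matrix are likewise hypothetical.

```python
import numpy as np


def hwe_genotype_freqs(genotypes):
    """Expected genotype frequencies under Hardy-Weinberg equilibrium.

    genotypes: integer array of shape (n_subjects, n_snps) holding the
    number of copies (0, 1 or 2) of the counted allele at each SNP.
    Returns an array of shape (3, n_snps); row k is the expected
    frequency of carrying k copies.
    """
    q = genotypes.mean(axis=0) / 2.0  # estimated allele frequency per SNP
    p = 1.0 - q
    return np.vstack([p ** 2, 2.0 * p * q, q ** 2])


def gene_scores(genotypes, eps=1e-12):
    """Illustrative gene-level score for each subject (assumed form).

    Each SNP contributes -log(expected frequency of the observed
    genotype), so genotypes that are rare under HWE weigh more.
    """
    genotypes = np.asarray(genotypes, dtype=int)
    freqs = hwe_genotype_freqs(genotypes)
    snp_index = np.arange(genotypes.shape[1])
    # expected[i, j] = HWE frequency of subject i's genotype at SNP j
    expected = freqs[genotypes, snp_index]
    weights = -np.log(expected + eps)  # eps guards against log(0)
    return weights.sum(axis=1)


# Toy gene with two SNPs observed in four subjects (made-up data).
toy_genotypes = np.array([
    [0, 0],
    [0, 1],
    [0, 0],
    [2, 0],
])
toy_scores = gene_scores(toy_genotypes)
```

In this toy example the fourth subject, who is homozygous for the less frequent allele at the first SNP, receives the largest score. In a linear modeling setting such scores would then enter as covariates for the phenotype.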
