4.7 Article

Finding haplotype tagging SNPs by use of principal components analysis

Journal

AMERICAN JOURNAL OF HUMAN GENETICS
Volume 75, Issue 5, Pages 850-861

Publisher

CELL PRESS
DOI: 10.1086/425587

Keywords

-

Funding

  1. NIGMS NIH HHS [U01-GM61374, U01 GM061374] Funding Source: Medline
  2. NLM NIH HHS [T15 LM007033, LM007033] Funding Source: Medline

Ask authors/readers for more resources

The immense volume and rapid growth of human genomic data, especially single nucleotide polymorphisms (SNPs), present special challenges for both biomedical researchers and automatic algorithms. One such challenge is to select an optimal subset of SNPs, commonly referred as haplotype tagging SNPs (htSNPs), to capture most of the haplotype diversity of each haplotype block or gene-specific region. This information-reduction process facilitates cost-effective genotyping and, subsequently, genotype-phenotype association studies. It also has implications for assessing the risk of identifying research subjects on the basis of SNP information deposited in public domain databases. We have investigated methods for selecting htSNPs by use of principal components analysis (PCA). These methods first identify eigenSNPs and then map them to actual SNPs. We evaluated two mapping strategies, greedy discard and varimax rotation, by assessing the ability of the selected htSNPs to reconstruct genotypes of non-htSNPs. We also compared these methods with two other htSNP finders, one of which is PCA based. We applied these methods to three experimental data sets and found that the PCA-based methods tend to select the smallest set of htSNPs to achieve a 90% reconstruction precision.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.7
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available