Article

A multilevel model to address batch effects in copy number estimation using SNP arrays

Journal

BIOSTATISTICS
Volume 12, Issue 1, Pages 33-50

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/biostatistics/kxq043

Keywords

Bioinformatics; Hierarchical models; DNA copy number variations; Single nucleotide polymorphism array

Funding

  1. National Institutes of Health [1K99HG005015]
  2. CTSA
  3. National Heart, Lung, and Blood Institute [5T32HL007024]
  4. National Institute of General Medical Sciences [R01GM083084]
  5. National Center for Research Resources [5R01RR021967]
  6. NATIONAL CENTER FOR RESEARCH RESOURCES [R01RR021967] Funding Source: NIH RePORTER
  7. NATIONAL HEART, LUNG, AND BLOOD INSTITUTE [T32HL007024] Funding Source: NIH RePORTER
  8. NATIONAL HUMAN GENOME RESEARCH INSTITUTE [K99HG005015, R00HG005015] Funding Source: NIH RePORTER
  9. NATIONAL INSTITUTE OF GENERAL MEDICAL SCIENCES [R01GM083084, R01GM103552] Funding Source: NIH RePORTER

Abstract

Submicroscopic changes in chromosomal DNA copy number dosage are common and have been implicated in many heritable diseases and cancers. Recent high-throughput technologies have a resolution that permits the detection of segmental changes in DNA copy number that span thousands of base pairs in the genome. Genomewide association studies (GWAS) may simultaneously screen for copy number phenotype and single nucleotide polymorphism (SNP) phenotype associations as part of the analytic strategy. However, genomewide array analyses are particularly susceptible to batch effects as the logistics of preparing DNA and processing thousands of arrays often involves multiple laboratories and technicians, or changes over calendar time to the reagents and laboratory equipment. Failure to adjust for batch effects can lead to incorrect inference and requires inefficient post hoc quality control procedures to exclude regions that are associated with batch. Our work extends previous model-based approaches for copy number estimation by explicitly modeling batch and using shrinkage to improve locus-specific estimates of copy number uncertainty. Key features of this approach include the use of biallelic genotype calls from experimental data to estimate batch-specific and locus-specific parameters of background and signal without the requirement of training data. We illustrate these ideas using a study of bipolar disease and a study of chromosome 21 trisomy. The former has batch effects that dominate much of the observed variation in the quantile-normalized intensities, while the latter illustrates the robustness of our approach to a data set in which approximately 27% of the samples have altered copy number. Locus-specific estimates of copy number can be plotted on the copy number scale to investigate mosaicism and guide the choice of appropriate downstream approaches for smoothing the copy number as a function of physical position. 
The software is open source and implemented in the R package crlmm at Bioconductor (http://www.bioconductor.org).
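
The shrinkage idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the crlmm implementation or the paper's actual model: it assumes a simple weighted average that pulls each locus-specific variance toward the mean variance of its batch. The function names, the `prior_df` tuning constant, and the input layout are all hypothetical.

```python
from statistics import mean, variance


def shrink_variances(samples_by_locus, prior_df=4.0):
    """Shrink per-locus variances toward the batch-wide mean variance.

    samples_by_locus: dict mapping a locus ID to a list of log-intensities
    observed within ONE batch (each list needs at least two values).
    prior_df: hypothetical constant; larger values shrink more strongly.
    """
    # Raw, locus-specific sample variances within this batch.
    raw = {locus: variance(x) for locus, x in samples_by_locus.items()}
    # Batch-level target that every locus is pulled toward.
    batch_var = mean(list(raw.values()))
    shrunk = {}
    for locus, x in samples_by_locus.items():
        df = len(x) - 1
        # Weighted average: loci with few samples lean on the batch value.
        shrunk[locus] = (df * raw[locus] + prior_df * batch_var) / (df + prior_df)
    return shrunk


def shrink_by_batch(data, prior_df=4.0):
    """Apply the shrinkage separately per batch.

    data: dict mapping a batch label to a samples_by_locus dict, so that
    parameters are batch-specific as well as locus-specific.
    """
    return {batch: shrink_variances(loci, prior_df) for batch, loci in data.items()}
```

Fitting each batch separately keeps one batch's systematic shift from inflating another batch's variance estimates, while the within-batch shrinkage stabilizes loci that have few observations.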
