Article

Asymptotic Conditional Singular Value Decomposition for High-Dimensional Genomic Data

BIOMETRICS (2011)

Journal

BIOMETRICS

Volume 67, Issue 2, Pages 344-352

Publisher

WILEY

DOI: 10.1111/j.1541-0420.2010.01455.x

Keywords

False discovery rate; Gene expression; Genomics; High-dimensional; Singular value decomposition; Surrogate variables

Categories

Biology; Mathematical & Computational Biology; Statistics & Probability

Funding

NIH [R01 HG002913]
NHGRI

Abstract

High-dimensional data, such as those obtained from a gene expression microarray or second-generation sequencing experiment, consist of a large number of dependent features measured on a small number of samples. One of the key problems in genomics is the identification and estimation of factors that associate with many features simultaneously. Identifying the number of factors is also important for unsupervised statistical analyses such as hierarchical clustering. A conditional factor model is the most common model for many types of genomic data, ranging from gene expression to single nucleotide polymorphisms to methylation. Here we show that under a conditional factor model for genomic data with a fixed sample size, the right singular vectors are asymptotically consistent for the unobserved latent factors as the number of features diverges. We also propose a consistent estimator, based on a scaled eigen-decomposition, of the dimension of the underlying conditional factor model when the sample size is finite and fixed and the number of features is infinite. We propose a practical approach for selecting the number of factors in real data sets, and we illustrate the utility of these results for capturing batch and other unmodeled effects in a microarray experiment using the dependence kernel approach of Leek and Storey (2008, Proceedings of the National Academy of Sciences of the United States of America 105, 18718-18723).
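The two asymptotic claims in the abstract can be illustrated numerically. The right singular vectors of an m x n data matrix X (features in rows, samples in columns) are the eigenvectors of the n x n matrix X^T X, so with a small fixed n it is enough to eigen-decompose the scaled matrix X^T X / m. The sketch below is a minimal pure-Python simulation under assumptions of our own, not the paper's code: two fixed latent factors (a hypothetical batch indicator and a hypothetical alternating effect), feature-specific N(0, 1) loadings, and N(0, 1) noise. Under these assumptions X^T X / m approaches G^T G + I as m grows (G is the 2 x n factor matrix), so the leading two eigenvectors should span the factors. The factor count uses an ad hoc rule (the hypothetical `count_factors`, with a hypothetical cutoff `ratio=1.5` relative to the median eigenvalue); it stands in for, and is not, the consistent estimator proposed in the paper.

```python
import math
import random
import statistics


def jacobi_eigh(a, tol=1e-20, max_sweeps=100):
    """Eigen-decompose a symmetric matrix with cyclic Jacobi rotations.

    Returns (values, vectors) sorted by decreasing eigenvalue;
    vectors[k] is the unit eigenvector paired with values[k].
    """
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    v = [[float(i == j) for j in range(n)] for i in range(n)]  # accumulated rotations
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V P, so columns of V become eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda i: a[i][i], reverse=True)
    values = [a[i][i] for i in order]
    vectors = [[v[k][i] for k in range(n)] for i in order]
    return values, vectors


def simulate(m=5000, n=10, seed=0):
    """Hypothetical conditional factor model: x_i = b_i^T G + e_i for each feature i."""
    rng = random.Random(seed)
    g1 = [1.0 if j < n // 2 else -1.0 for j in range(n)]  # hypothetical batch effect
    g2 = [1.0 if j % 2 == 0 else -1.0 for j in range(n)]  # hypothetical second effect
    factors = [g1, g2]
    x = []
    for _ in range(m):
        b = [rng.gauss(0.0, 1.0) for _ in factors]  # feature-specific loadings
        x.append([sum(bk * gk[j] for bk, gk in zip(b, factors)) + rng.gauss(0.0, 1.0)
                  for j in range(n)])
    return x, factors


def scaled_crossprod(x):
    """Return the n x n matrix X^T X / m for an m x n data matrix x (list of rows)."""
    m, n = len(x), len(x[0])
    s = [[0.0] * n for _ in range(n)]
    for row in x:
        for i in range(n):
            for j in range(i, n):
                s[i][j] += row[i] * row[j]
    for i in range(n):
        for j in range(i, n):
            s[i][j] /= m
            s[j][i] = s[i][j]
    return s


def count_factors(values, ratio=1.5):
    """Ad hoc rule (an assumption): count eigenvalues above ratio * median eigenvalue."""
    noise_level = statistics.median(values)
    return sum(lam > ratio * noise_level for lam in values)


def span_r2(vec, basis):
    """Fraction of ||vec||^2 captured by projection onto orthonormal basis vectors."""
    captured = sum(sum(p * u_k for p, u_k in zip(vec, u)) ** 2 for u in basis)
    return captured / sum(p * p for p in vec)


x, factors = simulate()
values, vectors = jacobi_eigh(scaled_crossprod(x))
r_hat = count_factors(values)
print("leading eigenvalues:", [round(lam, 2) for lam in values[:4]])
print("estimated number of factors:", r_hat)
for name, g in zip(["batch", "alternating"], factors):
    # Share of each true factor captured by the span of the top r_hat eigenvectors.
    print(name, round(span_r2(g, vectors[:r_hat]), 4))
```

In the limiting matrix G^T G + I for this simulation, the two signal eigenvalues are 13 and 9 and the remaining eight equal 1, so the sampled eigenvalues should separate clearly. Working with the n x n matrix X^T X rather than the m x m matrix X X^T keeps the cost linear in the number of features. The eigenvectors recover the factors only up to rotation within their span, which is why the check measures captured variance rather than comparing vectors one by one.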
