Article

Estimating and Accounting for Unobserved Covariates in High-Dimensional Correlated Data

Journal

JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
Volume 117, Issue 537, Pages 225-236

Publisher

TAYLOR & FRANCIS INC
DOI: 10.1080/01621459.2020.1769635

Keywords

Batch effects; Cell-type heterogeneity; Confounding; Correlation; Multi-tissue; Unwanted variation

Funding

  1. NIH [R01-HL129735, R01MH101820]

This article presents two methods, CBCV and CorrConf, for handling high-dimensional biological datasets with complex sample correlation structures. These methods demonstrate superior performance in choosing the number of latent confounding factors and estimating them, as evidenced by analysis of simulated and real data applications.
Many high-dimensional and high-throughput biological datasets have complex sample correlation structures, which include longitudinal and multiple tissue data, as well as data with multiple treatment conditions or related individuals. These data, as well as nearly all high-throughput omic data, are influenced by technical and biological factors unknown to the researcher, which, if unaccounted for, can severely obfuscate estimation of and inference on the effects of interest. We therefore developed CBCV and CorrConf: provably accurate and computationally efficient methods to choose the number of and estimate latent confounding factors present in high-dimensional data with correlated or nonexchangeable residuals. We demonstrate each method's superior performance compared to other state-of-the-art methods by analyzing simulated multi-tissue gene expression data and identifying sex-associated DNA methylation sites in a real, longitudinal twin study. Supplementary materials for this article are available online.
