Article

Separating and reintegrating latent variables to improve classification of genomic data

Journal

BIOSTATISTICS
Volume 23, Issue 4, Pages 1133-1149

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/biostatistics/kxab046

Keywords

Classification; Gene expression; Linear discriminant analysis

Funding

  1. National Science Foundation Graduate Research Fellowship [DGE 1841052]
  2. National Science Foundation RTG grant [DMS 1646108]

Genomic data sets contain latent variables that can both help and add noise to classification. To address this issue, a cross-residualization classifier (CRC) is proposed, which adjusts and integrates latent variables without discarding potentially predictive information. Experimental results show that CRC performs well compared to existing classifiers.
Genomic data sets contain the effects of various unobserved biological variables in addition to the variable of primary interest. These latent variables often affect a large number of features (e.g., genes), giving rise to dense latent variation. This latent variation presents both challenges and opportunities for classification. While some of these latent variables may be partially correlated with the phenotype of interest and thus helpful, others may be uncorrelated and merely contribute additional noise. Moreover, whether potentially helpful or not, these latent variables may obscure weaker effects that impact only a small number of features but more directly capture the signal of primary interest. To address these challenges, we propose the cross-residualization classifier (CRC). Through an adjustment and ensemble procedure, the CRC estimates and residualizes out the latent variation, trains a classifier on the residuals, and then reintegrates the latent variation in a final ensemble classifier. Thus, the latent variables are accounted for without discarding any potentially predictive information. We apply the method to simulated data and a variety of genomic data sets from multiple platforms. In general, we find that the CRC performs well relative to existing classifiers and sometimes offers substantial gains.
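The abstract describes three steps: estimate and residualize out the latent variation, train a classifier on the residuals, and then reintegrate the latent variation in an ensemble. The following is a minimal sketch of that general idea under illustrative assumptions, not the authors' CRC: latent variation is assumed to be captured by the top-k principal components (found by power iteration), both component classifiers are assumed to be diagonal linear discriminant analysis, and the ensemble is assumed to simply sum the two log-likelihood ratios. All function names (`fit_crc_sketch`, `predict_crc_sketch`, `fit_dlda`, and so on) are hypothetical. The paper's actual "cross" residualization and ensemble rule are not described here and may differ substantially.

```python
import math
import random


def mean_center(X):
    """Center each feature (column) of an n x p matrix stored as a list of rows."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X], means


def top_components(Xc, k, iters=200, seed=0):
    """Top-k principal directions of centered data via power iteration on X^T X.

    Assumption: latent variation = top-k PCs. Each new direction is
    re-orthogonalized against earlier ones (deflation).
    """
    rng = random.Random(seed)
    p = len(Xc[0])
    comps = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(p)]
        for _ in range(iters):
            xv = [sum(r[j] * v[j] for j in range(p)) for r in Xc]  # X v
            w = [sum(Xc[i][j] * xv[i] for i in range(len(Xc))) for j in range(p)]  # X^T X v
            for c in comps:  # remove directions already found
                d = sum(a * b for a, b in zip(w, c))
                w = [a - d * b for a, b in zip(w, c)]
            norm = math.sqrt(sum(a * a for a in w)) or 1.0
            v = [a / norm for a in w]
        comps.append(v)
    return comps


def project(Xc, comps):
    """Latent scores: coordinates of each centered sample along each direction."""
    return [[sum(a * b for a, b in zip(row, c)) for c in comps] for row in Xc]


def residualize(Xc, comps):
    """Subtract each sample's projection onto the (orthonormal) latent directions."""
    out = []
    for row in Xc:
        r = list(row)
        for c in comps:
            d = sum(a * b for a, b in zip(r, c))
            r = [a - d * b for a, b in zip(r, c)]
        out.append(r)
    return out


def fit_dlda(Z, y):
    """Diagonal LDA for labels 0/1: class means plus pooled per-feature variance."""
    p = len(Z[0])
    mu = {}
    for cls in (0, 1):
        rows = [z for z, t in zip(Z, y) if t == cls]
        mu[cls] = [sum(r[j] for r in rows) / len(rows) for j in range(p)]
    var = [0.0] * p
    for z, t in zip(Z, y):
        for j in range(p):
            var[j] += (z[j] - mu[t][j]) ** 2
    var = [s / max(len(Z) - 2, 1) + 1e-8 for s in var]  # small ridge avoids /0
    return mu, var


def dlda_llr(model, z):
    """Log-likelihood ratio of class 1 vs class 0 under equal priors (> 0 favors 1)."""
    mu, var = model
    return sum(((z[j] - mu[0][j]) ** 2 - (z[j] - mu[1][j]) ** 2) / (2 * var[j])
               for j in range(len(z)))


def fit_crc_sketch(X, y, k):
    """Hypothetical: split data into latent scores and residuals, fit one DLDA on each."""
    Xc, means = mean_center(X)
    comps = top_components(Xc, k)
    latent_model = fit_dlda(project(Xc, comps), y)
    resid_model = fit_dlda(residualize(Xc, comps), y)
    return means, comps, latent_model, resid_model


def predict_crc_sketch(fit, x):
    """Reintegrate both views by summing their log-likelihood ratios (assumed rule)."""
    means, comps, latent_model, resid_model = fit
    xc = [[a - m for a, m in zip(x, means)]]  # center with training means
    score = dlda_llr(latent_model, project(xc, comps)[0])
    score += dlda_llr(resid_model, residualize(xc, comps)[0])
    return 1 if score > 0 else 0
```

Summing the two log-likelihood ratios treats the latent scores and the residuals as if they were conditionally independent given the class. With PCA the two parts are orthogonal by construction, but orthogonality does not imply independence, so this is only one simple choice of ensemble rule.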