Article

Supervised dimensionality reduction for big data

NATURE COMMUNICATIONS (2021)

Journal

NATURE COMMUNICATIONS

Volume 12, Issue 1

Publisher

NATURE PORTFOLIO

DOI: 10.1038/s41467-021-23102-2

Category

Multidisciplinary Sciences

Funding

XDATA program of the Defense Advanced Research Projects Agency (DARPA) [FA8750-12-2-0303]
DARPA GRAPHS contract [N66001-14-1-4028]
DARPA SIMPLEX program through SPAWAR contract [N66001-15-C-4041]
DARPA Lifelong Learning Machines program [FA8650-18-2-7834]

Summary

Researchers have introduced a method that incorporates class-conditional moment estimates into a low-dimensional projection, with the aim of reducing the dimensionality of high-dimensional biomedical data more accurately before classification. The method has been validated on datasets with varying numbers of features, where it showed improved accuracy and computational efficiency.

Abstract

To solve key biomedical problems, experimentalists now routinely measure millions or billions of features (dimensions) per sample, with the hope that data science techniques will be able to build accurate data-driven inferences. Because sample sizes are typically orders of magnitude smaller than the dimensionality of these data, valid inferences require finding a low-dimensional representation that preserves the discriminating information (e.g., whether the individual suffers from a particular disease). There is a lack of interpretable supervised dimensionality reduction methods that scale to millions of dimensions with strong statistical theoretical guarantees. We introduce an approach to extending principal components analysis by incorporating class-conditional moment estimates into the low-dimensional projection. The simplest version, Linear Optimal Low-Rank Projection, incorporates the class-conditional means. We prove, and substantiate with both synthetic and real data benchmarks, that Linear Optimal Low-Rank Projection and its generalizations lead to improved data representations for subsequent classification, while maintaining computational efficiency and scalability. Using multiple brain imaging datasets consisting of more than 150 million features, and several genomics datasets with more than 500,000 features, Linear Optimal Low-Rank Projection outperforms other scalable linear dimensionality reduction techniques in terms of accuracy, while only requiring a few minutes on a standard desktop computer.

Biomedical measurements usually generate high-dimensional data in which individual samples are classified into several categories. Vogelstein et al. propose a supervised dimensionality reduction method that estimates a low-dimensional data projection for classification and prediction in big datasets.
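The abstract describes the simplest variant only in words: principal components analysis extended with the class-conditional means. The NumPy sketch below shows one plausible reading of that idea and should not be taken as the authors' implementation. It assumes the projection is built by stacking the differences between class means next to the leading principal directions of the class-centered data, and then orthonormalizing the result. The function name `lol_projection`, its parameters, and the orthonormalization step are illustrative assumptions, not details given in the text above.

```python
import numpy as np


def lol_projection(X, y, n_components):
    """Hypothetical sketch: build a (n_features, n_components) projection
    from class-mean differences plus class-centered principal directions."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y).ravel()

    # Map each label to a class index 0..K-1.
    classes, idx = np.unique(y, return_inverse=True)

    # Class-conditional means (first moments), one row per class.
    means = np.stack([X[idx == k].mean(axis=0) for k in range(len(classes))])

    # Differences between each class mean and the first class mean.
    deltas = (means[1:] - means[0]).T  # shape: (n_features, K - 1)

    n_pcs = n_components - deltas.shape[1]
    if n_pcs < 0:
        raise ValueError("n_components must be at least n_classes - 1")

    # Center every sample by its own class mean, then take the leading
    # right singular vectors as within-class principal directions.
    Xc = X - means[idx]
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)

    # Stack mean differences first, then principal directions.
    A = np.hstack([deltas, Vt[:n_pcs].T])

    # Orthonormalize the columns (an assumption of this sketch).
    Q, _ = np.linalg.qr(A)
    return Q
```

Projecting with `X @ lol_projection(X, y, d)` gives a `d`-dimensional representation that can be passed to any downstream classifier. A full SVD is used here only for brevity; at the scale the abstract mentions (millions of features), a truncated or randomized decomposition would be the more realistic choice.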
