Feature screening in large scale cluster analysis

JOURNAL OF MULTIVARIATE ANALYSIS (2017)

Journal

JOURNAL OF MULTIVARIATE ANALYSIS

Volume 161, Pages 191-212

Publisher

ELSEVIER INC

DOI: 10.1016/j.jmva.2017.08.001

Keywords

Convex clustering; Empirical processes; High-dimensionality; Modality detection; Non-asymptotic screening rate; RNA-Seq data; Single-cell biology

Category

Statistics & Probability

Funding

University of Southern California's James H. Zumberge Faculty Research and Innovation Fund

Abstract

We propose a novel methodology for feature screening in the clustering of massive datasets, in which both the number of features and the number of observations can be very large. Building on a convex clustering criterion based on fusion penalization, we propose a highly scalable screening procedure that efficiently discards non-informative features. The procedure first computes, for each feature, a clustering score derived from the clustering tree constructed for that feature, and then thresholds the resulting scores. We provide theoretical support for our approach by establishing uniform non-asymptotic bounds on the clustering scores of the noise features. These bounds imply perfect screening of non-informative features with high probability. They are derived via careful analysis of the empirical processes corresponding to the clustering trees that the associated clustering procedure constructs for each feature. Through extensive simulation experiments, we compare the performance of our proposed method with other screening approaches popularly used in cluster analysis and obtain encouraging results. We demonstrate empirically that our method is applicable to cluster analysis of big datasets arising in single-cell gene expression studies. © 2017 Elsevier Inc. All rights reserved.
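The score-then-threshold structure described in the abstract can be illustrated with a minimal sketch. It does not reproduce the paper's convex clustering score, which the abstract does not define. As a stand-in, it scores each feature by the share of its variance explained by the best two-group split of its sorted values. The function names `two_split_score` and `screen_features`, the stand-in score, and the threshold value are all assumptions made for illustration.

```python
import numpy as np


def two_split_score(x):
    """Stand-in per-feature score (NOT the paper's convex clustering score).

    Returns the largest fraction of the feature's total sum of squares that
    is explained by splitting its sorted values into two contiguous groups.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    if n < 2:
        return 0.0
    total = np.sum((x - x.mean()) ** 2)
    if total == 0.0:
        return 0.0
    left_sum = np.cumsum(x)[:-1]      # sums of the first k values, k = 1..n-1
    k = np.arange(1, n)               # left group sizes
    left_mean = left_sum / k
    right_mean = (x.sum() - left_sum) / (n - k)
    # Between-group sum of squares for each candidate split point.
    between = k * (n - k) / n * (left_mean - right_mean) ** 2
    return float(between.max() / total)


def screen_features(X, threshold):
    """Score every column independently, then keep columns above threshold."""
    scores = np.array([two_split_score(X[:, j]) for j in range(X.shape[1])])
    return np.flatnonzero(scores > threshold), scores


# Hypothetical toy data: 20 features, of which the first 3 carry two clusters.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
labels = rng.integers(0, 2, size=500)
X[:, :3] += 6.0 * labels[:, None]
kept, scores = screen_features(X, threshold=0.75)  # threshold is an assumption
```

Because each feature is scored independently, the cost grows linearly in the number of features and the loop parallelizes trivially, which is the property that makes this kind of screening attractive when both dimensions are large. The threshold here is hand-picked for the toy data; the paper instead derives its screening guarantees from non-asymptotic bounds on the scores of noise features.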
