Article

A Data-Driven Clustering Recommendation Method for Single-Cell RNA-Sequencing Data

Journal

TSINGHUA SCIENCE AND TECHNOLOGY
Volume 26, Issue 5, Pages 772-789

Publisher

TSINGHUA UNIV PRESS
DOI: 10.26599/TST.2020.9010028

Keywords

Clustering methods; Correlation; Clustering algorithms; Biology; Shape; Dimensionality reduction; Gold; single-cell RNA-sequencing (scRNA-seq); cellular heterogeneity; cell type identification; data latent shape; clustering

Funding

  1. National Natural Science Foundation of China [U19A2064]
  2. Hunan Provincial Science and Technology Program [2019CB1007]
  3. Fundamental Research Funds for the Central Universities, CSU [2282019SYLB004]
  4. Fundamental Research Funds for the Central Universities of Central South University [2020zzts593]


The study combined different strategies with clustering methods and found that spectral clustering is more suitable for data sets with continuous shapes, while hierarchical clustering is more suitable for data sets with clear cluster boundaries. Inspired by this, a new strategy called QRS was developed to evaluate dataset shapes, and a data-driven method called DDCR was proposed to recommend suitable clustering methods for scRNA-seq data.
The recent emergence of single-cell RNA-sequencing (scRNA-seq) technology has made it possible to address biological problems at single-cell resolution. A critical step in cellular heterogeneity analysis is cell type identification, and diverse scRNA-seq clustering methods have been proposed to partition cells into clusters. Among these methods, hierarchical clustering and spectral clustering are the most popular approaches for downstream clustering analysis, and they are combined with different preprocessing strategies such as similarity learning, dropout imputation, and dimensionality reduction. In this study, we carry out a comprehensive analysis by combining different strategies with these two categories of clustering methods on scRNA-seq datasets under different biological conditions. The results show that methods using spectral clustering tend to perform better on datasets with continuous shapes in two dimensions, while those using hierarchical clustering achieve better results on datasets with obvious boundaries between clusters in two dimensions. Motivated by this finding, we develop a new strategy, called QRS, to quantitatively evaluate the latent representative shape of a dataset and determine whether it has clear boundaries. Finally, we propose a data-driven clustering recommendation method, called DDCR, that recommends hierarchical clustering or spectral clustering for scRNA-seq data. We apply DDCR to two typical single-cell clustering methods, SC3 and RAFSIL, and the results show that DDCR recommends a more suitable downstream clustering method for different scRNA-seq datasets and obtains more robust and accurate results.
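
The recommendation step described above can be illustrated with a small sketch. This is not the paper's QRS or DDCR: the abstract does not define how QRS is computed, so the score below is a stand-in assumption that measures boundary clarity in a two-dimensional embedding by comparing the minimum-spanning-tree edges that would be cut to form the clusters against a typical edge length. The function names (`mst_edge_lengths`, `boundary_gap_score`, `recommend_clustering`) and the `threshold` value are hypothetical and chosen only for illustration.

```python
import math
import statistics


def mst_edge_lengths(points):
    """Edge lengths of a Euclidean minimum spanning tree (Prim's algorithm, O(n^2))."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # shortest known distance from each point to the tree
    best[0] = 0.0
    edges = []
    for step in range(n):
        # Pick the point outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if step > 0:
            edges.append(best[u])
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return edges


def boundary_gap_score(points, n_clusters):
    """Stand-in boundary score (an assumption, not QRS): the smallest of the
    n_clusters - 1 longest MST edges divided by the median MST edge."""
    edges = sorted(mst_edge_lengths(points), reverse=True)
    if n_clusters < 2 or len(edges) < n_clusters - 1:
        raise ValueError("need n_clusters >= 2 and at least n_clusters points")
    cut_edge = edges[n_clusters - 2]
    typical = statistics.median(edges)
    return cut_edge / typical if typical > 0 else math.inf


def recommend_clustering(points, n_clusters, threshold=3.0):
    """Recommend 'hierarchical' for clear boundaries, 'spectral' otherwise.
    The threshold is a hypothetical value, not taken from the paper."""
    score = boundary_gap_score(points, n_clusters)
    return "hierarchical" if score >= threshold else "spectral"


if __name__ == "__main__":
    # Two well-separated grids of points versus one continuous line of points.
    separated = [(x * 0.1, y * 0.1) for x in range(5) for y in range(5)]
    separated += [(10 + x * 0.1, y * 0.1) for x in range(5) for y in range(5)]
    continuous = [(i * 0.1, 0.0) for i in range(50)]
    print(recommend_clustering(separated, n_clusters=2))
    print(recommend_clustering(continuous, n_clusters=2))
```

In practice the input would be a two-dimensional embedding of the cells, and the recommended label would select the downstream clustering step of a pipeline such as SC3 or RAFSIL; both the proxy score and its threshold would need to be validated on real data.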
