Article

Benchmarking clustering algorithms on estimating the number of cell types from single-cell RNA-sequencing data

GENOME BIOLOGY (2022)

Journal

GENOME BIOLOGY

Volume 23, Issue 1, Pages -

Publisher

BMC

DOI: 10.1186/s13059-022-02622-0

Categories

Biotechnology & Applied Microbiology; Genetics & Heredity

Funding

Australia National Health and Medical Research Council (NHMRC) [1173469]
Postgraduate Research Excellence Award (PREA) Tuition Fee and Stipend Scholarship
Research Training Program Tuition Fee Offset
University of Sydney Postgraduate Award Stipend Scholarship
National Health and Medical Research Council of Australia [1173469] Funding Source: NHMRC

Abstract

This study systematically benchmarks a range of clustering algorithms for single-cell RNA-seq data and summarizes the strengths and weaknesses of each method. The authors evaluate the performance of the algorithms using a large number of datasets and provide a multi-aspect recommendation to users.

Background: A key task in single-cell RNA-seq (scRNA-seq) data analysis is to accurately detect the number of cell types in a sample, which can be critical for downstream analyses such as cell type identification. Various scRNA-seq clustering algorithms have been designed specifically to estimate the number of cell types automatically by optimising the number of clusters in a dataset. The lack of benchmark studies, however, complicates the choice of method.

Results: We systematically benchmark a range of popular clustering algorithms on estimating the number of cell types in a variety of settings, sampling from the Tabula Muris data to create scRNA-seq datasets with a varying number of cell types, a varying number of cells in each cell type, and different cell type proportions. The large number of datasets enables us to assess the performance of the algorithms, which cover four broad categories of approaches, from various aspects using a panel of criteria. We further cross-compare the performance on datasets with high cell numbers using Tabula Muris and Tabula Sapiens data.

Conclusions: We identify the strengths and weaknesses of each method on multiple criteria, including the deviation of the estimate from the true number of cell types, the variability of the estimate, the clustering concordance of cells with their predefined cell types, and running time and peak memory usage. We then summarise these results into a multi-aspect recommendation for users. The proposed stability-based approach for estimating the number of cell types is implemented in an R package and is freely available from https://github.com/PYanaLab/scCCESS.
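To make the evaluation criteria in the abstract more concrete, the following is a minimal Python sketch of how three of them could be computed for one method on one simulated setting. It is an illustration under stated assumptions, not the authors' implementation: the function names `deviation_summary` and `adjusted_rand_index` are hypothetical, and the adjusted Rand index is assumed here as one possible concordance measure, since the abstract does not name the measure that was used.

```python
from collections import Counter
from math import comb
from statistics import mean, pstdev


def deviation_summary(estimates, true_k):
    """Summarise repeated estimates of the number of cell types.

    Returns (mean signed deviation from true_k, population standard
    deviation of the estimates). A positive deviation means the method
    over-estimates the number of cell types on average.
    """
    deviations = [k - true_k for k in estimates]
    return mean(deviations), pstdev(estimates)


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same cells.

    ASSUMPTION: used here only as an example concordance measure.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs of cells to compare

    # Count pairs of cells that share a group in both, or in each, labeling.
    joint = Counter(zip(labels_true, labels_pred))
    sum_joint = sum(comb(c, 2) for c in joint.values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_true * sum_pred / comb(n, 2)
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings form a single cluster.
        return 1.0
    return (sum_joint - expected) / (max_index - expected)


# Hypothetical example: five runs of one method on data with 8 cell types.
mean_dev, spread = deviation_summary([7, 8, 8, 9, 10], true_k=8)

# Hypothetical example: predefined cell types versus cluster assignments.
ari = adjusted_rand_index(["T", "T", "B", "B"], [0, 0, 1, 1])
```

Running time and peak memory usage, the remaining criteria, depend on the tooling used to run each method and are left out of this sketch.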
