Article

scLM: Automatic Detection of Consensus Gene Clusters Across Multiple Single-cell Datasets

Journal

GENOMICS PROTEOMICS & BIOINFORMATICS
Volume 19, Issue 2, Pages 330-341

Publisher

ELSEVIER
DOI: 10.1016/j.gpb.2020.09.002

Keywords

Single-cell RNA sequencing; Consensus clustering; Latent space; Markov Chain Monte Carlo; Maximum likelihood approach

Funding

  1. Cancer Genomics, Tumor Tissue Repository, and Bioinformatics Shared Resources under the NCI Cancer Center Support Grant [P30CA012197]
  2. Hanes and Willis Professorship in Cancer, USA
  3. National Foundation for Cancer Research, USA
  4. Indiana University Precision Health Initiative, USA

In gene expression profiling studies, and in single-cell RNA sequencing analyses in particular, accurately identifying and clustering co-expressed genes is essential for understanding cell identity and function. Existing methods often fail to identify co-expressed genes in single-cell data accurately, whereas scLM, an algorithm tailored to single-cell data, effectively detects biologically meaningful gene clusters and can cluster multiple single-cell datasets simultaneously. Results on both simulated and experimental data show that scLM outperforms existing methods and provides novel biological insights that support mechanism discovery and the understanding of complex biosystems such as cancer.
In gene expression profiling studies, including single-cell RNA sequencing (scRNA-seq) analyses, the identification and characterization of co-expressed genes provides critical information on cell identity and function. Gene co-expression clustering in scRNA-seq data presents certain challenges. We show that commonly used methods for single-cell data are not capable of identifying co-expressed genes accurately, and that they produce results that substantially limit the biological insight expected from co-expressed genes. Herein, we present the single-cell Latent-variable Model (scLM), a gene co-clustering algorithm tailored to single-cell data that performs well at detecting gene clusters with significant biological context. Importantly, scLM can simultaneously cluster multiple single-cell datasets, i.e., perform consensus clustering, enabling users to leverage single-cell data from multiple sources for novel comparative analyses. scLM takes raw count data as input and preserves biological variation without being influenced by batch effects across datasets. Results on both simulated and experimental data demonstrate that scLM outperforms existing methods with considerably improved accuracy. To illustrate the biological insights scLM provides, we apply it to our in-house and public experimental scRNA-seq datasets. scLM identifies novel functional gene modules and refines cell states, which facilitates mechanism discovery and the understanding of complex biosystems such as cancers. A user-friendly R package with all the key features of the scLM method is available at https://github.com/QSong-github/scLM.
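The abstract describes scLM only at a high level: a latent-variable model fitted on raw counts (per the keywords, using Markov Chain Monte Carlo and a maximum likelihood approach) that clusters genes jointly across datasets. The Python sketch below is NOT scLM. It is a much simpler consensus co-expression baseline, offered only to illustrate what clustering genes across several datasets at once means. It assumes every dataset is a cells x genes count matrix with the same genes in the same column order. Each dataset is library-size normalized and correlated on its own, the per-dataset gene-gene correlation matrices are averaged, and genes are grouped by connected components of the graph whose edges are gene pairs with consensus correlation of at least a threshold `tau`. All function names, the normalization scale, the threshold, and the simulated data are hypothetical.

```python
import numpy as np


def log_normalize(counts, scale=1e4):
    """Library-size normalize a cells x genes count matrix, then apply log1p."""
    counts = np.asarray(counts, dtype=float)
    lib = counts.sum(axis=1, keepdims=True)
    lib[lib == 0] = 1.0  # leave all-zero cells at zero instead of dividing by 0
    return np.log1p(counts / lib * scale)


def consensus_correlation(datasets):
    """Average gene-gene Pearson correlations computed within each dataset.

    Every dataset is normalized and correlated on its own, so cells from
    different datasets (batches) are never pooled into one matrix.
    All datasets must share the same genes in the same column order.
    """
    mats = []
    for counts in datasets:
        r = np.corrcoef(log_normalize(counts), rowvar=False)  # genes are columns
        mats.append(np.nan_to_num(r))  # constant genes give NaN; treat as 0
    return np.mean(mats, axis=0)


def threshold_clusters(corr, tau=0.5):
    """Label genes by connected components of the graph with edges corr >= tau.

    This corresponds to single-linkage clustering on distance 1 - corr,
    cut at height 1 - tau.
    """
    n = corr.shape[0]
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:  # depth-first search over gene pairs above the threshold
            g = stack.pop()
            for h in np.flatnonzero(corr[g] >= tau):
                if labels[h] == -1:
                    labels[h] = next_label
                    stack.append(h)
        next_label += 1
    return labels


def simulate(n_cells, depth, rng):
    """Hypothetical toy data: genes 0-2 follow factor a, genes 3-5 follow b."""
    a = rng.normal(size=(n_cells, 1))
    b = rng.normal(size=(n_cells, 1))
    rates = np.hstack([np.repeat(np.exp(a), 3, axis=1),
                       np.repeat(np.exp(b), 3, axis=1)])
    return rng.poisson(depth * rates)


rng = np.random.default_rng(0)
# Two simulated datasets with different sequencing depth (a crude batch effect).
datasets = [simulate(300, 50.0, rng), simulate(200, 150.0, rng)]
labels = threshold_clusters(consensus_correlation(datasets), tau=0.5)
print(labels)
```

With this toy data, genes 0-2 and genes 3-5 should land in two separate clusters even though the two datasets differ threefold in depth. Computing correlations within each dataset before averaging is one simple way to keep such between-dataset differences from driving the clusters; scLM models the raw counts directly instead, and the R package at the GitHub link above is the reference implementation of the actual method.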
