Article

Single-cell RNA-seq data semi-supervised clustering and annotation via structural regularized domain adaptation

Journal

BIOINFORMATICS
Volume 37, Issue 6, Pages 775-784

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btaa908

Funding

  1. National Key Research and Development Program of China [2016YFA0502303]
  2. National Key Basic Research Project of China [2015CB910303]
  3. National Natural Science Foundation of China [31871342]

Inspired by unsupervised domain adaptation, the study introduces scSemiCluster, a flexible single-cell semi-supervised clustering and annotation framework. The model trains on the reference and target data together, using structure similarity regularization and pairwise constraints to guide the clustering of the target data. Without explicit domain alignment or batch effect correction, scSemiCluster outperforms other state-of-the-art single-cell supervised classification and semi-supervised clustering annotation algorithms on both simulated and real data. According to the authors, it is the first method to use both deep discriminative clustering and deep generative clustering in the single-cell field.
Motivation: The rapid development of single-cell RNA sequencing (scRNA-seq) technologies allows us to explore tissue heterogeneity at the cellular level. The identification of cell types plays an essential role in the analysis of scRNA-seq data, which, in turn, influences the discovery of regulatory genes that induce heterogeneity. As the scale of sequencing data increases, the classical approach of combining clustering and differential expression analysis to annotate cells becomes more costly in terms of both labor and resources. Existing supervised classification methods for scRNA-seq can alleviate this issue by training a classifier on the labeled reference data and then predicting labels for the unlabeled target data. However, such a label transfer strategy carries risks, such as susceptibility to batch effects and a further loss of the inherent discriminability of the target data.

Results: In this article, inspired by unsupervised domain adaptation, we propose a flexible single-cell semi-supervised clustering and annotation framework, scSemiCluster, which integrates the reference data and target data for training. We utilize structure similarity regularization on the reference domain to restrict the clustering solutions of the target domain. We also incorporate pairwise constraints in the feature learning process so that cells belonging to the same cluster are close to each other, and cells belonging to different clusters are far from each other, in the latent space. Notably, without explicit domain alignment or batch effect correction, scSemiCluster outperforms other state-of-the-art single-cell supervised classification and semi-supervised clustering annotation algorithms on both simulated and real data. To the best of our knowledge, we are the first to use both deep discriminative clustering and deep generative clustering techniques in the single-cell field.
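The pairwise-constraint idea in the abstract (same-cluster cells pulled together, different-cluster cells pushed apart in the latent space) can be illustrated with a generic contrastive-style loss. This is only a minimal sketch under stated assumptions, not the loss used in scSemiCluster: the function name `pairwise_constraint_loss`, the `margin` parameter, and the squared-distance form are all hypothetical choices made for illustration.

```python
import numpy as np

def pairwise_constraint_loss(z, labels, margin=1.0):
    """Generic contrastive-style pairwise loss (illustrative assumption,
    not the formulation from the paper).

    z      : (n_cells, n_latent) array of latent embeddings
    labels : (n_cells,) array of cluster labels for the labeled cells
    margin : distance beyond which different-cluster pairs cost nothing
    """
    z = np.asarray(z, dtype=float)
    labels = np.asarray(labels)

    # Euclidean distance between every pair of embeddings.
    diff = z[:, None, :] - z[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # True where the two cells share a label.
    same = labels[:, None] == labels[None, :]

    # Keep each unordered pair once (upper triangle, no diagonal).
    iu = np.triu_indices(len(z), k=1)
    d, s = dist[iu], same[iu]

    # Same-cluster pairs are penalized by their squared distance.
    pull = np.where(s, d ** 2, 0.0)
    # Different-cluster pairs are penalized only when closer than the margin.
    push = np.where(~s, np.maximum(0.0, margin - d) ** 2, 0.0)

    return float((pull + push).mean())


if __name__ == "__main__":
    z = np.array([[0.0, 0.0], [0.0, 0.0], [5.0, 5.0], [5.0, 5.0]])
    print(pairwise_constraint_loss(z, [0, 0, 1, 1]))  # labels agree with geometry
    print(pairwise_constraint_loss(z, [0, 1, 0, 1]))  # labels contradict geometry
```

In this toy example the loss is zero when the labels match the two tight groups in the embedding and becomes large when they contradict it, which is the behavior a pairwise constraint is meant to encourage during feature learning.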
