4.7 Article

SAIC: an iterative clustering approach for analysis of single cell RNA-seq data

期刊

BMC GENOMICS
卷 18, 期 -, 页码 -

出版社

BIOMED CENTRAL LTD
DOI: 10.1186/s12864-017-4019-5

关键词

Single cell; RNA-seq; Clustering; K-means; ANOVA; PCA; Signature genes; T-SNE

资金

  1. NIH [P30 CA033572, R01 NS075393]

向作者/读者索取更多资源

Background: Research interests toward single cell analysis have greatly increased in basic, translational and clinical research areas recently, as advances in whole-transcriptome amplification technique allow scientists to get accurate sequencing result at single cell level. An important step in the single-cell transcriptome analysis is to identify distinct cell groups that have different gene expression patterns. Currently there are limited bioinformatics approaches available for single-cell RNA-seq analysis. Many studies rely on principal component analysis (PCA) with arbitrary parameters to identify the genes that will be used to cluster the single cells. Results: We have developed a novel algorithm, called SAIC (Single cell Analysis via Iterative Clustering), that identifies the optimal set of signature genes to separate single cells into distinct groups. Our method utilizes an iterative clustering approach to perform an exhaustive search for the best parameters within the search space, which is defined by a number of initial centers and P values. The end point is identification of a signature gene set that gives the best separation of the cell clusters. Using a simulated data set, we showed that SAIC can successfully identify the pre-defined signature gene sets that can correctly separated the cells into predefined clusters. We applied SAIC to two published single cell RNA-seq datasets. For both datasets, SAIC was able to identify a subset of signature genes that can cluster the single cells into groups that are consistent with the published results. The signature genes identified by SAIC resulted in better clusters of cells based on DB index score, and many genes also showed tissue specific expression. Conclusions: In summary, we have developed an efficient algorithm to identify the optimal subset of genes that separate single cells into distinct clusters based on their expression patterns. We have shown that it performs better than PCA method using published single cell RNA-seq datasets.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.7
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据