
scVAEBGM: Clustering Analysis of Single-Cell ATAC-seq Data Using a Deep Generative Model

INTERDISCIPLINARY SCIENCES-COMPUTATIONAL LIFE SCIENCES (2022)

Journal

INTERDISCIPLINARY SCIENCES-COMPUTATIONAL LIFE SCIENCES

Volume 14, Issue 4, Pages 917-928

Publisher

SPRINGER HEIDELBERG

DOI: 10.1007/s12539-022-00536-w

Keywords

scATAC-seq; Clustering; Deep learning; Variational autoencoder; Bayesian Gaussian-mixture model

Funding

National Natural Science Foundation of China [61902216, 61972236, 61972226]
Natural Science Foundation of Shandong Province [ZR2018MF013]

Automated Summary

The development of single-cell technologies has led to a surge in research, particularly in the analysis of chromatin accessibility differences at the single-cell level using scATAC-seq. However, the growing number of cells and the characteristics of the data have made cell types harder to distinguish. We propose a method called scVAEBGM, which combines a Variational Autoencoder (VAE) with a Bayesian Gaussian-mixture model (BGM) to process and analyze scATAC-seq data. The method can estimate the number of cell types without prior knowledge, is more robust to noise, and better represents single-cell data in lower dimensions.

Abstract

Recent developments in single-cell technologies have led to a surge in research. In particular, the single-cell Assay for Transposase-Accessible Chromatin with high-throughput sequencing (scATAC-seq) is a popular approach for analyzing chromatin accessibility differences at the single-cell level, either within or between groups. It makes it possible to examine cell heterogeneity at a previously unseen level and to identify both recognized and unknown cell types. However, the ever-increasing number of cells produced by technological development, together with the characteristics of the data (high noise, sparsity, and dimensionality), makes it challenging to distinguish cell types. We propose scVAEBGM, which integrates a Variational Autoencoder (VAE) with a Bayesian Gaussian-mixture model (BGM) to process and analyze scATAC-seq data. The method takes advantage of the Bayesian Gaussian-mixture model to estimate the number of cell types without fixing the number of clusters beforehand. In other words, the number of clusters is inferred from the data, which avoids the bias introduced when the number is chosen manually by subjective assessment. Additionally, the method is more robust to noise and can better represent single-cell data in lower dimensions. We also introduce a further clustering strategy: experiments indicate that clustering again on top of an already completed clustering can further improve clustering accuracy. We test on six public datasets, and scVAEBGM outperforms various dimension reduction baselines. In downstream applications, scVAEBGM can reveal biological cell types.
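The idea of inferring the number of clusters from the data, instead of fixing it beforehand, can be illustrated with a minimal sketch. This is not the authors' implementation: it uses scikit-learn's `BayesianGaussianMixture` as a stand-in for the BGM component, and synthetic 2-D points as a stand-in for the low-dimensional VAE embedding of scATAC-seq cells. The function name `estimate_clusters`, the upper bound of 10 components, and all hyperparameter values are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture


def estimate_clusters(latent, max_components=10, seed=0):
    """Cluster latent embeddings and count the mixture components actually used.

    `max_components` is only an upper bound (hypothetical value). A small
    Dirichlet-process concentration prior encourages the model to leave
    unnecessary components with near-zero weight.
    """
    bgm = BayesianGaussianMixture(
        n_components=max_components,
        weight_concentration_prior_type="dirichlet_process",
        weight_concentration_prior=0.01,  # assumed value, not from the paper
        covariance_type="diag",
        max_iter=500,
        random_state=seed,
    )
    labels = bgm.fit_predict(latent)
    # Number of components that received at least one cell.
    n_used = len(np.unique(labels))
    return labels, n_used


# Synthetic stand-in for a VAE latent space: three well-separated groups.
rng = np.random.default_rng(0)
centers = np.array([[-8.0, 0.0], [0.0, 8.0], [8.0, 0.0]])
latent = np.vstack([rng.normal(c, 0.5, size=(200, 2)) for c in centers])

labels, n_used = estimate_clusters(latent)
print("components used:", n_used, "of an upper bound of 10")
```

Only an upper bound on the number of components is supplied. How many components end up in use depends on the data and on the prior, which is the behaviour the abstract describes as avoiding a manually chosen cluster number.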
