☆ 4.7 Article

ACTIVA: realistic single-cell RNA-seq generation with automatic cell-type identification using introspective variational autoencoders

BIOINFORMATICS (2022)

Journal

BIOINFORMATICS

Volume 38, Issue 8, Pages 2194-2201

Publisher

OXFORD UNIV PRESS

DOI: 10.1093/bioinformatics/btac095

Keywords

Funding

National Institutes of Health [R15HL146779, R01-GM126548]
National Science Foundation [DMS1840265]
University of California Office of the President
University of California Merced COVID-19 Seed Grant

Ask authors/readers for more resources

Protocol

Community support

Reagent

Community support

Automated Summary New
Abstract

In this work, a novel framework called ACTIVA is proposed for generating realistic synthetic data using a single-stream adversarial variational autoencoder conditioned with cell-type information. Compared to other GAN-based models, ACTIVA generates cells that are more realistic, harder to identify, and have better gene-gene correlation. Data augmentation with ACTIVA significantly improves classification of rare subtypes and reduces runtime compared to other models.

Motivation: Single-cell RNA sequencing (scRNAseq) technologies allow for measurements of gene expression at a single-cell resolution. This provides researchers with a tremendous advantage for detecting heterogeneity, delineating cellular maps or identifying rare subpopulations. However, a critical complication remains: the low number of single-cell observations due to limitations by rarity of subpopulation, tissue degradation or cost. This absence of sufficient data may cause inaccuracy or irreproducibility of downstream analysis. In this work, we present Automated Cell-Type-informed Introspective Variational Autoencoder (ACTIVA): a novel framework for generating realistic synthetic data using a single-stream adversarial variational autoencoder conditioned with cell-type information. Within a single framework, ACTIVA can enlarge existing datasets and generate specific subpopulations on demand, as opposed to two separate models [such as single-cell GAN (scGAN) and conditional scGAN (cscGAN)]. Data generation and augmentation with ACTIVA can enhance scRNAseq pipelines and analysis, such as benchmarking new algorithms, studying the accuracy of classifiers and detecting marker genes. ACTIVA will facilitate analysis of smaller datasets, potentially reducing the number of patients and animals necessary in initial studies. Results: We train and evaluate models on multiple public scRNAseq datasets. In comparison to GAN-based models (scGAN and cscGAN), we demonstrate that ACTIVA generates cells that are more realistic and harder for classifiers to identify as synthetic which also have better pair-wise correlation between genes. Data augmentation with ACTIVA significantly improves classification of rare subtypes (more than 45% improvement compared with not augmenting and 4% better than cscGAN) all while reducing run-time by an order of magnitude in comparison to both models.

ACTIVA: realistic single-cell RNA-seq generation with automatic cell-type identification using introspective variational autoencoders

Journal

BIOINFORMATICS

Publisher

OXFORD UNIV PRESS

Keywords

Categories

Funding

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

ACTIVA: realistic single-cell RNA-seq generation with automatic cell-type identification using introspective variational autoencoders

Journal

BIOINFORMATICS

Publisher

OXFORD UNIV PRESS

Keywords

Categories

Funding

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Export Citation

Share Paper