
Supervised application of internal validation measures to benchmark dimensionality reduction methods in scRNA-seq data

Journal

Briefings in Bioinformatics
Volume 22, Issue 6

Publisher

Oxford University Press
DOI: 10.1093/bib/bbab304

Keywords

single-cell RNA-sequencing; dimensionality reduction methods; internal validation measures; benchmarking

Funding

  1. University of New South Wales (UNSW) Faculty Research Grant (2019)
  2. Cellular Genomics Future Institute (CGFI) Seed Funding (2020)


This study benchmarks dimensionality reduction methods on scRNA-seq data. Using internal validation measures (IVMs), it evaluates more than 25,000 low-dimensional embeddings produced by 33 methods across 55 datasets. The findings suggest that hyperparameter optimization with IVMs as the objective can yield near-optimal clustering results.
A typical single-cell RNA sequencing (scRNA-seq) experiment measures on the order of 20,000 transcripts in thousands, if not millions, of cells. The high dimensionality of such data causes serious complications for traditional data analysis methods. As a result, dimensionality reduction plays an integral role in many analysis pipelines. However, few studies have benchmarked the performance of these methods on scRNA-seq data, and existing comparisons assess performance through downstream analysis accuracy measures, which may confound the interpretation of their results. Here, we present the most comprehensive benchmark of dimensionality reduction methods in scRNA-seq data to date. It uses more than 300,000 compute hours to assess more than 25,000 low-dimensional embeddings across 33 dimensionality reduction methods and 55 scRNA-seq datasets. We employ a simple yet novel approach that does not rely on the results of downstream analyses. Internal validation measures (IVMs) are traditionally used as an unsupervised way to assess clustering performance. Here they are repurposed to measure how well-formed biological clusters are after dimensionality reduction. Performance was further evaluated over nearly 200,000,000 iterations of DBSCAN, a density-based clustering algorithm. These runs show that hyperparameter optimization using IVMs as the objective function leads to near-optimal clustering. Methods were also assessed on how well they preserve the global structure of the data, and on their memory and time requirements across a wide range of sample sizes. Our benchmarking analysis provides a valuable resource for researchers and aims to guide best practice for dimensionality reduction in scRNA-seq analyses. We highlight Latent Dirichlet Allocation and Potential of Heat-diffusion for Affinity-based Transition Embedding (PHATE) as high-performing algorithms.
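The abstract describes two uses of an IVM. The first is a supervised use: the score is computed on an embedding with known cell-type labels to measure how well-formed those biological clusters are. The second is an unsupervised use: the score serves as the objective when tuning DBSCAN. The sketch below illustrates both in plain Python. It is not the authors' pipeline. Several choices are assumptions made for illustration only: the silhouette width as the IVM, the Euclidean metric, excluding DBSCAN noise points from the score, tuning only `eps` over a fixed grid, and the hypothetical helper names `silhouette_score`, `dbscan` and `tune_eps`. The pairwise distances are O(n²), so this is only suitable for toy inputs.

```python
import math
from collections import deque


def silhouette_score(points, labels):
    """Mean silhouette width of `points` under `labels` (one common IVM).

    Points labelled -1 (DBSCAN noise) are excluded; this is an assumption
    of the sketch, not necessarily how the benchmark handled noise.
    Returns NaN when fewer than two clusters remain.
    """
    clusters = {}
    for i, lab in enumerate(labels):
        if lab != -1:
            clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        return float("nan")
    scores = []
    for lab, members in clusters.items():
        for i in members:
            if len(members) == 1:
                scores.append(0.0)  # singleton clusters get s(i) = 0 by convention
                continue
            # a: mean distance to the other members of i's own cluster
            a = sum(math.dist(points[i], points[j]) for j in members if j != i) / (len(members) - 1)
            # b: smallest mean distance to the members of any other cluster
            b = min(
                sum(math.dist(points[i], points[j]) for j in other) / len(other)
                for other_lab, other in clusters.items()
                if other_lab != lab
            )
            denom = max(a, b)
            scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, with -1 marking noise.

    A point's neighbourhood includes the point itself, so `min_pts` counts it.
    """
    n = len(points)
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps] for i in range(n)
    ]
    labels = [None] * n  # None means "not yet visited"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(neighbours[i])
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # only core points expand the cluster
    return labels


def tune_eps(points, eps_grid, min_pts=3):
    """Pick the DBSCAN `eps` whose clustering maximises the silhouette width.

    Uses no ground-truth labels: the IVM itself is the objective function.
    Returns (eps, score, labels), or None if no setting yields >= 2 clusters.
    """
    best = None
    for eps in eps_grid:
        labels = dbscan(points, eps, min_pts)
        score = silhouette_score(points, labels)
        if not math.isnan(score) and (best is None or score > best[1]):
            best = (eps, score, labels)
    return best
```

Consider an embedding `emb` (a list of coordinate tuples) with known labels `cell_types`; both names are hypothetical. The call `silhouette_score(emb, cell_types)` gives the supervised, label-based score used to compare embeddings. The call `tune_eps(emb, grid)` chooses `eps` without looking at the labels. The resulting clustering could then be checked against `cell_types` with an external index such as the adjusted Rand index. That index is an assumption here, since the abstract does not name the accuracy measure used.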


