4.7 Article

CosTaL: an accurate and scalable graph-based clustering algorithm for high-dimensional single-cell data analysis

Journal

BRIEFINGS IN BIOINFORMATICS
Volume 24, Issue 3

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bib/bbad157

Keywords

Clustering; Mass Cytometry; Flow Cytometry; Single-cell RNA sequencing; k nearest neighbors; Graph-based clustering

We describe Cosine-based Tanimoto similarity-refined graph for community detection using Leiden's algorithm (CosTaL), a method for analyzing large multidimensional single-cell datasets. CosTaL transforms cells with high-dimensional features into a weighted k-nearest-neighbor (kNN) graph in which vertices represent cells and edges represent the relatedness between cells. On benchmark datasets, CosTaL achieves effectiveness scores equivalent to or higher than those of other graph-based clustering methods, with high efficiency on small datasets and acceptable scalability on large ones.
To analyze large multidimensional single-cell datasets, we describe Cosine-based Tanimoto similarity-refined graph for community detection using Leiden's algorithm (CosTaL). As a graph-based clustering method, CosTaL transforms cells with high-dimensional features into a weighted k-nearest-neighbor (kNN) graph. Each vertex of the graph represents a cell, and an edge between two vertices indicates that the two cells are closely related. Specifically, CosTaL builds an exact kNN graph using cosine similarity and then uses the Tanimoto coefficient as a refining strategy to re-weight the edges, which improves the effectiveness of clustering. Across seven benchmark cytometry datasets and six single-cell RNA-sequencing datasets, evaluated with six different metrics, CosTaL generally achieves effectiveness scores equivalent to or higher than those of other state-of-the-art graph-based clustering methods, including PhenoGraph, Scanpy and PARC. The combined evaluation metrics indicate that CosTaL is highly efficient on small datasets and scales acceptably to large datasets, which is beneficial for large-scale analysis.
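The graph-construction steps described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The abstract does not say which vectors the Tanimoto coefficient is applied to, so this sketch assumes it compares the two endpoints' neighborhood similarity vectors, with each cell added to its own neighborhood at weight 1.0. The function names (`exact_knn_graph`, `tanimoto`, `reweight_edges`) and the toy data are hypothetical. The final community-detection step (Leiden's algorithm) is omitted here; the re-weighted edges would be passed to a Leiden implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two dense feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def exact_knn_graph(cells, k):
    """Brute-force exact kNN: map each cell index to {neighbor index: cosine similarity}."""
    graph = {}
    for i, cell in enumerate(cells):
        sims = [(cosine_similarity(cell, other), j)
                for j, other in enumerate(cells) if j != i]
        # Highest similarity first; ties broken by the lower index.
        sims.sort(key=lambda pair: (-pair[0], pair[1]))
        graph[i] = {j: s for s, j in sims[:k]}
    return graph


def tanimoto(a, b):
    """Tanimoto coefficient a.b / (|a|^2 + |b|^2 - a.b) of two sparse vectors (dicts)."""
    dot = sum(w * b[j] for j, w in a.items() if j in b)
    denom = sum(w * w for w in a.values()) + sum(w * w for w in b.values()) - dot
    return dot / denom if denom > 0.0 else 0.0


def reweight_edges(graph):
    """Re-weight each directed kNN edge (i, j) by the Tanimoto coefficient of the
    endpoints' neighborhood vectors (ASSUMPTION: self included with weight 1.0)."""
    hoods = {i: {**nbrs, i: 1.0} for i, nbrs in graph.items()}
    return {(i, j): tanimoto(hoods[i], hoods[j]) for i in graph for j in graph[i]}


# Toy data: two well-separated groups of three "cells" with two features each.
cells = [[1.0, 0.1], [1.0, 0.2], [0.9, 0.1],
         [0.1, 1.0], [0.2, 1.0], [0.1, 0.9]]
graph = exact_knn_graph(cells, k=2)
weights = reweight_edges(graph)
```

On this toy input every kNN edge stays within its own group, and each re-weighted edge carries a positive weight of at most 1, so cells that share similar neighborhoods are tied together more strongly. The brute-force neighbor search is quadratic in the number of cells and is only meant to show the idea; it says nothing about how CosTaL achieves its reported scalability.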
