4.7 Article

Structure-preserved dimension reduction using joint triplets sampling for multi-batch integration of single-cell transcriptomic data

Journal

BRIEFINGS IN BIOINFORMATICS
Volume 24, Issue 1

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bib/bbac608

Keywords

single-cell RNA-seq; dimension reduction; triplets; structure preserved; batch effect

Dimension reduction is crucial in single-cell RNA sequencing. This study proposes a novel method called SPDR, which can simultaneously identify cell types, preserve data structure, and handle batch effects. Comprehensive evaluations demonstrate that SPDR outperforms other existing methods in removing batch effects, preserving biological variation, facilitating visualization, and improving clustering accuracy.
Dimension reduction (DR) plays an important role in single-cell RNA sequencing (scRNA-seq) analysis, supporting data interpretation, visualization and other downstream analyses. A desirable DR method should be applicable to various scenarios, including identifying cell types, preserving the inherent structure of the data and handling batch effects. However, most existing DR methods fail to meet these requirements simultaneously, especially the removal of batch effects. In this paper, we develop a novel structure-preserved dimension reduction (SPDR) method using intra- and inter-batch triplet sampling. The constructed triplets jointly consider each anchor's mutual nearest neighbors across batches, its k-nearest neighbors within its own batch and randomly selected cells from the whole dataset, thereby capturing higher-order structural information while accounting for the batch information of the data. We then minimize a robust loss function over the chosen triplets to obtain a structure-preserved and batch-corrected low-dimensional representation. Comprehensive evaluations show that SPDR outperforms competing DR methods, such as INSCT, IVIS, Trimap, Scanorama, scVI and UMAP, in removing batch effects, preserving biological variation, facilitating visualization and improving clustering accuracy. In addition, the two-dimensional (2D) embedding of SPDR presents a clear and authentic expression pattern, and can guide researchers in determining how many cell types should be identified. Furthermore, SPDR is robust to complex data characteristics (such as down-sampling, duplicates and outliers) and to varying hyperparameter settings. We believe that SPDR will be a valuable tool for characterizing complex cellular heterogeneity.
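The triplet construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `knn` and `sample_triplets`, the parameters `k` and `n_random`, the brute-force Euclidean search, and the choice to use intra-batch k-nearest neighbors and inter-batch mutual nearest neighbors as positives with randomly drawn cells as negatives are all assumptions made for illustration only.

```python
import math
import random


def knn(points, query_ids, ref_ids, k):
    """For each query index, return its k nearest reference indices (brute force)."""
    out = {}
    for i in query_ids:
        cands = [j for j in ref_ids if j != i]
        cands.sort(key=lambda j: math.dist(points[i], points[j]))
        out[i] = cands[:k]
    return out


def sample_triplets(points, batches, k=2, n_random=1, seed=0):
    """Return (anchor, positive, negative) index triplets.

    Assumed roles (hypothetical, not taken from the paper's code):
    positives are intra-batch k-nearest neighbours plus inter-batch mutual
    nearest neighbours; negatives are cells drawn at random from all data.
    """
    rng = random.Random(seed)
    n = len(points)

    # Group cell indices by batch label.
    ids_by_batch = {}
    for i, b in enumerate(batches):
        ids_by_batch.setdefault(b, []).append(i)

    positives = {i: set() for i in range(n)}

    # Intra-batch k-nearest neighbours.
    for ids in ids_by_batch.values():
        for i, nbrs in knn(points, ids, ids, k).items():
            positives[i].update(nbrs)

    # Inter-batch mutual nearest neighbours, for every pair of batches.
    labels = sorted(ids_by_batch)
    for pos, a in enumerate(labels):
        for b in labels[pos + 1:]:
            a_to_b = knn(points, ids_by_batch[a], ids_by_batch[b], k)
            b_to_a = knn(points, ids_by_batch[b], ids_by_batch[a], k)
            for i, nbrs in a_to_b.items():
                for j in nbrs:
                    if i in b_to_a[j]:  # keep the pair only if it is mutual
                        positives[i].add(j)
                        positives[j].add(i)

    # Pair every positive with randomly chosen cells from the whole data.
    triplets = []
    for i in range(n):
        excluded = positives[i] | {i}
        pool = [j for j in range(n) if j not in excluded]
        for p in sorted(positives[i]):
            for neg in rng.sample(pool, min(n_random, len(pool))):
                triplets.append((i, p, neg))
    return triplets
```

Calling `sample_triplets(points, batches)` with a list of coordinate tuples and a parallel list of batch labels returns index triplets that a triplet-based loss could then consume. A practical version would replace the brute-force search with an approximate nearest-neighbor index.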
