4.7 Article

Accurate and interpretable gene expression imputation on scRNA-seq data using IGSimpute

Journal

BRIEFINGS IN BIOINFORMATICS
Volume 24, Issue 3, Pages -

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bib/bbad124

Keywords

single-cell RNA sequencing; dropout imputation; deep neural network; model interpretability

Single-cell ribonucleic acid sequencing (scRNA-seq) is used to quantify gene expression at the transcriptomic level with single-cell resolution, but excessive missing values in scRNA-seq data impede analysis. IGSimpute is an accurate and interpretable imputation method that outperforms 12 other methods in recovering missing values in scRNA-seq data. It also has applications in denoising gene expression profiles and indicating cellular age.
Single-cell ribonucleic acid sequencing (scRNA-seq) enables the quantification of gene expression at the transcriptomic level with single-cell resolution, enhancing our understanding of cellular heterogeneity. However, the excessive missing values present in scRNA-seq data hinder downstream analysis. While numerous imputation methods have been proposed to recover scRNA-seq data, high imputation performance often comes with low or no interpretability. Here, we present IGSimpute, an accurate and interpretable imputation method that recovers missing values in scRNA-seq data using an interpretable instance-wise gene selection layer (GSL). With mean squared error as the chosen benchmark metric, IGSimpute achieves the lowest error on 13 out of 17 datasets from different scRNA-seq technologies, outperforming 12 other state-of-the-art imputation methods. We demonstrate that IGSimpute gives unbiased estimates of the missing values compared to other methods, regardless of whether the average gene expression values are small or large. Clustering results on the imputed profiles show that IGSimpute offers a statistically significant improvement over other imputation methods. Taking the heart-and-aorta and the limb muscle tissues as examples, we show that IGSimpute can also denoise gene expression profiles by removing outlier entries with unexpectedly high expression values via the instance-wise GSL. We also show that genes selected by the instance-wise GSL could indicate the age of B cells from bladder fat tissue of the Tabula Muris Senis atlas. IGSimpute can impute one million cells in 64 min and is thus applicable to large datasets.
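
The abstract names mean squared error as the benchmark metric but does not describe the evaluation protocol. A common way to score an imputation method, assumed here purely for illustration, is to hide a fraction of the observed nonzero entries, impute, and compute the error on the hidden entries only. The sketch below follows that assumption on synthetic Poisson counts. The function names (`mask_entries`, `masked_mse`, `gene_mean_impute`) are hypothetical, and the gene-mean fill is a placeholder baseline, not IGSimpute.

```python
import numpy as np


def mask_entries(counts, frac, rng):
    """Hide a random fraction of the nonzero entries to simulate dropout."""
    rows, cols = np.nonzero(counts)
    n_hide = int(frac * rows.size)
    pick = rng.choice(rows.size, size=n_hide, replace=False)
    masked = counts.copy()
    masked[rows[pick], cols[pick]] = 0.0
    hidden = np.zeros(counts.shape, dtype=bool)
    hidden[rows[pick], cols[pick]] = True
    return masked, hidden


def masked_mse(truth, imputed, hidden):
    """Mean squared error restricted to the artificially hidden entries."""
    diff = truth[hidden] - imputed[hidden]
    return float(np.mean(diff ** 2))


def gene_mean_impute(masked):
    """Placeholder baseline (assumption): fill zeros with the gene's nonzero mean."""
    out = masked.copy()
    for g in range(masked.shape[1]):
        col = masked[:, g]
        nonzero = col[col > 0]
        if nonzero.size:
            out[col == 0, g] = nonzero.mean()
    return out


# Synthetic cells-by-genes matrix; real scRNA-seq data would replace this.
rng = np.random.default_rng(0)
truth = rng.poisson(5.0, size=(200, 50)).astype(float)
masked, hidden = mask_entries(truth, 0.1, rng)

mse_zero = masked_mse(truth, masked, hidden)                    # no imputation
mse_mean = masked_mse(truth, gene_mean_impute(masked), hidden)  # baseline fill
print(f"MSE without imputation: {mse_zero:.2f}")
print(f"MSE with gene-mean fill: {mse_mean:.2f}")
```

Any imputation method can be swapped in for `gene_mean_impute` as long as it returns a matrix of the same shape. Restricting the error to the hidden entries keeps the score from being dominated by values the method never had to recover.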