Article

Penalized Latent Dirichlet Allocation Model in Single-Cell RNA Sequencing

Journal

STATISTICS IN BIOSCIENCES
Volume 13, Issue 3, Pages 543-562

Publisher

SPRINGER
DOI: 10.1007/s12561-021-09304-8

Keywords

Single-cell RNA sequencing; Latent Dirichlet Allocation; Topic models; Genomics; Transcriptomics

Funding

  1. National Institute of General Medical Sciences [R01GM122083]

Single-cell RNA sequencing (scRNA-seq) quantifies gene expression variation at the individual-cell level. The penalized Latent Dirichlet Allocation (pLDA) model was developed to reduce the dimension of scRNA-seq data and to extract robust, interpretable biological information from it. pLDA treats genes as words, cells as documents, and latent biological functions as topics; it shows improved performance in cell-type classification and yields topics that can be interpreted in terms of biological functions.
Single-cell RNA sequencing (scRNA-seq) quantifies RNA transcripts at the individual-cell level, providing cellular-level resolution of gene expression variation. scRNA-seq data are counts of RNA transcripts for all genes in a species' genome; they are very high-dimensional and contain an excess of zero counts. To better reduce the data dimension and extract robust and interpretable biological information, we develop a penalized Latent Dirichlet Allocation (pLDA) model for scRNA-seq data. The method is adapted from LDA, a generative probabilistic model that originated in natural language processing. pLDA models scRNA-seq data by treating genes as words, cells as documents, and latent biological functions as topics. It imposes a penalty reflecting the expectation that only a small subset of genes is topic-specific, which increases the robustness of the estimation and the interpretability of the results. We apply pLDA to scRNA-seq datasets generated with the Drop-seq and SMARTer v1 technologies and demonstrate improved performance in cell-type classification. The topics identified by pLDA are interpretable in terms of biological functions.
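
The word/document/topic mapping described above can be illustrated with standard (unpenalized) LDA fitted by collapsed Gibbs sampling on a cell-by-gene count matrix. This is a minimal sketch under stated assumptions: the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy counts are hypothetical choices for illustration, and the pLDA penalty on topic-specific genes is deliberately not implemented because its exact form is not given here.

```python
import random


def lda_gibbs(counts, n_topics, alpha=0.1, beta=0.01, n_iter=300, seed=0):
    """Collapsed Gibbs sampling for plain LDA (no pLDA penalty).

    counts[d][w] = number of transcripts of gene w observed in cell d.
    Genes play the role of words and cells the role of documents.
    Returns (theta, phi): cell-topic and topic-gene distributions.
    """
    rng = random.Random(seed)
    n_cells, n_genes = len(counts), len(counts[0])
    # Expand counts into one token (gene id) per observed transcript.
    tokens = [[w for w in range(n_genes) for _ in range(counts[d][w])]
              for d in range(n_cells)]
    n_dk = [[0] * n_topics for _ in range(n_cells)]  # topic counts per cell
    n_kw = [[0] * n_genes for _ in range(n_topics)]  # gene counts per topic
    n_k = [0] * n_topics                             # total tokens per topic
    z = []                                           # topic of each token
    for d, doc in enumerate(tokens):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)              # random initial topic
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)
    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(tokens):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from all counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | rest), up to a constant.
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                           / (n_k[t] + n_genes * beta) for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    # Posterior-mean estimates of the two distributions.
    theta = [[(n_dk[d][k] + alpha) / (len(tokens[d]) + n_topics * alpha)
              for k in topics] for d in range(n_cells)]
    phi = [[(n_kw[k][w] + beta) / (n_k[k] + n_genes * beta)
            for w in range(n_genes)] for k in topics]
    return theta, phi


if __name__ == "__main__":
    # Hypothetical toy data: cells 0-1 mostly express genes 0-1,
    # cells 2-3 mostly express genes 2-3.
    toy = [[10, 8, 0, 0, 1], [9, 11, 1, 0, 0],
           [0, 0, 12, 9, 1], [1, 0, 10, 11, 0]]
    theta, phi = lda_gibbs(toy, n_topics=2)
    for cell, props in enumerate(theta):
        print(cell, [round(p, 2) for p in props])
```

In this sketch each cell's topic proportions in `theta` serve as a low-dimensional representation that could feed a downstream classifier, and each row of `phi` is a topic's distribution over genes. A penalized variant would additionally constrain `phi` so that only a few genes differ across topics; that step is not reproduced here.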
