Article

Automated quality control and cell identification of droplet-based single-cell data using dropkick

Journal

GENOME RESEARCH
Volume 31, Issue 10, Pages 1742-1752

Publisher

COLD SPRING HARBOR LAB PRESS, PUBLICATIONS DEPT
DOI: 10.1101/gr.271908.120

Funding

  1. National Institutes of Health (NIH) [R01DK103831]
  2. National Institute of Diabetes and Digestive and Kidney Diseases [National Cancer Institute] [P50CA236733, U01CA215798, U54CA217450]
  3. NIH (National Institute of General Medical Sciences) [U2CCA233291]
  4. NIH (National Cancer Institute) [U2CCA233291]

dropkick is a fully automated software tool for quality control and filtering of single-cell RNA sequencing data. It outperforms conventional thresholding approaches and EmptyDrops in recovering rare cell types and excluding uninformative barcodes. It provides fast, reproducible cell identification, a step critical to downstream analysis, and is compatible with popular single-cell Python packages.
A major challenge for droplet-based single-cell sequencing technologies is distinguishing true cells from uninformative barcodes in data sets with disparate library sizes confounded by high technical noise (i.e., batch-specific ambient RNA). We present dropkick, a fully automated software tool for quality control and filtering of single-cell RNA sequencing (scRNA-seq) data with a focus on excluding ambient barcodes and recovering real cells bordering the quality threshold. By automatically determining data set-specific training labels based on predictive global heuristics, dropkick learns a gene-based representation of real cells and ambient noise, calculating a cell probability score for each barcode. Using simulated and real-world scRNA-seq data, we benchmarked dropkick against conventional thresholding approaches and EmptyDrops, a popular computational method, showing greater recovery of rare cell types and exclusion of empty droplets and noisy, uninformative barcodes. We show for both low- and high-background data sets that dropkick's weakly supervised model reliably learns which genes are enriched in ambient barcodes and draws a multidimensional boundary that is more robust to data set-specific variation than existing filtering approaches. dropkick provides a fast, automated tool for reproducible cell identification from scRNA-seq data that is critical to downstream analysis and compatible with popular single-cell Python packages.
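
The weakly supervised idea described in the abstract (heuristic training labels, then a gene-based model that yields a cell probability per barcode) can be illustrated with a minimal, self-contained sketch. This is not dropkick's implementation or API. The simulated counts, the mean-based cutoff on detected genes, the plain logistic regression, and every name below (`simulate_barcode`, `fit_logistic`, and so on) are assumptions chosen only for illustration.

```python
import math
import random

random.seed(0)  # reproducible toy data

N_CELL_GENES, N_AMBIENT_GENES = 5, 5  # hypothetical gene panel


def simulate_barcode(is_cell):
    """Toy counts: real cells express 'cell' genes; empty droplets carry mostly ambient RNA."""
    if is_cell:
        cell_part = [random.randint(3, 12) for _ in range(N_CELL_GENES)]
        ambient_part = [random.randint(0, 1) for _ in range(N_AMBIENT_GENES)]
    else:
        cell_part = [1 if random.random() < 0.1 else 0 for _ in range(N_CELL_GENES)]
        ambient_part = [random.randint(0, 3) for _ in range(N_AMBIENT_GENES)]
    return cell_part + ambient_part


truth = [True] * 40 + [False] * 60  # ground truth, known only because we simulate
counts = [simulate_barcode(t) for t in truth]

# Step 1: heuristic training labels from a global metric (genes detected per barcode).
# Assumption: the cutoff is simply the mean; these labels are noisy near the cutoff.
n_genes = [sum(c > 0 for c in row) for row in counts]
threshold = sum(n_genes) / len(n_genes)
labels = [1.0 if n > threshold else 0.0 for n in n_genes]

# Step 2: gene-based features (log-transformed counts).
features = [[math.log1p(c) for c in row] for row in counts]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=300):
    """Batch gradient descent for unregularized logistic regression."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


weights, bias = fit_logistic(features, labels)

# Step 3: a cell probability score for every barcode, from gene content rather than one cutoff.
scores = [sigmoid(sum(w * x for w, x in zip(weights, row)) + bias) for row in features]

mean_cell_score = sum(s for s, t in zip(scores, truth) if t) / truth.count(True)
mean_ambient_score = sum(s for s, t in zip(scores, truth) if not t) / truth.count(False)
print(f"mean score, true cells: {mean_cell_score:.2f}")
print(f"mean score, ambient:    {mean_ambient_score:.2f}")
```

Because the model is trained on gene-level features, a barcode that sits near the heuristic cutoff can still receive a high or low score depending on which genes it contains, which is the intuition behind the "multidimensional boundary" mentioned above. The learned `weights` also indicate which genes the toy model associates with ambient barcodes.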
