4.7 Article

The case for using mapped exonic non-duplicate reads when reporting RNA-sequencing depth: examples from pediatric cancer datasets

Journal

GIGASCIENCE
Volume 10, Issue 3, Pages -

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/gigascience/giab011

Keywords

RNA-Seq; sequencing; depth; duplicate; unmapped; exonic; quality

Funding

  1. American Association for Cancer Research NextGen Grant for Transformative Cancer Research Award
  2. Emily Beazley Kures for Kids Fund St. Baldrick's Consortium Grant
  3. Alex's Lemonade Stand Foundation for Childhood Cancer Research
  4. Unravel Pediatric Cancer
  5. Team G Childhood Cancer Foundation
  6. California Initiative to Advance Precision Medicine
  7. Live for Others Foundation
  8. Schmidt Futures Foundation

The reproducibility of gene expression measured by RNA-Seq depends on sequencing depth, and mapped, exonic, non-duplicate (MEND) reads are a useful measure of that reproducibility. The fraction of reads that contribute to reproducibility varies greatly among datasets, which is why the authors propose reporting sequencing depth in MEND reads.
Background: The reproducibility of gene expression measured by RNA sequencing (RNA-Seq) is dependent on the sequencing depth. While unmapped or non-exonic reads do not contribute to gene expression quantification, duplicate reads contribute to the quantification but are not informative for reproducibility. We show that mapped, exonic, non-duplicate (MEND) reads are a useful measure of reproducibility of RNA-Seq datasets used for gene expression analysis.
Findings: In bulk RNA-Seq datasets from 2,179 tumors in 48 cohorts, the fraction of reads that contribute to the reproducibility of gene expression analysis varies greatly. Unmapped reads constitute 1-77% of all reads (median [IQR], 3% [3-6%]); duplicate reads constitute 3-100% of mapped reads (median [IQR], 27% [13-43%]); and non-exonic reads constitute 4-97% of mapped, non-duplicate reads (median [IQR], 25% [16-37%]). MEND reads constitute 0-79% of total reads (median [IQR], 50% [30-61%]).
Conclusions: Because not all reads in an RNA-Seq dataset are informative for reproducibility of gene expression measurements and the fraction of reads that are informative varies, we propose reporting a dataset's sequencing depth in MEND reads, which definitively inform the reproducibility of gene expression, rather than total, mapped, or exonic reads. We provide a Docker image containing (i) the existing required tools (RSeQC, sambamba, and samblaster) and (ii) a custom script to calculate MEND reads from RNA-Seq data files. We recommend that all RNA-Seq gene expression experiments, sensitivity studies, and depth recommendations use MEND units for sequencing depth.
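Each percentage in the findings uses a different denominator: unmapped reads are a share of all reads, duplicates a share of mapped reads, non-exonic reads a share of mapped non-duplicate reads, and MEND reads a share of total reads. The following minimal sketch shows that arithmetic, assuming the read counts at each filtering stage are already known. It is not the authors' script; the class name, field names, and example counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadCounts:
    """Read counts at each filtering stage (hypothetical field names)."""
    total: int          # all sequenced reads
    mapped: int         # reads that map to the reference
    mapped_nondup: int  # mapped reads left after removing duplicates
    mend: int           # mapped, exonic, non-duplicate reads

    def __post_init__(self):
        # Each stage is a subset of the previous one, so counts must be nested.
        if not (0 <= self.mend <= self.mapped_nondup <= self.mapped <= self.total):
            raise ValueError("expected mend <= mapped_nondup <= mapped <= total")


def _fraction(part: int, whole: int) -> float:
    """Return part / whole, or 0.0 when the denominator is zero."""
    return part / whole if whole else 0.0


def depth_fractions(c: ReadCounts) -> dict:
    """Compute the four fractions, each against the denominator named in the abstract."""
    return {
        "unmapped_of_total": _fraction(c.total - c.mapped, c.total),
        "duplicate_of_mapped": _fraction(c.mapped - c.mapped_nondup, c.mapped),
        "nonexonic_of_mapped_nondup": _fraction(c.mapped_nondup - c.mend, c.mapped_nondup),
        "mend_of_total": _fraction(c.mend, c.total),
    }


# Made-up counts for a single sample, for illustration only.
example = ReadCounts(total=100_000_000, mapped=95_000_000,
                     mapped_nondup=70_000_000, mend=50_000_000)
fractions = depth_fractions(example)
```

Because the denominators differ, the first three fractions do not sum with the MEND fraction to 1. Under this sketch's assumptions, a sample reported at 100 million total reads would be reported at 50 million MEND reads.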
