4.6 Article

Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data

Journal

BMC BIOINFORMATICS
Volume 18, Issue -, Pages -

Publisher

BMC
DOI: 10.1186/s12859-017-1847-x

Keywords

GTEx; RNA-Seq; Quality control; Filtering; Preprocessing; Normalization

Funding

  1. US National Institutes of Health
  2. National Heart, Lung, and Blood Institute [5P01HL105339, 5R01HL111759, 5P01HL114501, K25HL133599]
  3. National Cancer Institute [5P50CA127003, 1R35CA197449, 1U01CA190234, 5P30CA006516, P50CA165962]
  4. National Institute of Allergy and Infectious Diseases [5R01AI099204]
  5. Charles A. King Postdoctoral Research Fellowship Program, Sara Elizabeth O'Brien, Bank of America
  6. NVIDIA Foundation

Background: Although ultrahigh-throughput RNA-Sequencing has become the dominant technology for genome-wide transcriptional profiling, the vast majority of RNA-Seq studies profile only tens of samples, and most analytical pipelines are optimized for these smaller studies. However, projects are generating ever-larger data sets comprising RNA-Seq data from hundreds or thousands of samples, often collected at multiple centers and from diverse tissues. These complex data sets present significant analytical challenges due to batch and tissue effects, but they also provide the opportunity to revisit the assumptions and methods that we use to preprocess, normalize, and filter RNA-Seq data, which are critical first steps for any subsequent analysis.

Results: We find that analysis of large RNA-Seq data sets requires both careful quality control and explicit handling of the sparsity that arises from the heterogeneity intrinsic to multi-group studies. We developed the Yet Another RNA Normalization software pipeline (YARN), which includes quality control and preprocessing, gene filtering, and normalization steps designed to facilitate downstream analysis of large, heterogeneous RNA-Seq data sets, and we demonstrate its use with data from the Genotype-Tissue Expression (GTEx) project.

Conclusions: An R package instantiating YARN is available at http://bioconductor.org/packages/yarn.
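To make the idea of group-aware filtering and normalization concrete, the following Python sketch shows one way such steps could look on a genes-by-samples count matrix. It is an illustration under stated assumptions, not the YARN implementation: the function names, the CPM threshold of 1, the rule "expressed in at least as many samples as the smallest group", and the use of plain quantile normalization within each group are all choices made here for the example and are not taken from the abstract.

```python
import numpy as np


def cpm(counts):
    """Counts per million for a genes x samples matrix.

    Assumes every sample has a non-zero library size.
    """
    lib_sizes = counts.sum(axis=0, keepdims=True)
    return counts / lib_sizes * 1e6


def filter_genes(counts, groups, min_cpm=1.0):
    """Return a boolean mask of genes to keep.

    Assumed rule (hypothetical, for illustration): a gene is kept if its
    CPM exceeds `min_cpm` in at least as many samples as the smallest
    group contains, so a gene expressed in only one tissue can survive.
    """
    _, sizes = np.unique(np.asarray(groups), return_counts=True)
    min_samples = sizes.min()
    expressed = cpm(counts) > min_cpm
    return expressed.sum(axis=1) >= min_samples


def quantile_normalize(mat):
    """Give every column the same distribution (mean of sorted columns)."""
    order = np.argsort(mat, axis=0)
    ranks = np.argsort(order, axis=0)  # rank of each entry in its column
    reference = np.sort(mat, axis=0).mean(axis=1)
    return reference[ranks]


def normalize_within_groups(log_expr, groups):
    """Quantile-normalize samples separately inside each group.

    A stand-in for a tissue-aware method: distributions are aligned
    within a group but are not forced to match across groups.
    """
    groups = np.asarray(groups)
    out = np.empty_like(log_expr, dtype=float)
    for g in np.unique(groups):
        cols = groups == g
        out[:, cols] = quantile_normalize(log_expr[:, cols])
    return out


if __name__ == "__main__":
    # Toy data: 4 genes x 4 samples, two samples per hypothetical tissue.
    toy_counts = np.array(
        [
            [100, 120, 0, 0],      # expressed only in tissue A
            [50, 60, 55, 65],      # expressed everywhere
            [0, 0, 0, 1],          # nearly silent
            [850, 820, 945, 934],  # highly expressed everywhere
        ],
        dtype=float,
    )
    toy_groups = ["A", "A", "B", "B"]

    mask = filter_genes(toy_counts, toy_groups)
    normalized = normalize_within_groups(np.log2(toy_counts[mask] + 1.0), toy_groups)
    print(mask)
    print(normalized)
```

The design point the sketch tries to capture is that a filter applied across all samples at once can discard genes expressed in a single tissue, whereas a threshold tied to group size keeps them, and that normalizing within groups avoids forcing biologically different tissues onto one shared distribution.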
