4.6 Article Proceedings Paper

Detrimental effects of duplicate reads and low complexity regions on RNA- and ChIP-seq data

期刊

BMC BIOINFORMATICS
卷 16, 期 -, 页码 -

出版社

BMC
DOI: 10.1186/1471-2105-16-S13-S10

关键词

-

资金

  1. Grants-in-Aid for Scientific Research [15J03981] Funding Source: KAKEN

向作者/读者索取更多资源

Background: Adapter trimming and removal of duplicate reads are common practices in next-generation sequencing pipelines. Sequencing reads ambiguously mapped to repetitive and low complexity regions can also be problematic for accurate assessment of the biological signal, yet their impact on sequencing data has not received much attention. We investigate how trimming the adapters, removing duplicates, and filtering out reads overlapping low complexity regions influence the significance of biological signal in RNA- and ChIP-seq experiments. Methods: We assessed the effect of data processing steps on the alignment statistics and the functional enrichment analysis results of RNA- and ChIP-seq data. We compared differentially processed RNA-seq data with matching microarray data on the same patient samples to determine whether changes in pre-processing improved correlation between the two. We have developed a simple tool to remove low complexity regions, RepeatSoaker, available at https://github.com/mdozmorov/RepeatSoaker, and tested its effect on the alignment statistics and the results of the enrichment analyses. Results: Both adapter trimming and duplicate removal moderately improved the strength of biological signals in RNA-seq and ChIP-seq data. Aggressive filtering of reads overlapping with low complexity regions, as defined by RepeatMasker, further improved the strength of biological signals, and the correlation between RNA-seq and microarray gene expression data. Conclusions: Adapter trimming and duplicates removal, coupled with filtering out reads overlapping low complexity regions, is shown to increase the quality and reliability of detecting biological signals in RNA-seq and ChIP-seq data.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.6
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据