4.8 Article

Utilizing mapping targets of sequences underrepresented in the reference assembly to reduce false positive alignments

Journal

NUCLEIC ACIDS RESEARCH
Volume 43, Issue 20, Pages -

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/nar/gkv671

Keywords

-

Funding

  1. NHGRI [5U41HG002371, 3U41HG004568-09S1]
  2. Kent Informatics Inc.


The human reference assembly remains incomplete due to the underrepresentation of repeat-rich sequences that are found within centromeric regions and acrocentric short arms. Although these sequences are marginally represented in the assembly, they are often fully represented in whole-genome short-read datasets and contribute to inappropriate alignments and high read-depth signals that localize to a small number of assembled homologous regions. As a consequence, these regions often provide artifactual peak calls that confound hypothesis testing and large-scale genomic studies. To address this problem, we have constructed mapping targets that represent roughly 8% of the human genome generally omitted from the human reference assembly. By integrating these data into standard mapping and peak-calling pipelines we demonstrate a 10-fold reduction in signals in regions common to the blacklisted region and identify a comprehensive set of regions that exhibit mapping sensitivity with the presence of the repeat-rich targets.
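
The comparison described in the abstract, peak calls falling in blacklisted regions with and without the added repeat-rich mapping targets, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the half-open BED-style intervals, and the toy coordinates are hypothetical and are not the authors' pipeline or data.

```python
from typing import List, Tuple

# (chromosome, start, end), half-open as in BED files. Hypothetical type alias.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_in_blacklist(peaks: List[Interval], blacklist: List[Interval]) -> int:
    """Count peaks that overlap any blacklisted region (simple O(n*m) scan)."""
    return sum(1 for p in peaks if any(overlaps(p, b) for b in blacklist))


def fold_reduction(before: int, after: int) -> float:
    """Ratio of blacklist-overlapping peaks before vs. after adding targets."""
    if after == 0:
        return float("inf")
    return before / after


# Toy data only; these coordinates are invented and do not come from the paper.
blacklist = [("chr1", 1000, 2000), ("chr2", 500, 900)]

# Peaks called after mapping to the reference assembly alone.
peaks_reference_only = [
    ("chr1", 1100, 1200),
    ("chr1", 1500, 1600),
    ("chr1", 1900, 2100),
    ("chr2", 600, 700),
    ("chr3", 10, 50),
]

# Peaks called after mapping to the reference plus the extra mapping targets.
peaks_with_targets = [
    ("chr1", 1500, 1600),
    ("chr3", 10, 50),
]

before = count_in_blacklist(peaks_reference_only, blacklist)
after = count_in_blacklist(peaks_with_targets, blacklist)
print(f"blacklist peaks: {before} -> {after}, "
      f"fold reduction {fold_reduction(before, after):.1f}")
```

On real data the two peak lists would come from running the same peak caller on alignments made with and without the additional targets; a linear scan like this is only practical for small inputs, and an interval tree or sorted sweep would be the usual choice at genome scale.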

