Article

Segmental duplications: Organization and impact within the current Human Genome Project assembly

Journal

GENOME RESEARCH
Volume 11, Issue 6, Pages 1005-1017

Publisher

COLD SPRING HARBOR LAB PRESS, PUBLICATIONS DEPT
DOI: 10.1101/gr.GR-1871R

Funding

  1. NCI NIH HHS [CA80295] Funding Source: Medline
  2. NIGMS NIH HHS [R01 GM058815, T32 GM007250, GM58815] Funding Source: Medline

Segmental duplications play fundamental roles in both genomic disease and gene evolution. To understand their organization within the human genome, we have developed the computational tools and methods necessary to detect identity between long stretches of genomic sequence despite the presence of high-copy repeats and large insertion-deletions. Here we present our analysis of the most recent genome assembly (January 2001), in which we focus on the global organization of these segments and the role they play in the whole-genome assembly process. Initially, we considered only large recent duplication events that fell well below levels of draft sequencing error (alignments 90%-98% similar and ≥1 kb in length). Duplications (90%-98%; ≥1 kb) comprise 3.6% of all human sequence. These duplications show clustering and up to 10-fold enrichment within pericentromeric and subtelomeric regions. In terms of assembly, duplicated sequences were found to be over-represented in unordered and unassigned contigs, indicating that duplicated sequences are difficult to assign to their proper position. To assess coverage of these regions within the genome, we selected BACs containing interchromosomal duplications and characterized their duplication pattern by FISH. Only 47% (106/224) of chromosomes positive by FISH had a corresponding chromosomal position by BLAST comparison. We present data that indicate that this is attributable to misassembly, misassignment, and/or decreased sequencing coverage within duplicated regions. Surprisingly, if we consider putative duplications of >98% identity, we identify 10.6% (286 Mb) of the current assembly as paralogous. The majority of these alignments, we believe, represent unmerged overlaps within unique regions. Taken together, the above data indicate that segmental duplications represent a significant impediment to accurate human genome assembly, requiring the development of specialized techniques to finish these exceptional regions of the genome. The identification and characterization of these highly duplicated regions represents an important step in the complete sequencing of a human reference genome.
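
The thresholds quoted in the abstract (alignments at least 1 kb long, 90%-98% identity for the initial duplication set, and >98% identity for the putative set) can be illustrated with a minimal sketch. This is not the authors' pipeline, which also has to handle high-copy repeats and large insertion-deletions; the `Alignment` record, the `classify` function, and the category labels are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass

# Thresholds taken from the abstract; everything else here is hypothetical.
MIN_LENGTH_BP = 1_000      # alignments must be at least 1 kb long
MIN_IDENTITY = 90.0        # lower bound of the initial duplication set (%)
DRAFT_ERROR_CUTOFF = 98.0  # above this, identity may reflect unmerged overlap (%)


@dataclass
class Alignment:
    """Hypothetical pairwise alignment record (not from the paper)."""
    name: str
    length_bp: int
    identity: float  # percent identity, 0-100


def classify(aln: Alignment) -> str:
    """Assign an alignment to one of three illustrative categories."""
    if aln.length_bp < MIN_LENGTH_BP or aln.identity < MIN_IDENTITY:
        return "excluded"
    if aln.identity <= DRAFT_ERROR_CUTOFF:
        return "segmental_duplication"
    # The abstract suggests most >98% alignments are unmerged overlaps.
    return "putative_high_identity"


if __name__ == "__main__":
    examples = [
        Alignment("a", 5_000, 95.2),
        Alignment("b", 800, 96.0),
        Alignment("c", 12_000, 99.4),
    ]
    for aln in examples:
        print(aln.name, classify(aln))
```

Keeping the >98% class separate mirrors the abstract's caution that most such alignments probably represent unmerged overlaps within unique regions rather than true paralogy.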
