Article

Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases

Journal

NUCLEIC ACIDS RESEARCH
Volume 47, Issue 21, Pages 10994-11006

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/nar/gkz841

Funding

  1. EU COST-Action [BM1405]
  2. Research Council of Norway [251076]
  3. University of Oslo, Faculty of Mathematics and Natural Sciences
  4. Institute of Informatics [BK-204/RAU2/2019]
  5. European Union through the European Social Fund [POWR.03.02.00-00-I029]
  6. Institutional Funds, University of Oslo

The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with 'ready-to-use' deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotation-deposition workflow, and that may proliferate in public database repositories affecting all downstream analyses. As a case study, we provide examples of the Atlantic cod genome, whose sequencing and assembly were hindered by a particularly high prevalence of tandem repeats. We complement this case study with examples from other species, where mis-annotations and sequencing errors have propagated into protein databases. With this review, we aim to raise the awareness level within the community of database users, and alert scientists working in the underlying workflow of database creation that the data they omit or improperly assemble may well contain important biological information valuable to others.
