Editorial Material

Sequencing platform shifts provide opportunities but pose challenges for combining genomic data sets

Journal

MOLECULAR ECOLOGY RESOURCES
Volume 21, Issue 3, Pages 653-660

Publisher

WILEY
DOI: 10.1111/1755-0998.13309

Keywords

HiSeq; NGS; NovaSeq; poly-G; reproducibility; reusability

Funding

  1. Swiss National Science Foundation (SNSF) [31003A_163446]
  2. Grant 'SeeWandel: Life in Lake Constance - the past, present and future' - European Regional Development Fund
  3. Swiss Confederation
  4. Swiss Federal Office for the Environment
  5. Eawag (Eawag Discretionary Funds 2018-2022)
  6. FCT, Portuguese National Science Foundation (Fundação para a Ciência e a Tecnologia) [SFRH/BD/145153/2019]

Technological advances in DNA sequencing have enabled the production and curation of large genomic data sets in nonmodel species, but combining data from different sequencing platforms remains challenging. Data combined across platforms can carry platform-specific biases and base-calling errors, so such analyses call for caution and explicit corrective steps. Archiving tissue samples and their associated sequences is essential for the reproducibility and reusability of sequencing data as platform technology evolves.
Technological advances in DNA sequencing over the last decade now permit the production and curation of large genomic data sets in an increasing number of nonmodel species. Additionally, these new data provide the opportunity for combining data sets, resulting in larger studies with a broader taxonomic range. Whilst the development of new sequencing platforms has been beneficial, resulting in a higher throughput of data at a lower per-base cost, shifts in sequencing technology can also pose challenges for those wishing to combine new sequencing data with data sequenced on older platforms. Here, we outline the types of studies where the use of curated data might be beneficial, and highlight potential biases that might be introduced by combining data from different sequencing platforms. As an example of the challenges associated with combining data across sequencing platforms, we focus on the impact of the shift in Illumina's base calling technology from a four-channel system to a two-channel system. We caution that when data are combined from these two systems, erroneous guanine base calls that result from the two-channel chemistry can make their way through a bioinformatic pipeline, eventually leading to inaccurate and potentially misleading conclusions. We also suggest solutions for dealing with such potential artefacts, which make samples sequenced on different sequencing platforms appear more differentiated from one another than they really are. Finally, we stress the importance of archiving tissue samples and the associated sequences for the continued reproducibility and reusability of sequencing data in the face of ever-changing sequencing platform technology.
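
To make the poly-G artefact concrete: in two-channel chemistry a cluster that emits no signal is called as guanine, so affected reads can end in long runs of G. The sketch below shows one simple way such tails might be trimmed before data from the two systems are combined. It is an illustration only, not the authors' method; the function name `trim_poly_g` and the threshold `MIN_POLY_G` are assumptions introduced here, and dedicated read-trimming tools would normally be used in practice.

```python
import re
from typing import Tuple

# Hypothetical threshold (an assumption, not from the paper): the minimum
# length of a trailing G run that is treated as a two-channel artefact.
MIN_POLY_G = 10


def trim_poly_g(seq: str, qual: str, min_run: int = MIN_POLY_G) -> Tuple[str, str]:
    """Remove a trailing run of G's from a read if it is at least min_run long.

    seq and qual are the base and quality strings of one FASTQ record and
    are assumed to have equal length.
    """
    match = re.search(r"G+$", seq)
    if match and len(match.group()) >= min_run:
        cut = match.start()  # index of the first G in the trailing run
        return seq[:cut], qual[:cut]
    return seq, qual  # short or absent G tail: leave the read untouched


if __name__ == "__main__":
    read = "ACGT" + "G" * 12
    print(trim_poly_g(read, "I" * len(read)))
```

A length threshold keeps genuine short G runs at read ends intact; the right value depends on read length and genome composition, so it should be chosen per data set.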
