4.7 Review

Prevention, diagnosis and treatment of high-throughput sequencing data pathologies

Journal

MOLECULAR ECOLOGY
Volume 23, Issue 7, Pages 1679-1700

Publisher

WILEY
DOI: 10.1111/mec.12680

Keywords

preprocessing; bioinformatics; high-throughput sequencing; next-generation sequencing; sequence read; quality control

Funding

  1. National Science Foundation [DEB-0844968]
  2. March of Dimes
  3. Division of Environmental Biology
  4. Directorate for Biological Sciences [0844968] (funding source: National Science Foundation)

High-throughput sequencing (HTS) technologies generate millions of sequence reads from DNA/RNA molecules rapidly and cost-effectively, enabling single-investigator laboratories to address a variety of 'omics' questions in nonmodel organisms, fundamentally changing the way genomic approaches are used to advance biological research. One major challenge posed by HTS is the complexity and difficulty of data quality control (QC). While QC issues associated with sample isolation, library preparation and sequencing are well known and protocols for their handling are widely available, the QC of the actual sequence reads generated by HTS is often overlooked. HTS-generated sequence reads can contain various errors, biases and artefacts whose identification and amelioration can greatly impact subsequent data analysis. However, a systematic survey on QC procedures for HTS data is still lacking. In this review, we begin by presenting standard 'health check-up' QC procedures recommended for HTS data sets and establishing what 'healthy' HTS data look like. We next proceed by classifying errors, biases and artefacts present in HTS data into three major types of 'pathologies', discussing their causes and symptoms and illustrating with examples their diagnosis and impact on downstream analyses. We conclude this review by offering examples of successful 'treatment' protocols and recommendations on standard practices and treatment options. Notwithstanding the speed with which HTS technologies - and consequently their pathologies - change, we argue that careful QC of HTS data is an important - yet often neglected - aspect of their application in molecular ecology, and lay the groundwork for developing an HTS data QC 'best practices' guide.
