4.6 Article

A Systematic Evaluation of High-Throughput Sequencing Approaches to Identify Low-Frequency Single Nucleotide Variants in Viral Populations

Journal

VIRUSES-BASEL
Volume 12, Issue 10, Pages -

Publisher

MDPI
DOI: 10.3390/v12101187

Keywords

high-throughput sequencing; viral populations; sub-consensus variants; sequencing error

Funding

  1. Biotechnology and Biological Sciences Research Council (BBSRC) of the United Kingdom [BBS/E/I/00007035, BBS/E/I/00007036, BBS/E/I/00007037]
  2. BBSRC Industrial CASE studentship award [1646570]
  3. Defra [SE2944]
  4. Veterinary Biocontained Facility Network for Excellence in Animal Infectious Disease Research and Experimentation (VetBioNext) grant (Horizon 2020) [731014]
  5. BBSRC [1646570, BBS/E/I/00007035, BBS/E/I/00007037, BBS/E/I/00007036] Funding Source: UKRI

High-throughput sequencing technologies, such as those provided by Illumina, are an efficient way to understand sequence variation within viral populations. However, challenges exist in distinguishing process-introduced error from biological variance, which significantly impacts our ability to identify sub-consensus single-nucleotide variants (SNVs). Here we have taken a systematic approach to evaluate laboratory and bioinformatic pipelines to accurately identify low-frequency SNVs in viral populations. Artificial DNA and RNA populations were created by introducing known SNVs at predetermined frequencies into template nucleic acid before being sequenced on an Illumina MiSeq platform. These were used to assess the effects of the abundance and type of starting input material, technical replicates, read length and quality, short-read aligner, and percentage frequency thresholds on the ability to accurately call variants. Analyses revealed that the abundance and type of input nucleic acid had the greatest impact on the accuracy of SNV calling as measured by a micro-averaged Matthews correlation coefficient score, with DNA and high RNA inputs (10^7 copies) allowing for variants to be called at a 0.2% frequency. Reduced input RNA (10^5 copies) required more technical replicates to maintain accuracy, while low RNA inputs (10^3 copies) suffered from consensus-level errors. Base errors occurring at specific motifs and present in all technical replicates were also identified, and these can be excluded to further increase SNV calling accuracy. These findings indicate that samples with low RNA inputs should be excluded for SNV calling and reinforce the importance of optimising the technical and bioinformatics steps in pipelines that are used to accurately identify sequence variants.
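
The abstract scores SNV calling with a micro-averaged Matthews correlation coefficient (MCC) and applies percentage frequency thresholds, but it does not spell out the computation. The sketch below shows one common reading of those terms, not the authors' pipeline: variants are called when their observed frequency meets a threshold, confusion counts are pooled across samples (the "micro" average), and a single MCC is computed from the pooled counts. All function names, the `(position, alternate base)` variant representation, and the example frequencies are hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, Iterable, Set, Tuple

# Assumption: a variant is identified by (genome position, alternate base).
Variant = Tuple[int, str]


def call_variants(freqs: Dict[Variant, float], threshold: float) -> Set[Variant]:
    """Keep candidate variants whose observed frequency meets the threshold."""
    return {v for v, f in freqs.items() if f >= threshold}


def confusion_counts(called: Set[Variant], truth: Set[Variant],
                     n_candidates: int) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) given the number of possible candidate variants."""
    tp = len(called & truth)
    fp = len(called - truth)
    fn = len(truth - called)
    tn = n_candidates - tp - fp - fn
    return tp, fp, fn, tn


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def micro_averaged_mcc(
        samples: Iterable[Tuple[Set[Variant], Set[Variant], int]]) -> float:
    """Pool confusion counts over all samples, then compute one MCC."""
    totals = [0, 0, 0, 0]
    for called, truth, n_candidates in samples:
        for i, count in enumerate(confusion_counts(called, truth, n_candidates)):
            totals[i] += count
    return mcc(*totals)


# Hypothetical example: 10 positions x 3 alternate bases = 30 candidates.
truth = {(100, "A"), (200, "G")}
observed = {(100, "A"): 0.005, (200, "G"): 0.0015, (300, "T"): 0.0025}
called = call_variants(observed, threshold=0.002)  # 0.2% frequency cut-off
score = micro_averaged_mcc([(called, truth, 30)])
```

In this made-up case the 0.2% threshold keeps one true variant, admits one false positive, and misses one true variant, which illustrates why the choice of threshold interacts with input quality when the score is pooled over replicates.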
