Article

In Silico Evaluation of Variant Calling Methods for Bacterial Whole-Genome Sequencing Assays

Journal

JOURNAL OF CLINICAL MICROBIOLOGY
Volume 61, Issue 8

Publisher

AMER SOC MICROBIOLOGY
DOI: 10.1128/jcm.01842-22

Keywords

bioinformatics; whole-genome sequencing; bacterial genomes; next generation sequencing; computer simulation; DNA mutational analysis

Identification and analysis of clinically relevant strains of bacteria increasingly rely on whole-genome sequencing. The downstream bioinformatics steps necessary for calling variants from short-read sequences are well established but seldom validated against haploid genomes. We devised an in silico workflow to introduce single nucleotide polymorphisms (SNPs) and indels into bacterial reference genomes and to computationally generate sequencing reads from the mutated genomes. We then applied the method to Mycobacterium tuberculosis H37Rv, Staphylococcus aureus NCTC 8325, and Klebsiella pneumoniae HS11286, and used the synthetic reads as truth sets for evaluating several popular variant callers. Relative to deletions and SNPs, insertions proved especially challenging for most variant callers to identify correctly. With adequate read depth, however, variant callers that use high-quality soft-clipped reads and base mismatches to perform local realignment consistently had the highest precision and recall in identifying insertions and deletions ranging from 1 to 50 bp. The remaining variant callers had lower recall for insertions longer than 20 bp.
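The truth-set idea in the abstract can be illustrated with a small sketch. This is not the authors' workflow; it is a hypothetical Python illustration that plants random SNPs and indels (1 to 50 bp, matching the size range above) into a haploid sequence, records them as VCF-style records, and scores a call set by precision and by recall per variant class. The names `Variant`, `mutate_reference`, `precision_recall`, and `recall_by_kind` are invented for this example. Several choices are assumptions: variant sites lie on a grid spaced far enough apart that edits never overlap, insertions and deletions are equally likely with uniform lengths, and the sequence is uppercase ACGT. Matching is exact on (position, REF, ALT), which assumes both sets use the same normalized representation.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """VCF-style record: 1-based position, REF allele, ALT allele."""
    pos: int
    ref: str
    alt: str

    @property
    def kind(self) -> str:
        if len(self.ref) == len(self.alt) == 1:
            return "SNP"
        return "insertion" if len(self.alt) > len(self.ref) else "deletion"


def mutate_reference(seq, n_snps, n_indels, max_indel=50, seed=0):
    """Plant random SNPs and indels (1..max_indel bp) in a haploid sequence.

    Assumption: sites are drawn from a grid spaced max_indel + 2 bp apart,
    so no two variants overlap. Returns (mutated sequence, truth set).
    """
    rng = random.Random(seed)
    spacing = max_indel + 2
    # Raises ValueError if the sequence is too short for the requested count.
    sites = rng.sample(range(1, len(seq) - max_indel, spacing), n_snps + n_indels)
    truth = set()
    for i, pos in enumerate(sites):
        anchor = seq[pos - 1]
        if i < n_snps:
            alt = rng.choice([b for b in "ACGT" if b != anchor])
            truth.add(Variant(pos, anchor, alt))
        else:
            length = rng.randint(1, max_indel)
            if rng.random() < 0.5:  # deletion: REF = anchor base + deleted bases
                truth.add(Variant(pos, seq[pos - 1:pos + length], anchor))
            else:  # insertion: ALT = anchor base + inserted bases
                inserted = "".join(rng.choice("ACGT") for _ in range(length))
                truth.add(Variant(pos, anchor, anchor + inserted))
    # Apply edits right to left so coordinates to the left stay valid.
    mutated = seq
    for v in sorted(truth, key=lambda v: v.pos, reverse=True):
        start = v.pos - 1
        mutated = mutated[:start] + v.alt + mutated[start + len(v.ref):]
    return mutated, truth


def precision_recall(called, truth):
    """Exact-match precision and recall of a call set against the truth set."""
    tp = len(called & truth)
    precision = tp / len(called) if called else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall


def recall_by_kind(called, truth):
    """Recall stratified by variant class (SNP, insertion, deletion)."""
    totals = Counter(v.kind for v in truth)
    hits = Counter(v.kind for v in truth & called)
    return {kind: hits[kind] / n for kind, n in totals.items()}
```

In a fuller pipeline, the mutated sequence would be passed to a read simulator (for example ART or wgsim). The reads would then be aligned back to the original reference, and each caller's VCF would be parsed into `Variant` records before scoring. Because an indel inside a repeat can be written at several equivalent positions, real comparisons usually left-normalize both sets first (for example with `bcftools norm`) rather than relying on exact matching as this sketch does.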
