Proceedings Paper

ddRAD-seq variant calling in peach and the effect of removing PCR duplicates

Journal

X INTERNATIONAL PEACH SYMPOSIUM
Volume 1352, Pages 405-412

Publisher

INT SOC HORTICULTURAL SCIENCE
DOI: 10.17660/ActaHortic.2022.1352.56

Keywords

Prunus persica; DNA-variants; SAMtools; Stacks

Funding

  1. Spanish Ministry of Economy and the Government of Aragon - FEDER funds [AGL2014-52063-R, AGL2017-83358-R, 2020AEP119 and A09_20R]
  2. Government of Aragon

Double digest RAD-seq (ddRAD-seq) is a popular genotyping method in plants. This study evaluates whether removing PCR duplicates is necessary and how it affects SNP and indel calling in peach, and provides a reproducible workflow for variant detection.
Double digest RAD-seq (ddRAD-seq) is a flexible and cost-effective strategy that has emerged as one of the most popular genotyping approaches in plants. It relies on combining two restriction enzymes for library preparation, followed by PCR amplification of the template molecules. However, PCR introduces sequence duplicates, which may erroneously inflate the confidence of genotype calls at a particular site. Although variant calling is relatively straightforward, it is time-consuming and involves multiple steps, so removing any unneeded step would reduce computation time and simplify the analysis. Hence, the primary aim of this study is to evaluate whether removing PCR duplicates is necessary and how it affects SNP and indel calling in peach. In addition, accurate identification of genetic variants in plants is a crucial step toward understanding phenotypic traits and monitoring breeding programs. False positive calls are a common issue that can hamper the detection of relevant variants, so a good combination of computational tools for alignment and variant calling is needed to tackle these artifacts. In response to this challenge, three variant callers (BCFtools-mpileup, Freebayes and GATK-HaplotypeCaller) were combined on top of the BWA-mem read mapper. Variants found in the intersection of these callers are selected as a high-confidence set and flagged for subsequent analysis. The pipeline is documented and available as a set of Makefiles that can be adapted to any species. This work provides useful guidelines and a reproducible workflow for variant detection using ddRAD-seq data.
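
The intersection step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' Makefile pipeline: the function names (`read_variants`, `high_confidence`), the chromosome names and the toy records are all hypothetical, and the sketch assumes each caller's output is available as tab-separated VCF lines. A variant is identified here by chromosome, position, reference allele and alternate allele, with multi-allelic records split into one entry per alternate allele.

```python
from typing import Iterable, Set, Tuple

# A variant is identified by (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def read_variants(vcf_lines: Iterable[str]) -> Set[Variant]:
    """Collect variant keys from VCF text lines, skipping header lines."""
    variants: Set[Variant] = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        # Split multi-allelic records so each alternate allele is compared on its own.
        for alt in alts.split(","):
            if alt not in (".", "*"):
                variants.add((chrom, pos, ref, alt))
    return variants


def high_confidence(*call_sets: Set[Variant]) -> Set[Variant]:
    """Return the variants reported by every caller (the set intersection)."""
    if not call_sets:
        return set()
    return set.intersection(*call_sets)


# Toy records (made up for illustration), one list per caller.
bcftools_calls = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t100\t.\tA\tG",
    "chr1\t200\t.\tC\tT,G",
    "chr2\t300\t.\tGA\tG",
]
freebayes_calls = [
    "chr1\t100\t.\tA\tG",
    "chr1\t200\t.\tC\tT",
    "chr2\t300\t.\tGA\tG",
]
gatk_calls = [
    "chr1\t100\t.\tA\tG",
    "chr2\t300\t.\tGA\tG",
    "chr2\t400\t.\tT\tC",
]

consensus = high_confidence(
    read_variants(bcftools_calls),
    read_variants(freebayes_calls),
    read_variants(gatk_calls),
)
for variant in sorted(consensus):
    print(variant)
```

Exact-key matching of this kind only works if all callers represent a variant identically. Indels in particular can be reported with different positions or allele padding by different callers, so in practice the call sets would likely need to be left-aligned and normalized (for example with `bcftools norm`) before they are intersected.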
