Article

Performance evaluation of pipelines for mapping, variant calling and interval padding, for the analysis of NGS germline panels

Journal

BMC BIOINFORMATICS
Volume 22, Issue 1, Pages -

Publisher

BMC
DOI: 10.1186/s12859-021-04144-1

Keywords

Next-generation sequencing (NGS); Germline NGS data analysis; Variant calling; Alignment; Interval padding; Pipeline comparison

Funding

  1. Cyprus Institute of Neurology and Genetics
  2. European Commission Research Executive Agency (REA) Grant BIORISE under the Spreading Excellence, Widening Participation, Science [669026]

Next-generation sequencing (NGS) has revolutionized clinical genetics but presents challenges in data analysis. This study compared 28 NGS data analysis pipelines and found that interval padding is essential for detecting intronic variants, including spliceogenic pathogenic variants. The authors recommend BWA-MEM for alignment and a combination of variant callers: GATK-HaplotypeCaller and SAMtools for insertions/deletions, and GATK-UnifiedGenotyper for single nucleotide variants. Further improvements in bioinformatics tools and pipelines are needed for more reliable clinical variant detection.
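Interval padding widens each targeted region of the panel by a fixed number of bases on both sides before variant calling, so that variants just outside the exons (for example, near splice sites) are not discarded as off-target. The Python sketch below illustrates the idea on plain BED-style tuples; it is not the authors' implementation. In practice this is usually done with GATK's `--interval-padding` (`-ip`) argument or `bedtools slop`. The function names and the exon coordinates in the usage lines are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# A BED-style interval: (chrom, start, end), 0-based and half-open.
Interval = Tuple[str, int, int]


def pad_intervals(
    intervals: List[Interval],
    padding: int,
    chrom_sizes: Optional[Dict[str, int]] = None,
) -> List[Interval]:
    """Extend each interval by `padding` bp on both sides, clamp to the
    chromosome bounds, and merge intervals that overlap or touch."""
    padded = []
    for chrom, start, end in intervals:
        new_start = max(0, start - padding)
        new_end = end + padding
        if chrom_sizes is not None and chrom in chrom_sizes:
            new_end = min(new_end, chrom_sizes[chrom])
        padded.append((chrom, new_start, new_end))

    # Sort by chromosome, then start, so overlapping intervals are adjacent.
    padded.sort()
    merged: List[Interval] = []
    for chrom, start, end in padded:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def is_targeted(intervals: List[Interval], chrom: str, pos0: int) -> bool:
    """Return True if the 0-based position lies inside any interval."""
    return any(c == chrom and s <= pos0 < e for c, s, e in intervals)


if __name__ == "__main__":
    exons = [("chr17", 1000, 1200)]  # hypothetical exon
    intronic_variant = 1219          # 20 bp into the downstream intron
    for pad in (0, 50, 100):         # the padding values compared in the study
        print(pad, is_targeted(pad_intervals(exons, pad), "chr17", intronic_variant))
```

With these hypothetical coordinates, the intronic variant falls outside the unpadded target but inside both the 50 bp and 100 bp padded targets, which mirrors why an unpadded pipeline can miss spliceogenic variants.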
Background: Next-generation sequencing (NGS) represents a significant advancement in clinical genetics. However, its use creates several technical, data interpretation and data management challenges. It is essential to follow a consistent data analysis pipeline to achieve the highest possible accuracy and avoid false variant calls. Herein, we aimed to compare the performance of twenty-eight combinations of NGS data analysis pipeline components, including short-read mapping (BWA-MEM, Bowtie2, Stampy), variant calling (GATK-HaplotypeCaller, GATK-UnifiedGenotyper, SAMtools) and interval padding (null, 50 bp, 100 bp) methods, along with a commercially available pipeline (BWA Enrichment, Illumina®). Fourteen germline DNA samples from breast cancer patients were sequenced using a targeted NGS panel approach and subjected to data analysis.

Results: We highlight that interval padding is required for the accurate detection of intronic variants, including spliceogenic pathogenic variants (PVs). In addition, when run with near-default parameters, the BWA Enrichment algorithm failed to detect these spliceogenic PVs and a missense PV in the TP53 gene. We also recommend the BWA-MEM algorithm for sequence alignment, whereas variant calling should be performed using a combination of variant calling algorithms: GATK-HaplotypeCaller and SAMtools for the accurate detection of insertions/deletions, and GATK-UnifiedGenotyper for the efficient detection of single nucleotide variants.

Conclusions: These findings have important implications for the identification of clinically actionable variants through panel testing in a clinical laboratory setting, where dedicated bioinformatics personnel may not always be available. The results also reveal the need to improve existing tools and/or develop new pipelines that generate more reliable and more consistent data.
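The Results recommend taking insertions/deletions from GATK-HaplotypeCaller and SAMtools and single nucleotide variants from GATK-UnifiedGenotyper. The abstract does not say how the two indel call sets are merged, so this Python sketch exposes both a union and an intersection policy through a hypothetical `indel_mode` parameter. It assumes each caller's VCF has already been split into biallelic records and left-normalized (for example with `bcftools norm`), so identical variants compare equal as `(chrom, pos, ref, alt)` tuples. The function names are illustrative only, not part of any tool.

```python
from typing import Literal, Set, Tuple

# A normalized, biallelic variant: (chrom, pos, ref, alt), 1-based as in VCF.
Variant = Tuple[str, int, str, str]


def is_snv(v: Variant) -> bool:
    """Single nucleotide variant: one-base REF and one-base ALT."""
    _, _, ref, alt = v
    return len(ref) == 1 and len(alt) == 1


def is_indel(v: Variant) -> bool:
    """Insertion or deletion: REF and ALT differ in length."""
    _, _, ref, alt = v
    return len(ref) != len(alt)


def combine_calls(
    haplotypecaller: Set[Variant],
    samtools: Set[Variant],
    unifiedgenotyper: Set[Variant],
    indel_mode: Literal["union", "intersection"] = "union",
) -> Set[Variant]:
    """Keep SNVs from UnifiedGenotyper and indels from HaplotypeCaller and
    SAMtools, merged by the (hypothetical) union or intersection policy."""
    hc_indels = {v for v in haplotypecaller if is_indel(v)}
    st_indels = {v for v in samtools if is_indel(v)}
    if indel_mode == "union":
        indels = hc_indels | st_indels
    elif indel_mode == "intersection":
        indels = hc_indels & st_indels
    else:
        raise ValueError(f"unknown indel_mode: {indel_mode!r}")
    snvs = {v for v in unifiedgenotyper if is_snv(v)}
    return snvs | indels
```

Records whose REF and ALT have equal length greater than one (multi-nucleotide variants) fall into neither class and are dropped here, as are indels reported only by UnifiedGenotyper; a production pipeline would need an explicit rule for both.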
