Evaluation of tools for identifying large copy number variations from ultra-low-coverage whole-genome sequencing data

BMC GENOMICS (2021)

Journal

BMC GENOMICS

Volume 22, Issue 1, Pages -

Publisher

BMC

DOI: 10.1186/s12864-021-07686-z

Keywords

Copy number variation; Whole-genome sequencing; Ultra-low-coverage; Human embryonic stem cell

Funding

European Research Council ERC [677943]
European Union [675395]
Academy of Finland [296801, 304995, 310561, 314443, 329278, 116713]
Sigrid Juselius Foundation
University of Turku
Abo Akademi University
University of Turku Graduate School (UTUGS)
Biocenter Finland
ELIXIR Finland
Marie Curie Actions (MSCA) [675395]

Automated Summary

This study evaluated the performance of six read-depth-based CNV detection algorithms on ultra-low-coverage WGS data, finding that these methods perform well in detecting large CNVs but may produce false positives for smaller CNVs. BIC-seq2 was identified as the best method in terms of statistical performance, while FREEC, which had a much faster runtime, was considered the second-best method.

Abstract

Background: Detection of copy number variations (CNVs) from high-throughput next-generation whole-genome sequencing (WGS) data has become a widely used research method in recent years. However, little is known about the applicability of the developed algorithms to ultra-low-coverage (0.0005-0.8x) data, which is used in various research and clinical applications, such as digital karyotyping and single-cell CNV detection.

Results: Here, the performance of six popular read-depth-based CNV detection algorithms (BIC-seq2, Canvas, CNVnator, FREEC, HMMcopy, and QDNAseq) was studied using ultra-low-coverage WGS data. Real-world array-based and karyotyping kit-based validation was used as a benchmark in the evaluation. Additionally, ultra-low-coverage WGS data was simulated to investigate the ability of the algorithms to identify CNVs in the sex chromosomes and the theoretical minimum coverage at which these tools can function accurately. Our results suggest that while all the methods were able to detect large CNVs, many methods were susceptible to producing false positives when calling smaller CNVs (<2 Mbp). There was also significant variability in their ability to identify CNVs in the sex chromosomes. Overall, BIC-seq2 was found to be the best method in terms of statistical performance. However, its significant drawback was that it had by far the slowest runtime among the methods (>3 h), compared with FREEC (~3 min), which we considered the second-best method.

Conclusions: Our comparative analysis demonstrates that CNV detection from ultra-low-coverage WGS data can be highly accurate for large copy number variations whose length is in the millions of base pairs. These findings facilitate applications that utilize ultra-low-coverage CNV detection.
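To give a rough sense of why read-depth methods at these coverages are limited to large CNVs, the sketch below estimates how many reads fall into a genomic window at a given coverage. It is an illustration only and is not taken from the paper: the 100 bp read length, the uniform-coverage assumption, the Poisson noise model, and the function names are all assumptions made here.

```python
import math


def expected_reads_per_bin(coverage: float, bin_size_bp: int,
                           read_length_bp: int = 100) -> float:
    """Expected number of reads in a bin, assuming uniform coverage.

    coverage = (number of reads * read length) / genome length, so a bin of
    bin_size_bp bases is expected to hold coverage * bin_size_bp / read_length_bp
    reads. The 100 bp default read length is an assumption for illustration.
    """
    return coverage * bin_size_bp / read_length_bp


def relative_poisson_noise(expected_reads: float) -> float:
    """Coefficient of variation of a Poisson count, 1 / sqrt(mean)."""
    return 1.0 / math.sqrt(expected_reads)


# Coverage range endpoints quoted in the abstract, plus one intermediate value.
for coverage in (0.0005, 0.01, 0.8):
    # 50 kbp and 2 Mbp windows; 2 Mbp is the size threshold named in the abstract.
    for bin_size in (50_000, 2_000_000):
        reads = expected_reads_per_bin(coverage, bin_size)
        noise = relative_poisson_noise(reads)
        print(f"coverage={coverage}x bin={bin_size} bp "
              f"reads={reads:.2f} relative_noise={noise:.2f}")
```

Under these assumptions, a single-copy gain in a diploid region raises the expected depth by 50%, so a window is only informative when its relative noise is well below that. At the lowest coverage, even a 2 Mbp window holds only about ten reads, which is consistent with the abstract's observation that calls below 2 Mbp are prone to false positives.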
