Article

Exploration of whole genome amplification generated chimeric sequences in long-read sequencing data

Journal

Briefings in Bioinformatics

Publisher

Oxford University Press
DOI: 10.1093/bib/bbad275

Keywords

chimeric sequence; long-read sequencing; multiple displacement amplification

Multiple displacement amplification (MDA) is widely used for whole genome amplification, but the formation of chimeric sequences in MDA interferes with bioinformatics analysis. This study developed a pipeline for recognizing and restoring chimeras in long-read sequencing data, which can reduce the influence of chimeras and improve the analysis of structural variation.
Motivation: Multiple displacement amplification (MDA) has become the most commonly used method of whole genome amplification, generating large amounts of DNA with higher molecular weight and greater genome coverage. Coupled with long-read sequencing, MDA makes it possible to sequence amplicons over 20 kb in length. However, the formation of chimeric sequences (chimeras, which appear as structural errors in sequencing data) during MDA seriously interferes with bioinformatics analysis, and its influence on long-read sequencing data is unknown.

Results: We sequenced phi29 DNA polymerase-mediated MDA amplicons on the PacBio platform and analyzed the chimeras within the generated data. We constructed 3rd-ChimeraMiner, a pipeline that recognizes chimeras in long-read sequencing data and restores them to their original structures, improving how efficiently third-generation sequencing (TGS) data can be used. Five long-read datasets and one high-fidelity long-read dataset with various amplification folds were analyzed. The results reveal that mis-priming events during amplification occur more frequently than widely perceived, and their proportion gradually accumulates from 42% to over 78% as amplification continues. In total, 99.92% of the recognized chimeric sequences were shown to be artifacts, whose structures were wrongly formed during MDA rather than existing in the original genomes. Restoring chimeras to their original structures recycles the vast majority of the supplementary alignments that introduce false-positive structural variants, removing 97% of inversions on average and aiding the analysis of structural variation in MDA-amplified samples. The impact of chimeras on long-read sequencing data analysis should be emphasized, and 3rd-ChimeraMiner can help quantify and reduce their influence.
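The Results describe chimeras surfacing as supplementary alignments that a structural-variant caller then reports as inversions. As a minimal sketch of that read-level signal only, assuming a hypothetical `Segment` record for each aligned piece of a read and an arbitrary 10 kb `max_gap` threshold, the Python below flags reads whose consecutive pieces jump to the opposite strand at a nearby locus and reports the flagged fraction. It is not the 3rd-ChimeraMiner algorithm, and it does not attempt the restoration step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One aligned piece of a read (primary or supplementary alignment).

    Hypothetical record type for illustration; coordinates are 0-based, end-exclusive.
    """
    read_start: int  # where the piece starts on the read
    read_end: int    # where the piece ends on the read
    chrom: str
    ref_start: int
    ref_end: int
    strand: str      # "+" or "-"


def is_candidate_inverted_chimera(segments, max_gap=10_000):
    """Return True if consecutive pieces of one read (in read order) map to
    opposite strands of the same chromosome within max_gap bases of each other.

    This is the read-level pattern an SV caller would report as an inversion.
    The max_gap value is an assumption, not a parameter from the paper.
    """
    ordered = sorted(segments, key=lambda s: s.read_start)
    for a, b in zip(ordered, ordered[1:]):
        if a.chrom == b.chrom and a.strand != b.strand:
            # Distance between the two reference intervals; negative if they overlap.
            gap = max(a.ref_start, b.ref_start) - min(a.ref_end, b.ref_end)
            if gap <= max_gap:
                return True
    return False


def chimera_fraction(reads, max_gap=10_000):
    """Fraction of reads flagged as candidate inverted chimeras."""
    if not reads:
        return 0.0
    flagged = sum(is_candidate_inverted_chimera(segs, max_gap) for segs in reads.values())
    return flagged / len(reads)


# Toy data (hypothetical): read name -> list of aligned pieces.
example_reads = {
    # Second piece re-maps to the opposite strand inside the first piece's locus.
    "r1": [Segment(0, 5000, "chr1", 100_000, 105_000, "+"),
           Segment(5000, 9000, "chr1", 101_000, 105_000, "-")],
    # Single contiguous alignment.
    "r2": [Segment(0, 8000, "chr1", 200_000, 208_000, "+")],
    # Opposite strands but ~846 kb apart, so this rule does not flag it.
    "r3": [Segment(0, 4000, "chr2", 50_000, 54_000, "+"),
           Segment(4000, 8000, "chr2", 900_000, 904_000, "-")],
}

print(chimera_fraction(example_reads))
```

In practice the pieces would come from each read's primary alignment plus its supplementary alignments in a BAM file. A real pipeline would also have to handle other chimera orientations and separate the rare genuine rearrangements from MDA artifacts before restoring anything.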