Article

An analytical pipeline for identifying and mapping the integration sites of HIV and other retroviruses

Journal

BMC GENOMICS
Volume 21, Issue 1, Pages -

Publisher

BMC
DOI: 10.1186/s12864-020-6647-4

Keywords

Retrovirus; Patient samples; HIV; Integration; PCR mispriming; PCR recombination

Funding

  1. National Cancer Institute, National Institutes of Health [HHSN261200800001E]
  2. Leidos Biomedical Research, Inc. [l3XS110]
  3. National Cancer Institute [ZIABC011426] Funding Source: NIH RePORTER

Background All retroviruses, including human immunodeficiency virus (HIV), must integrate a DNA copy of their genomes into the genome of the infected host cell to replicate. Although integrated retroviral DNA, known as a provirus, can be found at many sites in the host genome, integration is not random. The adaptation of linker-mediated PCR (LM-PCR) protocols for high-throughput integration site mapping, using randomly sheared genomic DNA and Illumina paired-end sequencing, has dramatically increased the number of mapped integration sites. Analysis of samples from human donors has shown that there is clonal expansion of HIV-infected cells and that clonal expansion makes an important contribution to HIV persistence. However, analysis of HIV integration sites in samples taken from patients requires extensive PCR amplification and high-throughput sequencing, which makes the methodology prone to certain specific artifacts.
Results To address the problems with artifacts, we use a comprehensive approach involving experimental procedures linked to a bioinformatics analysis pipeline. Using this combined approach, we are able to reduce the number of PCR/sequencing artifacts that arise and identify the ones that remain. Our streamlined workflow combines random cleavage of the DNA in the samples, end repair, and linker ligation in a single step. We provide guidance on primer and linker design that reduces some of the common artifacts. We also discuss how to identify and remove some of the common artifacts, including the products of PCR mispriming and PCR recombination, that have appeared in some published studies. Our improved bioinformatics pipeline rapidly parses the sequencing data and identifies bona fide integration sites in clonally expanded cells, producing an Excel-formatted report that can be used for additional data processing.
Conclusions We provide a detailed protocol that reduces the prevalence of artifacts that arise in the analysis of retroviral integration site data generated from in vivo samples and a bioinformatics pipeline that is able to remove the artifacts that remain.
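
Example The idea behind using randomly sheared DNA to recognize clonally expanded cells can be sketched in a few lines of Python. Reads that share an integration site and also share a shear breakpoint are most simply explained as PCR duplicates of one fragment, whereas the same integration site seen with several different breakpoints suggests several independent cells carrying the same provirus. This is a minimal sketch under stated assumptions, not the authors' pipeline: the record layout (chromosome, strand, integration position, shear position), the function names `count_breakpoints` and `candidate_clones`, and the `min_breakpoints` threshold are all hypothetical.

```python
from collections import defaultdict


def count_breakpoints(reads):
    """Return {integration site: number of distinct shear breakpoints}.

    Each read is a hypothetical tuple:
    (chromosome, strand, integration_position, shear_position).
    """
    breakpoints = defaultdict(set)
    for chrom, strand, site, shear_pos in reads:
        # A set collapses reads with identical breakpoints (likely PCR duplicates).
        breakpoints[(chrom, strand, site)].add(shear_pos)
    return {key: len(positions) for key, positions in breakpoints.items()}


def candidate_clones(reads, min_breakpoints=2):
    """Keep sites seen with at least `min_breakpoints` distinct breakpoints."""
    counts = count_breakpoints(reads)
    return {key: n for key, n in counts.items() if n >= min_breakpoints}


# Illustrative, made-up records (not data from the paper).
reads = [
    ("chr1", "+", 1000, 1250),
    ("chr1", "+", 1000, 1250),  # same breakpoint: counted once
    ("chr1", "+", 1000, 1310),  # new breakpoint at the same site
    ("chr2", "-", 5000, 4800),
]
```

Calling `candidate_clones(reads)` on the made-up records keeps only the chr1 site, which is seen with two distinct breakpoints. A real analysis would also need the artifact filters the abstract describes, such as removing the products of PCR mispriming and PCR recombination, which this sketch does not attempt.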
