Article

Generating human papillomavirus (HPV) reference databases to maximize genomic mapping

Journal

ARCHIVES OF VIROLOGY
Volume 167, Issue 1, Pages 57-65

Publisher

SPRINGER WIEN
DOI: 10.1007/s00705-021-05256-y

Funding

  1. Mexican Council of Science and Technology, CONACYT [252984, 263943, 271386]

Genomic experiments analyzing human papillomaviruses (HPVs) require a carefully selected list of sequences as a reference database to map millions of reads. The available sources, such as the Papillomavirus Episteme (PaVE), are organized based on variations in the L1 gene rather than the whole HPV sequence. Moreover, the PaVE process uses complex multiple sequence alignments containing hundreds or thousands of sequences. These issues complicate building a reference database for genomics, so databases end up being defined separately for each analysis. Here, we propose a de novo strategy that considers all HPV sequences reported in the NCBI database to define a subset of highly representative HPV sequences. The strategy is based on oligonucleotide frequency profiling of the whole sequence followed by hierarchical clustering. Using data from HPV capture experiments, we demonstrate that this strategy selects sequences suitable as a reference database for mapping most mappable reads unambiguously. We also provide recommendations to improve HPV mapping. The generated .fasta files can be accessed at https://github.com/vtrevino/HPV-Ref-Genomes.
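
The general idea of oligonucleotide frequency profiling followed by hierarchical clustering can be illustrated with a minimal Python sketch. It is not the authors' implementation: the k-mer length, the Euclidean distance, the average-linkage rule, the choice of the cluster medoid as the representative, and the function names (`kmer_profile`, `cluster`, `representatives`) are all assumptions made for illustration, and the toy sequences are invented rather than real HPV genomes.

```python
from itertools import product
from math import sqrt


def kmer_profile(seq, k=4):
    """Return the normalized k-mer (oligonucleotide) frequency vector of a sequence."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing N or other symbols
            counts[word] += 1
    total = sum(counts.values()) or 1  # avoid division by zero on short input
    return [counts[w] / total for w in kmers]


def euclidean(a, b):
    """Euclidean distance between two profiles (an assumed metric)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster(profiles, n_clusters):
    """Naive average-linkage agglomerative clustering.

    Returns the clusters (lists of sequence indices) and the distance matrix.
    """
    dist = [[euclidean(p, q) for q in profiles] for p in profiles]
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return clusters, dist


def representatives(clusters, dist):
    """Pick one representative per cluster: the member closest to all others."""
    return [
        min(members, key=lambda i: sum(dist[i][j] for j in members))
        for members in clusters
    ]


# Toy input (hypothetical sequences, not HPV genomes).
toy = {
    "at_1": "ATATATATATATATATATAT",
    "at_2": "ATATATATTTATATATATAT",
    "gc_1": "GCGCGCGCGCGCGCGCGCGC",
    "gc_2": "GCGCGCGGGCGCGCGCGCGC",
}
names = list(toy)
profiles = [kmer_profile(toy[n], k=2) for n in names]
clusters, dist = cluster(profiles, n_clusters=2)
reps = [names[i] for i in representatives(clusters, dist)]
print(reps)
```

In this sketch each cluster contributes one sequence to the reference set, so the number of clusters controls how large the resulting database is. For thousands of full-length genomes, an optimized clustering library would be a more practical choice than the quadratic loops above.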
