Review

Probably Correct: Rescuing Repeats with Short and Long Reads

Journal

GENES
Volume 12, Issue 1

Publisher

MDPI
DOI: 10.3390/genes12010048

Keywords

repeats; satellite; multi-mapping; reference; long reads

Funding

  1. Ministry of Education, Youth, and Sports of the Czech Republic under the project CEITEC 2020 [LQ1601]

The challenge of assembling short reads into a high-quality reference genome has been complicated by the repetitive nature of the human genome. The emergence of long reads has allowed for better characterization of difficult genomic regions and differentiation of identical sequences based on epigenetic marks. Although long reads still contain some sequencing errors, they provide new possibilities for solving the problem of multi-mapping reads.
Ever since the introduction of high-throughput sequencing following the Human Genome Project, assembling short reads into a reference of sufficient quality has posed a significant problem, as a large portion of the human genome, an estimated 50-69%, is repetitive. As a result, a sizable proportion of sequencing reads is multi-mapping, i.e., without a unique placement in the genome. The two key parameters that determine whether a read is multi-mapping are the read length and the genome complexity. Long reads are now able to span difficult, heterochromatic regions, including full centromeres, and characterize chromosomes from telomere to telomere. Moreover, identical reads or repeat arrays can be differentiated based on their epigenetic marks, such as methylation patterns, aiding the assembly process. This is despite the fact that long reads still contain a modest percentage of sequencing errors, which impair both the accuracy and the speed of aligners and assemblers. Here, I review the proposed and implemented solutions to repeat resolution and the multi-mapping read problem, as well as the downstream consequences of reference choice, repeat masking, and proper representation of sex chromosomes. I also consider the forthcoming challenges and solutions with regard to long reads, where we expect a shift from the problem of repeat localization within a single individual to the problem of repeat positioning within pangenomes.
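
The dependence of multi-mapping on read length can be illustrated with a minimal sketch. It is a toy model and not a method taken from the review: the genome is an invented 44-base string with a short tandem repeat between two unique flanks, reads are error-free, and a "placement" means an exact substring match. The names `count_placements`, `multi_mapping_fraction`, and `GENOME` are hypothetical and chosen only for this illustration.

```python
def count_placements(genome: str, read: str) -> int:
    """Count exact-match start positions of `read` in `genome` (overlaps allowed)."""
    count = 0
    start = genome.find(read)
    while start != -1:
        count += 1
        # Resume one base further on so overlapping matches are also found.
        start = genome.find(read, start + 1)
    return count


def multi_mapping_fraction(genome: str, read_length: int) -> float:
    """Fraction of all error-free reads of `read_length` with more than one placement."""
    n_reads = len(genome) - read_length + 1
    if read_length <= 0 or n_reads <= 0:
        raise ValueError("read_length must be between 1 and len(genome)")
    multi = sum(
        1
        for i in range(n_reads)
        if count_placements(genome, genome[i:i + read_length]) > 1
    )
    return multi / n_reads


# Toy genome (an assumption for illustration): unique flank, 4-copy tandem repeat, unique flank.
GENOME = "ACGTTGCA" + "GATTACA" * 4 + "CCGGTTAA"

if __name__ == "__main__":
    for length in (5, 10, 20, 30):
        frac = multi_mapping_fraction(GENOME, length)
        print(f"read length {length:2d}: multi-mapping fraction {frac:.2f}")
```

In this toy setting, reads shorter than the repeat array often have several placements, whereas a read long enough to reach past the array into a unique flank has exactly one. Real aligners must additionally tolerate sequencing errors and variation, which this sketch deliberately ignores.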
