Article

Machine-learning predicts genomic determinants of meiosis-driven structural variation in a eukaryotic pathogen

Journal

NATURE COMMUNICATIONS
Volume 12, Issue 1

Publisher

NATURE PORTFOLIO
DOI: 10.1038/s41467-021-23862-x

Funding

  1. Fondation Pierre Mercier pour la science
  2. Swiss National Science Foundation [31003A_173265]

This study analyzed a global set of telomere-to-telomere genome assemblies of a fungal pathogen of wheat to establish a nucleotide-level map of structural variation. A machine-learning model trained on species-wide structural variation events revealed that base composition and gene density are the major determinants of structural variation, and that retrotransposons explain most inversion, indel, and duplication events. Applied to a four-generation pedigree, the model accurately predicted the positions of more than 74% of newly generated variants, highlighting the causal role of specific sequence features in inducing chromosomal rearrangements.

Species harbor extensive structural variation underpinning recent adaptive evolution. However, the causality between genomic features and the induction of new rearrangements is poorly established. Here, we analyze a global set of telomere-to-telomere genome assemblies of a fungal pathogen of wheat to establish a nucleotide-level map of structural variation. We show that the recent emergence of pesticide resistance has been disproportionately driven by rearrangements. We use machine learning to train a model on structural variation events based on 30 chromosomal sequence features. We show that base composition and gene density are the major determinants of structural variation. Retrotransposons explain most inversion, indel and duplication events. We apply our model to Arabidopsis thaliana and show that our approach extends to more complex genomes. Finally, we analyze complete genomes of haploid offspring in a four-generation pedigree. Meiotic crossover locations are enriched for new rearrangements, consistent with crossovers being mutational hotspots. The model trained on species-wide structural variation accurately predicts the position of >74% of newly generated variants along the pedigree. The predictive power highlights causality between specific sequence features and the induction of chromosomal rearrangements. Our work demonstrates that training sequence-derived models can accurately identify regions of intrinsic DNA instability in eukaryotic genomes.
