Article

Chromosome-scale inference of hybrid speciation and admixture with convolutional neural networks

Journal

MOLECULAR ECOLOGY RESOURCES
Volume 21, Issue 8, Pages 2676-2688

Publisher

WILEY
DOI: 10.1111/1755-0998.13355

Keywords

admixture; convolutional neural networks; deep learning; gene flow; hybridization; model selection

Funding

  1. National Institute of General Medical Sciences [R01GM127348]
  2. National Science Foundation [IOS-1811784]

To better understand speciation and uncover reticulate phylogenetic patterns, the authors use convolutional neural networks (CNNs), a deep learning method, to select among candidate hybridization scenarios for closely related organisms. Standard tests based on genealogical discordance assume that loci are independent; this approach instead takes divergence measured in windows along the genome as input, so it can be applied to closely linked data where nonindependence must be considered.
Inferring the frequency and mode of hybridization among closely related organisms is an important step for understanding the process of speciation and can help to uncover reticulated patterns of phylogeny more generally. Phylogenomic methods to test for the presence of hybridization come in many varieties and typically operate by leveraging expected patterns of genealogical discordance in the absence of hybridization. An important assumption made by these tests is that the data (genes or SNPs) are independent given the species tree. However, when the data are closely linked, it is especially important to consider their nonindependence. Recently, deep learning techniques such as convolutional neural networks (CNNs) have been used to perform population genetic inferences with linked SNPs coded as binary images. Here, we use CNNs to select among candidate hybridization scenarios using the tree topology (((P1, P2), P3), Out) and a matrix of pairwise nucleotide divergence (d_XY) calculated in windows across the genome. Coalescent simulations used to train and independently test a neural network showed that our method, HyDe-CNN, was able to accurately perform model selection for hybridization scenarios across a wide breadth of parameter space. We then used HyDe-CNN to test models of admixture in Heliconius butterflies and compared it to phylogeny-based introgression statistics. Given the flexibility of our approach, the dropping cost of long-read sequencing and the continued improvement of CNN architectures, we anticipate that inferences of hybridization using deep learning methods like ours will help researchers to better understand patterns of admixture in their study organisms.
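
To make the input described in the abstract more concrete, the sketch below computes pairwise nucleotide divergence in windows along a chromosome for every pair of populations. It is an illustration only, not the HyDe-CNN implementation: the function name `windowed_dxy`, the dictionary-of-haplotype-matrices input, the equal-width windows, and the row ordering of the output are all assumptions made here, and the abstract does not say how the published method lays out its matrix.

```python
from itertools import combinations

import numpy as np


def windowed_dxy(haps, positions, seq_len, n_windows):
    """Per-window pairwise divergence between populations (hypothetical helper).

    haps      : dict mapping population name -> (n_haplotypes, n_sites) 0/1 array
    positions : coordinate of each biallelic site, in [0, seq_len]
    Returns an array of shape (n_pairs, n_windows), one row per population pair.
    """
    # Assign every site to one of n_windows equal-width windows.
    edges = np.linspace(0, seq_len, n_windows + 1)
    win = np.searchsorted(edges, positions, side="right") - 1
    win = np.clip(win, 0, n_windows - 1)

    # Derived-allele frequency of each site within each population.
    freqs = {name: np.asarray(h).mean(axis=0) for name, h in haps.items()}

    width = seq_len / n_windows
    rows = []
    for x, y in combinations(haps, 2):
        # Probability that one haplotype from x and one from y differ at a site.
        per_site = freqs[x] * (1 - freqs[y]) + freqs[y] * (1 - freqs[x])
        # Sum the differences in each window and scale by window length.
        rows.append(np.bincount(win, weights=per_site, minlength=n_windows) / width)
    return np.array(rows)


# Toy usage with random data for the four taxa in (((P1, P2), P3), Out).
rng = np.random.default_rng(0)
n_sites = 200
toy_positions = np.sort(rng.uniform(0, 1e5, n_sites))
toy_haps = {name: rng.integers(0, 2, size=(5, n_sites))
            for name in ("P1", "P2", "P3", "Out")}
image = windowed_dxy(toy_haps, toy_positions, seq_len=1e5, n_windows=20)
print(image.shape)  # six population pairs by twenty windows
```

Keeping the windows in genomic order preserves the spatial signal of linkage, which is the kind of structure a convolutional network can exploit. Real data would also need handling of missing genotypes and invariant sites, which this sketch omits.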
