Article

Coestimating Reticulate Phylogenies and Gene Trees from Multilocus Sequence Data

Journal

SYSTEMATIC BIOLOGY
Volume 67, Issue 3, Pages 439-457

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/sysbio/syx085

Keywords

Bayesian inference; incomplete lineage sorting; multispecies network coalescent; phylogenetic network; reticulation; RJMCMC

Funding

  1. National Science Foundation [CCF-1302179, CCF-1514177, DBI-1062463, OCI-0959097, CNS-1338099]
  2. Directorate for Biological Sciences
  3. Division of Biological Infrastructure [1355998]; Funding Source: National Science Foundation
  4. Division of Computing and Communication Foundations
  5. Directorate for Computer and Information Science and Engineering [1514177]; Funding Source: National Science Foundation

The multispecies network coalescent (MSNC) is a stochastic process that captures how gene trees grow within the branches of a phylogenetic network. Coupling the MSNC with a stochastic mutational process that operates along the branches of the gene trees gives rise to a generative model of how multiple loci from within and across species evolve in the presence of both incomplete lineage sorting (ILS) and reticulation (e.g., hybridization). We report on a Bayesian method for sampling the parameters of this generative model, including the species phylogeny, gene trees, divergence times, and population sizes, from DNA sequences of multiple independent loci. We demonstrate the utility of our method by analyzing simulated data and reanalyzing an empirical data set. Our results demonstrate the significance of not only coestimating species phylogenies and gene trees, but also accounting for reticulation and ILS simultaneously. In particular, we show that when gene flow occurs, our method accurately estimates the evolutionary histories, coalescence times, and divergence times. Tree inference methods, on the other hand, underestimate divergence times and overestimate coalescence times when the evolutionary history is reticulate. While the MSNC corresponds to an abstract model of intermixture, we study the performance of the model and method on simulated data generated under a gene flow model. We show that the method accurately infers the most recent time at which gene flow occurs. Finally, we demonstrate the application of the new method to a 106-locus yeast data set.
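
To make the generative side of the MSNC concrete, the following is a minimal toy sketch, not the paper's inference method (which samples parameters by Bayesian RJMCMC). It simulates the coalescence time of two gene lineages, one sampled from species A and one from species B, in a hypothetical three-species network: B arose by hybridization at time t_hyb, inheriting from A's ancestral branch with inheritance probability gamma and from C's ancestral branch otherwise, and A and C merge at t_root. Times are in coalescent units, and equal population sizes on every branch are assumed, so two lineages in the same population coalesce at rate 1. The function names, the network, and the parameter values are illustrative assumptions; the mutational process along gene tree branches is not modeled here.

```python
"""Toy sketch of two-lineage coalescence under a multispecies network
coalescent (MSNC). Hypothetical network and names; not the paper's code."""
import random


def sample_coalescence_time(t_hyb: float, t_root: float, gamma: float,
                            rng: random.Random) -> float:
    """Draw the coalescence time (coalescent units, backward in time) of one
    lineage from species A and one from hybrid species B.

    Assumed network: B arose by hybridization at t_hyb, inheriting from A's
    ancestral branch with probability gamma and from C's ancestral branch
    with probability 1 - gamma; A and C merge at t_root > t_hyb.
    Two lineages in one population coalesce at rate 1 (equal sizes assumed).
    """
    if not 0.0 <= gamma <= 1.0 or not 0.0 < t_hyb < t_root:
        raise ValueError("need 0 <= gamma <= 1 and 0 < t_hyb < t_root")

    # Before t_hyb the lineages sit in different populations (A and B),
    # so they cannot coalesce.
    if rng.random() < gamma:
        # The B lineage follows the reticulation edge into A's ancestral
        # branch, where the two lineages may coalesce before t_root.
        waiting = rng.expovariate(1.0)
        if t_hyb + waiting < t_root:
            return t_hyb + waiting
        # No coalescence before the root: incomplete lineage sorting.
        # By memorylessness, a fresh Exp(1) wait starts at t_root below.
    # The lineages can now meet only in the root population, above t_root.
    return t_root + rng.expovariate(1.0)


def expected_coalescence_time(t_hyb: float, t_root: float, gamma: float) -> float:
    """Analytic mean for the toy model: with rate 1 in both A's branch and
    the root, E[T] = gamma * (t_hyb + 1) + (1 - gamma) * (t_root + 1)."""
    return gamma * (t_hyb + 1.0) + (1.0 - gamma) * (t_root + 1.0)


if __name__ == "__main__":
    rng = random.Random(42)
    t_hyb, t_root, gamma = 0.5, 2.0, 0.3
    n = 200_000
    draws = [sample_coalescence_time(t_hyb, t_root, gamma, rng) for _ in range(n)]
    simulated = sum(draws) / n
    # Analytic value here is 0.3 * 1.5 + 0.7 * 3.0 = 2.55.
    analytic = expected_coalescence_time(t_hyb, t_root, gamma)
    print(f"simulated mean {simulated:.3f}, analytic mean {analytic:.3f}")
```

Setting gamma to 0 or 1 collapses the toy network to a species tree, which is one simple way to see how ignoring a reticulation shifts the distribution of gene tree coalescence times.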
