Article

Consistency of SVDQuartets and Maximum Likelihood for Coalescent-Based Species Tree Estimation

Journal

SYSTEMATIC BIOLOGY
Volume 70, Issue 1, Pages 33-48

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/sysbio/syaa039

Keywords

Consistency; gene tree; maximum likelihood; multilocus data; phylogenetic inference; species tree; SVDQuartets

The study shows that, for four taxa, SVDQuartets is statistically consistent for both CIS data and multilocus data, and that ML is consistent for CIS data under the JC69 model. A proof of consistency for ML in the more general multilocus case remains difficult.
Numerous methods for inferring species-level phylogenies under the coalescent model have been proposed within the last 20 years, and debates continue about the relative strengths and weaknesses of these methods. One desirable property of a phylogenetic estimator is that of statistical consistency, which means intuitively that as more data are collected, the probability that the estimated tree has the same topology as the true tree goes to 1. To date, consistency results for species tree inference under the multispecies coalescent (MSC) have been derived only for summary statistics methods, such as ASTRAL and MP-EST. These methods have been found to be consistent given true gene trees but may be inconsistent when gene trees are estimated from data for loci of finite length. Here, we consider the question of statistical consistency for four taxa for SVDQuartets for general data types, as well as for the maximum likelihood (ML) method in the case in which the data are a collection of sites generated under the MSC model such that the sites are conditionally independent given the species tree (we call these data coalescent independent sites [CIS] data). We show that SVDQuartets is statistically consistent for all data types (i.e., for both CIS data and for multilocus data), and we derive its rate of convergence. We additionally show that ML is consistent for CIS data under the JC69 model and discuss why a proof for the more general multilocus case is difficult. Finally, we compare the performance of ML and SVDQuartets using simulation for both data types.
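
The intuitive definition of statistical consistency given in the abstract can be written compactly. The notation here is an assumption introduced only for illustration and is not taken from the paper: $\hat{T}_n$ stands for the tree topology estimated from $n$ units of data (sites or loci), and $T^{*}$ for the true species tree topology.

```latex
% Statistical consistency of a tree estimator (illustrative notation):
% the probability of recovering the true topology tends to 1 as data accumulate.
\[
  \lim_{n \to \infty} \Pr\!\left( \hat{T}_n = T^{*} \right) = 1
\]
```

Under this reading, the caveat about summary methods is that the guarantee may hold when $n$ counts true gene trees but can fail when each gene tree is itself estimated from a locus of finite length.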
