Article

Information geometry for phylogenetic trees

Journal

JOURNAL OF MATHEMATICAL BIOLOGY
Volume 82, Issue 3

Publisher

SPRINGER HEIDELBERG
DOI: 10.1007/s00285-021-01553-x

Keywords

Phylogenetic tree; Information geometry; Tree space

Funding

  1. DFG [GRK 2088]
  2. Niedersachsen Vorab of the Volkswagen Foundation


The paper introduces a new space of phylogenetic trees, called wald space, designed for statistical analysis of phylogenies with a geometry based on biologically principled assumptions. It investigates two related geometries on wald space: one given by the Fisher information metric of the character distributions induced by a two-state symmetric Markov substitution process, and one obtained from a related continuous-valued Gaussian process. Computational methods for computing geodesics in polynomial time are derived for both geometries, and numerical results show that the two geometries are very similar. Comparison with the BHV geometry shows that the proposed space is substantially different.
We propose a new space of phylogenetic trees which we call wald space. The motivation is to develop a space suitable for statistical analysis of phylogenies, but with a geometry based on more biologically principled assumptions than existing spaces: in wald space, trees are close if they induce similar distributions on genetic sequence data. As a point set, wald space contains the previously developed Billera-Holmes-Vogtmann (BHV) tree space; it also contains disconnected forests, like the edge-product (EP) space but without certain singularities of the EP space. We investigate two related geometries on wald space. The first is the geometry of the Fisher information metric of character distributions induced by the two-state symmetric Markov substitution process on each tree. Infinitesimally, the metric is proportional to the Kullback-Leibler divergence, or equivalently, as we show, to any f-divergence. The second geometry is obtained analogously but using a related continuous-valued Gaussian process on each tree, and it can be viewed as the trace metric of the affine-invariant metric for covariance matrices. We derive a gradient descent algorithm to project from the ambient space of covariance matrices to wald space. For both geometries we derive computational methods to compute geodesics in polynomial time and show numerically that the two information geometries (discrete and continuous) are very similar. In particular, geodesics are approximated extrinsically. Comparison with the BHV geometry shows that our canonical and biologically motivated space is substantially different.
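As a rough illustration of the ambient geometry mentioned in the abstract, the following sketch computes distances and geodesics under the standard affine-invariant metric on symmetric positive-definite (covariance) matrices. It is not the authors' algorithm: it does not restrict to wald space, does not perform the projection step, and does not reproduce the extrinsic geodesic approximation. The function names (`spd_power`, `affine_invariant_distance`, `affine_invariant_geodesic`) are hypothetical and chosen only for this example.

```python
import numpy as np


def spd_power(a, p):
    """Raise a symmetric positive-definite matrix to the real power p
    via its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * w**p) @ v.T


def _whitened(a, b):
    """Return A^{-1/2} B A^{-1/2}, symmetrized to remove round-off asymmetry."""
    a_inv_sqrt = spd_power(a, -0.5)
    m = a_inv_sqrt @ b @ a_inv_sqrt
    return (m + m.T) / 2.0


def affine_invariant_distance(a, b):
    """Affine-invariant distance: Frobenius norm of log(A^{-1/2} B A^{-1/2}),
    computed from the eigenvalues of the whitened matrix."""
    w = np.linalg.eigvalsh(_whitened(a, b))
    return float(np.sqrt(np.sum(np.log(w) ** 2)))


def affine_invariant_geodesic(a, b, t):
    """Point at parameter t in [0, 1] on the affine-invariant geodesic
    from A (t = 0) to B (t = 1): A^{1/2} (A^{-1/2} B A^{-1/2})^t A^{1/2}."""
    a_sqrt = spd_power(a, 0.5)
    return a_sqrt @ spd_power(_whitened(a, b), t) @ a_sqrt
```

For two covariance matrices `A` and `B`, `affine_invariant_geodesic(A, B, 0.5)` gives the geodesic midpoint in the ambient space of covariance matrices. In general an ambient geodesic between two matrices in wald space need not stay in wald space, which is presumably why the abstract describes a projection from the ambient space back to wald space.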
