Article

Fidelity of hyperbolic space for Bayesian phylogenetic inference

Journal

PLOS COMPUTATIONAL BIOLOGY
Volume 19, Issue 4, Pages -

Publisher

PUBLIC LIBRARY SCIENCE
DOI: 10.1371/journal.pcbi.1011084

Bayesian inference is widely used in phylogenetics to compute distributions of phylogenies. This paper explores the use of hyperbolic space as a low-dimensional representation for tree-like data. The authors embed genomic sequences in hyperbolic space and perform hyperbolic Markov chain Monte Carlo for Bayesian inference. They demonstrate the fidelity of this method on eight data sets and investigate the impact of embedding dimension and hyperbolic curvature on performance. The results indicate that hyperbolic space is suitable for phylogenetic inference.
Bayesian inference for phylogenetics is a gold standard for computing distributions of phylogenies. However, Bayesian phylogenetics faces the challenging computational problem of moving throughout the high-dimensional space of trees. Fortunately, hyperbolic space offers a low-dimensional representation of tree-like data. In this paper, we embed genomic sequences as points in hyperbolic space and perform hyperbolic Markov chain Monte Carlo for Bayesian inference in this space. The posterior probability of an embedding is computed by decoding a neighbour-joining tree from the embedding locations of the sequences. We empirically demonstrate the fidelity of this method on eight data sets. We systematically investigated the effect of embedding dimension and hyperbolic curvature on the Markov chain's performance on these data sets. The sampled posterior distribution recovers the splits and branch lengths to a high degree over a range of curvatures and dimensions. These results demonstrate the suitability of hyperbolic space for phylogenetic inference.

Author summary

Why was this study done? Tree structures are widely used in fields such as phylogenetics; however, modifying the layout and branch lengths of these structures simultaneously is a high-dimensional problem. Recent work in machine learning has demonstrated the usefulness of representing tree-like data as points in low-dimensional hyperbolic space. We aimed to explore new ways of representing phylogenetic trees so they can be modified in a continuous manner.

What did the researchers do and find? We represented trees by the locations of their embedded genomic sequences in hyperbolic space. We perturbed these continuous encoding locations and decoded an altered discrete tree structure. Using this technique, we performed Bayesian inference and computed the posterior distribution for eight standard data sets to demonstrate the feasibility of phylogenetic inference with this representation. We found that hyperbolic space is suitable for Bayesian phylogenetics and is most efficient across a broad range of hyperbolic curvatures with low dimensionality.

What do these findings mean? This method diversifies the way numerical methods can navigate the space of trees, both in phylogenetics and more broadly. With hyperbolic embeddings, scalable online inference is possible by quickly adding taxa to a tree or a distribution of trees. This method could open a wealth of powerful continuum-based methods to navigate the space of trees.
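To make the decoding step concrete, the following is a minimal sketch of how pairwise distances between embedded sequences could be computed before being passed to a distance-based tree builder such as neighbour-joining. It assumes the Poincaré ball model of hyperbolic space, chosen here only for illustration; the abstract does not state which model or parametrisation the authors use. The function names `poincare_distance` and `distance_matrix` and the example coordinates are hypothetical.

```python
import math


def poincare_distance(x, y, curvature=-1.0):
    """Geodesic distance between two points in a Poincare ball.

    Assumption: points are coordinate sequences lying strictly inside the
    ball of radius 1/sqrt(-curvature); curvature must be negative.
    """
    if curvature >= 0:
        raise ValueError("curvature must be negative")
    c = -curvature
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    fx = 1.0 - c * sum(a * a for a in x)
    fy = 1.0 - c * sum(b * b for b in y)
    if fx <= 0 or fy <= 0:
        raise ValueError("points must lie strictly inside the ball")
    # Standard Poincare-ball formula, rescaled for curvature -c.
    arg = 1.0 + 2.0 * c * sq_diff / (fx * fy)
    return math.acosh(arg) / math.sqrt(c)


def distance_matrix(points, curvature=-1.0):
    """Symmetric matrix of pairwise hyperbolic distances.

    A matrix like this is the usual input to a neighbour-joining routine,
    which is not implemented here.
    """
    n = len(points)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = poincare_distance(points[i], points[j], curvature)
    return d


# Hypothetical 2-D embedding of three taxa (illustrative values only).
embedding = [(0.0, 0.0), (0.5, 0.0), (0.0, -0.5)]
dist = distance_matrix(embedding, curvature=-1.0)
```

In a sampler of the kind described above, each proposal would perturb the coordinates in `embedding`, recompute the matrix, decode a tree from it, and evaluate that tree's posterior probability. Lowering the dimension shrinks the number of coordinates to perturb, while the curvature controls how quickly distances grow towards the boundary of the ball.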
