Article

Sampling bias and model choice in continuous phylogeography: Getting lost on a random walk

Journal

PLOS COMPUTATIONAL BIOLOGY
Volume 17, Issue 1

Publisher

PUBLIC LIBRARY OF SCIENCE
DOI: 10.1371/journal.pcbi.1008561

Funding

  1. Cambridge Mathematics Placements (CMP)
  2. Interne Fondsen KU Leuven/Internal Funds KU Leuven [C14/18/094]
  3. Research Foundation - Flanders (Fonds voor Wetenschappelijk Onderzoek - Vlaanderen)
  4. Agence Nationale pour la Recherche through the grant GENOSPACE
  5. European Molecular Biology Laboratory

The authors explore how different model assumptions affect phylogeographic inference and find that sample collection biases can strongly degrade the quality of reconstruction. They suggest several strategies to counter these effects, but note that these strategies carry additional computational burden. They also investigate how phylogeographic models differ and how suitable each is in different scenarios.
Author summary

Phylogeography studies past location and migration using information from the current geographic locations of genetic sequences. For example, phylogeography can be used to reconstruct the history of geographical spread of an outbreak using the genetic sequences of the pathogen collected at different times and locations. Here, we investigate the effects of different model assumptions on phylogeographic inference. In particular, we examine the effects of the strategy used to collect samples. We show that sample collection biases can have a strong impact on the quality of phylogeographic reconstruction: a geographically biased sampling scheme can be very detrimental for popular continuous phylogeography models. We consider different ways to counter these effects, from utilising alternative phylogeographic models to including partially informative samples (known cases without genetic sequences). While these strategies do alleviate the effects of sampling biases, they also lead to considerable additional computational burden. We also investigate the intrinsic differences between phylogeographic models, and their effects on reconstructed patterns in different scenarios.

Abstract

Phylogeographic inference allows reconstruction of the past geographical spread of pathogens or living organisms by integrating genetic and geographic data. A popular model in continuous phylogeography, where location data are provided as latitude and longitude coordinates, describes spread as a Brownian motion (Brownian Motion Phylogeography, BMP) in continuous space and time, akin to similar models of continuous trait evolution. Here, we show that reconstructions using this model can be strongly affected by sampling biases, such as the lack of sampling from certain areas. As an attempt to reduce the effects of sampling bias on BMP, we consider the addition of sequence-free samples from under-sampled areas. While this approach alleviates the effects of sampling bias, in most scenarios it will not be a viable option because it requires prior knowledge of an outbreak's spatial distribution. We therefore consider an alternative model, the spatial Λ-Fleming-Viot process (ΛFV), which has recently gained popularity in population genetics. Despite the ΛFV's robustness to sampling biases, we find that the different assumptions of the ΛFV and BMP models result in different applicabilities, with the ΛFV being more appropriate for scenarios of endemic spread, and BMP being more appropriate for recent outbreaks or colonizations.
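To make the sampling-bias effect described in the abstract concrete, the following is a minimal Python sketch and not the authors' method or code. It assumes a deliberately simplified setting: a star-shaped tree in which every lineage diffuses independently from a root at (0, 0) for the same time `t`, so that the mean tip location is the maximum-likelihood root estimate under Brownian motion. The function names, the parameter values, and the "sample only the eastern half" bias are all hypothetical choices made for illustration.

```python
import random


def simulate_tips(n, sigma=1.0, t=1.0, seed=0):
    """Tip locations of n lineages diffusing independently from a root at (0, 0).

    Assumption: a star tree with equal branch lengths t, not a real phylogeny.
    """
    rng = random.Random(seed)
    # Brownian displacement over time t has standard deviation sigma * sqrt(t) per axis.
    sd = sigma * t ** 0.5
    return [(rng.gauss(0.0, sd), rng.gauss(0.0, sd)) for _ in range(n)]


def estimate_root(tips):
    """Mean tip location: the ML root estimate under the star-tree assumption."""
    n = len(tips)
    return (sum(x for x, _ in tips) / n, sum(y for _, y in tips) / n)


tips = simulate_tips(5000)
unbiased_root = estimate_root(tips)

# Hypothetical sampling bias: only the eastern half (x > 0) is ever sampled.
biased_root = estimate_root([p for p in tips if p[0] > 0])

print("root estimate, all tips: ", unbiased_root)
print("root estimate, east only:", biased_root)
```

With all tips included, the estimate should sit close to the true root. With only eastern tips, the estimated x-coordinate is pulled east, towards the half-normal mean sigma * sqrt(2t/π), even though the diffusion process itself is unchanged. Real analyses involve a full phylogeny and joint inference of tree and locations, so this sketch shows only the direction of the effect, not its size in practice.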
