Article

Evaluation of methods for estimating coalescence times using ancestral recombination graphs

Journal

GENETICS
Volume 221, Issue 1

Publisher

Genetics Society of America
DOI: 10.1093/genetics/iyac044

Keywords

ancestral recombination graph; ARGweaver; Relate; tsinfer; tsdate; simulation; calibration

Funding

  1. National Science Foundation [2146752]
  2. National Institutes of Health [R01GM138634]
  3. Directorate for STEM Education [2146752] Funding Source: National Science Foundation
  4. Division of Graduate Education [2146752] Funding Source: National Science Foundation


The ancestral recombination graph is a structure that describes the joint genealogies of sampled DNA sequences along the genome. Recent computational methods have made impressive progress toward scalably estimating whole-genome genealogies. In addition to inferring the ancestral recombination graph, some of these methods can also provide ancestral recombination graphs sampled from a defined posterior distribution. Obtaining good samples of ancestral recombination graphs is crucial for quantifying statistical uncertainty and for estimating population genetic parameters such as effective population size, mutation rate, and allele age. Here, we use standard neutral coalescent simulations to benchmark the estimates of pairwise coalescence times from 3 popular ancestral recombination graph inference programs: ARGweaver, Relate, and tsinfer+tsdate. We compare (1) the true coalescence times to the inferred times at each locus; (2) the distribution of coalescence times across all loci to the expected exponential distribution; and (3) whether the sampled coalescence times have the properties expected of a valid posterior distribution. We find that inferred coalescence times at each locus are most accurate in ARGweaver, and often more accurate in Relate than in tsinfer+tsdate. However, all 3 methods tend to overestimate small coalescence times and underestimate large ones. Lastly, the posterior distribution of ARGweaver is closer to the expected posterior distribution than Relate's, but this higher accuracy comes at a substantial trade-off in scalability. The best choice of method will depend on the number and length of input sequences and on the goal of downstream analyses, and we provide guidelines on best practices.
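The three comparisons listed in the abstract can be illustrated with generic summary statistics. The following is a minimal Python sketch under stated assumptions, not the authors' evaluation code. It assumes times are measured in coalescent units (2N generations), so that under the standard neutral coalescent with constant population size the pairwise coalescence time is Exponential(1). The metrics are illustrative choices: log-scale RMSE for per-locus accuracy, a Kolmogorov-Smirnov distance to the Exponential(1) CDF for the genome-wide distribution, and probability-integral-transform (PIT) values plus credible-interval coverage for posterior calibration. The function `toy_experiment` and its shrinkage parameter are hypothetical and generate synthetic "posterior" samples, not output from ARGweaver, Relate, or tsinfer+tsdate.

```python
import math
import random
import statistics


def exp_cdf(t, rate=1.0):
    """CDF of Exponential(rate): the expected law of a pairwise coalescence
    time in coalescent units under the standard neutral coalescent."""
    return 1.0 - math.exp(-rate * t)


def ks_distance_exponential(times, rate=1.0):
    """Kolmogorov-Smirnov distance between the empirical CDF of pooled
    (e.g. genome-wide) coalescence times and the Exponential(rate) CDF."""
    xs = sorted(times)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = exp_cdf(x, rate)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def log_rmse(true_times, point_estimates):
    """Per-locus RMSE on the log scale, so relative errors at small and
    large times are weighted comparably."""
    sq = [(math.log(e) - math.log(t)) ** 2
          for t, e in zip(true_times, point_estimates)]
    return math.sqrt(sum(sq) / len(sq))


def pit_value(true_time, samples):
    """Fraction of posterior samples <= the true time. For a well-calibrated
    posterior these values are approximately Uniform(0, 1) across loci."""
    return sum(s <= true_time for s in samples) / len(samples)


def covered(true_time, samples, level=0.9):
    """True if the central `level` credible interval, taken from empirical
    percentiles of the samples, contains the true time."""
    pct = statistics.quantiles(samples, n=100, method="inclusive")
    k = round(100 * (1 - level) / 2)  # e.g. 5 for a 90% interval
    return pct[k - 1] <= true_time <= pct[100 - k - 1]


def toy_experiment(n_loci=2000, n_samples=200, shrink=0.5, seed=1):
    """HYPOTHETICAL toy data: true times ~ Exp(1); each locus gets log-normal
    'posterior' samples centred on log(t) shrunk toward E[log T] = -gamma.
    With shrink < 1 this mimics overestimating small times and
    underestimating large ones. Not the output of any real ARG method."""
    rng = random.Random(seed)
    euler_gamma = 0.5772156649015329
    truths, medians, pits, hits = [], [], [], 0
    for _ in range(n_loci):
        t = rng.expovariate(1.0)
        centre = shrink * math.log(t) + (1 - shrink) * (-euler_gamma)
        samples = [math.exp(rng.gauss(centre, 0.5)) for _ in range(n_samples)]
        truths.append(t)
        medians.append(statistics.median(samples))  # point estimate per locus
        pits.append(pit_value(t, samples))
        hits += covered(t, samples)
    return truths, medians, pits, hits / n_loci


if __name__ == "__main__":
    truths, medians, pits, coverage = toy_experiment()
    print("log-scale RMSE:", log_rmse(truths, medians))
    print("KS distance of medians vs Exp(1):", ks_distance_exponential(medians))
    print("90% interval coverage:", coverage)
```

In this toy setting, a calibrated posterior would give roughly 90% coverage and a flat histogram of PIT values; shrinking estimates toward the centre tends to push PIT values toward 0 and 1 and to narrow the spread of the point estimates relative to the Exponential(1) reference, which the KS distance picks up.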
