4.6 Article

An assembly-free method of phylogeny reconstruction using short-read sequences from pooled samples without barcodes

Journal

PLOS COMPUTATIONAL BIOLOGY
Volume 17, Issue 9, Pages -

Publisher

PUBLIC LIBRARY SCIENCE
DOI: 10.1371/journal.pcbi.1008949

Keywords

-

Funding

  1. Australian Research Council Discovery Project Grant [DP160103474]
  2. National Natural Science Foundation of China [31501879]

Ask authors/readers for more resources

AFPhyloMix is a novel method that reconstructs the phylogeny of homologous sequences from a mixed sample without the need for barcoding. By aligning short reads to a reference alignment and identifying variable sites, AFPhyloMix computes the phylogenetic tree and haplotype relative abundances using a Bayesian inference model. Results show that AFPhyloMix works well on both simulated and real data sets.
Author summary In evolutionary studies, it is customary to obtain homologous sequences from different individuals in a population or a species to construct a phylogeny. Frequently, sequences from different individuals will be identical; we refer to a set of identical sequences as a haplotype. If short-read sequencing technologies are used to obtain sequences from many individuals, the sequence from each individual is tagged with a unique barcode, and a mixed sample of tagged sequences is subsequently sequenced. The tagged sequences can be identified using the appropriate bioinformatics tools, for further downstream analyses. We have developed a novel method, AFPhyloMix, to reconstruct the phylogeny of a mixed sample of homologous sequences, and the relative abundance of different haplotypes, from different individuals without the need for barcoding. AFPhyloMix aligns the short reads obtained to a reference alignment, and identifies the variable sites along the alignment. On the basis of the patterns of nucleotide frequencies at these and neighbouring sites, AFPhyloMix uses a Bayesian inference model to compute the phylogenetic tree and the haplotype relative abundances. Our results show that AFPhyloMix works well on both the simulated data set and the real data set.

A current strategy for obtaining haplotype information from several individuals involves short-read sequencing of pooled amplicons, where fragments from each individual is identified by a unique DNA barcode. In this paper, we report a new method to recover the phylogeny of haplotypes from short-read sequences obtained using pooled amplicons from a mixture of individuals, without barcoding. The method, AFPhyloMix, accepts an alignment of the mixture of reads against a reference sequence, obtains the single-nucleotide-polymorphisms (SNP) patterns along the alignment, and constructs the phylogenetic tree according to the SNP patterns. AFPhyloMix adopts a Bayesian inference model to estimate the phylogeny of the haplotypes and their relative abundances, given that the number of haplotypes is known. In our simulations, AFPhyloMix achieved at least 80% accuracy at recovering the phylogenies and relative abundances of the constituent haplotypes, for mixtures with up to 15 haplotypes. AFPhyloMix also worked well on a real data set of kangaroo mitochondrial DNA sequences.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.6
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available