Article

Phylogenetic Analysis of SARS-CoV-2 Data Is Difficult

Journal

MOLECULAR BIOLOGY AND EVOLUTION
Volume 38, Issue 5, Pages 1777-1791

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/molbev/msaa314

Keywords

SARS-CoV-2; phylogenetic inference; phylogeny rooting; outgroups; strain classification

Funding

  1. Klaus Tschira Foundation
  2. EU IGNITE ITN project

Inferring reliable phylogenies from SARS-CoV-2 sequence data is difficult because the data combine a large number of sequences with a low number of mutations. Rooting the inferred phylogeny, whether via external outgroups or via novel computational methods applied to the ingroup phylogeny, does not appear to be credible. Automatic classification of the current sequences into subclasses using molecular species delimitation is also not possible, because the sequences are too closely related.
Numerous studies covering some aspects of SARS-CoV-2 data analyses are being published on a daily basis, including a regularly updated phylogeny on nextstrain.org. Here, we review the difficulties of inferring reliable phylogenies by example of a data snapshot comprising a quality-filtered subset of 8,736 out of all 16,453 virus sequences available on May 5, 2020 from gisaid.org. We find that it is difficult to infer a reliable phylogeny on these data due to the large number of sequences in conjunction with the low number of mutations. We further find that rooting the inferred phylogeny with some degree of confidence either via the bat and pangolin outgroups or by applying novel computational methods on the ingroup phylogeny does not appear to be credible. Finally, an automatic classification of the current sequences into subclasses using the mPTP tool for molecular species delimitation is also, as might be expected, not possible, as the sequences are too closely related. We conclude that, although the application of phylogenetic methods to disentangle the evolution and spread of COVID-19 provides some insight, results of phylogenetic analyses, in particular those conducted under the default settings of current phylogenetic inference tools, as well as downstream analyses on the inferred phylogenies, should be considered and interpreted with extreme caution.
