Article

Inadvertent Paralog Inclusion Drives Artifactual Topologies and Timetree Estimates in Phylogenomics

Journal

MOLECULAR BIOLOGY AND EVOLUTION
Volume 36, Issue 6, Pages 1344-1356

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/molbev/msz067

Keywords

phylogenomics; orthology; paralogy; Lissamphibia; timetree

Funding

  1. Bioinformatics Core at IBERS (AU)
  2. Irish Research Council-Marie Sklodowska-Curie cofund program [ELEVATEPD/2014/69]
  3. Biotechnology and Biological Sciences Research Council [BB/E/W/10964A01, BBS/OS/GC/000011B]
  4. Ministry of Economy and Competitiveness of Spain [RYC-2011-09321, CGL2012-40082, BES-2013-062723, EEBB-I-15-09665]
  5. NCBI [PRJNA387587, PRJNA430346]
  6. Irish Research Council (IRC) [ELEVATEPD/2014/69] Funding Source: Irish Research Council (IRC)

Increasingly, large phylogenomic data sets include transcriptomic data from nonmodel organisms. This has not only allowed controversial and unexplored evolutionary relationships in the tree of life to be addressed but has also increased the risk of inadvertently including paralogs in the analysis. Although paralog inclusion might be expected to decrease phylogenetic support, it is not clear whether it could also drive highly supported artifactual relationships. Many groups, including the hyperdiverse Lissamphibia, are especially susceptible to these issues because of ancient gene duplication events, the small number of sequenced genomes, and the increasing use of transcriptomes to resolve historically conflicting taxonomic hypotheses. We tested the potential impact of paralog inclusion on the topologies and timetree estimates of the Lissamphibia using published and de novo sequencing data from 18 amphibian species, from which 2,656 single-copy gene families were identified. A novel paralog filtering approach produced four differently curated data sets, which were used for phylogenetic reconstruction with Bayesian inference, maximum likelihood, and quartet-based supertrees. We found that paralogs drive strongly supported conflicting hypotheses within the Lissamphibia (Batrachia and Procera) and older divergence time estimates, even within groups where no variation in topology was observed. All investigated methods except Bayesian inference with the CAT-GTR model were sensitive to paralogs, but with filtering, convergence to the same answer (Batrachia) was observed. This is the first large-scale study to address the impact of orthology selection using transcriptomic data, and it emphasizes the importance of quality over quantity, particularly for understanding the relationships of poorly sampled taxa.
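The abstract mentions identifying single-copy gene families before a novel paralog filter is applied, but the filter itself is not described here. The sketch below illustrates only the generic first step. It is not the authors' pipeline or their filter. It keeps gene families in which no species contributes more than one sequence. The input format (a dict mapping family ID to (species, sequence ID) pairs), the function name `single_copy_families`, the `min_species` threshold, and the example taxa are all hypothetical. A copy-number check like this cannot detect hidden paralogy, where a species retains only a paralogous copy after differential gene loss, so copy-number filtering alone may still leave paralogs in a data set.

```python
from collections import Counter


def single_copy_families(families, species, min_species=None):
    """Return IDs of families in which every present species has exactly one sequence.

    Hypothetical input format (an assumption, not the paper's):
      families: dict mapping family ID -> list of (species, sequence_id) pairs
      species: set of species included in the analysis
      min_species: minimum number of species that must be present (default: all)
    """
    if min_species is None:
        min_species = len(species)
    kept = []
    for fam_id, members in families.items():
        # Count sequences per species, ignoring taxa outside the analysis set.
        counts = Counter(sp for sp, _ in members if sp in species)
        # More than one copy in any species flags a potential paralog; drop the family.
        if any(n > 1 for n in counts.values()):
            continue
        # Require enough taxa to be represented for the family to be informative.
        if len(counts) >= min_species:
            kept.append(fam_id)
    return kept


# Hypothetical toy data: fam2 has two Xenopus copies, fam3 lacks Ichthyophis.
families = {
    "fam1": [("Xenopus", "x1"), ("Rana", "r1"), ("Ichthyophis", "i1")],
    "fam2": [("Xenopus", "x2"), ("Xenopus", "x3"), ("Rana", "r2"), ("Ichthyophis", "i2")],
    "fam3": [("Xenopus", "x4"), ("Rana", "r3")],
}
species = {"Xenopus", "Rana", "Ichthyophis"}

print(single_copy_families(families, species))                 # ['fam1']
print(single_copy_families(families, species, min_species=2))  # ['fam1', 'fam3']
```

Relaxing `min_species` trades taxon completeness for more loci. The duplicated family is excluded either way.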