Article

OMAmer: tree-driven and alignment-free protein assignment to subfamilies outperforms closest sequence approaches

Journal

BIOINFORMATICS
Volume 37, Issue 18, Pages 2866-2873

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btab219

Funding

  1. Swiss National Foundation [167276, 183723]

Assigning new sequences to known protein families and subfamilies is crucial for many functional, comparative and evolutionary genomics analyses. However, relying solely on the closest sequence in a reference database for assignment can lead to misassignments, as a query sequence may not necessarily belong to the same subfamily as its closest sequence. To overcome this issue, a novel alignment-free protein subfamily assignment method called OMAmer has been introduced, which provides better and quicker subfamily-level assignments compared to methods relying on the closest sequence.
Motivation: Assigning new sequences to known protein families and subfamilies is a prerequisite for many functional, comparative and evolutionary genomics analyses. Such assignment is commonly achieved by looking for the closest sequence in a reference database, using a method such as BLAST. However, ignoring the gene phylogeny can be misleading because a query sequence does not necessarily belong to the same subfamily as its closest sequence. For example, a hemoglobin which branched out prior to the hemoglobin alpha/beta duplication could be closest to a hemoglobin alpha or beta sequence, whereas it is neither. To overcome this problem, phylogeny-driven tools have emerged but rely on gene trees, whose inference is computationally expensive.

Results: Here, we first show that in multiple animal and plant datasets, 18-62% of assignments by closest sequence are misassigned, typically to an over-specific subfamily. Then, we introduce OMAmer, a novel alignment-free protein subfamily assignment method, which limits over-specific subfamily assignments and is suited to phylogenomic databases with thousands of genomes. OMAmer is based on an innovative method using evolutionarily informed k-mers for alignment-free mapping to ancestral protein subfamilies. Whilst able to reject non-homologous family-level assignments, we show that OMAmer provides better and quicker subfamily-level assignments than approaches relying on the closest sequence, whether inferred exactly by Smith-Waterman or by the fast heuristic DIAMOND.
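To make the contrast in the abstract concrete, the toy Python sketch below compares a closest-sequence baseline with placement on a subfamily tree using k-mers labeled by ancestral subfamily. It is only loosely inspired by the abstract's description and is not OMAmer's algorithm, index or scoring: the two-level tree (Hb, Hb_alpha, Hb_beta), the toy sequences, the function names and the strict majority threshold (`min_fraction`) are all hypothetical choices made for illustration.

```python
from collections import defaultdict

# Hypothetical two-level subfamily tree (child -> parent; the root has parent None).
# "Hb" stands for the ancestral family before the alpha/beta duplication.
PARENT = {"Hb": None, "Hb_alpha": "Hb", "Hb_beta": "Hb"}


def ancestors(node):
    """Return the path from node up to the root, node first."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def lca(a, b):
    """Lowest common ancestor of two nodes in the subfamily tree."""
    anc_a = ancestors(a)
    for node in ancestors(b):
        if node in anc_a:
            return node
    raise ValueError("nodes are not in the same tree")


def kmers(seq, k):
    """Set of all overlapping k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references, k):
    """Label each k-mer with the LCA of all subfamilies whose references contain it."""
    index = {}
    for seq, subfamily in references:
        for km in kmers(seq, k):
            index[km] = lca(index[km], subfamily) if km in index else subfamily
    return index


def closest_reference(query, references, k):
    """Baseline: subfamily of the reference sharing the most k-mers with the query."""
    q = kmers(query, k)
    _, best_label = max(references, key=lambda r: len(q & kmers(r[0], k)))
    return best_label


def place(query, index, k, min_fraction=0.5):
    """Deepest node supported by more than min_fraction of the query's indexed k-mers."""
    hits = defaultdict(int)
    matched = 0
    for km in kmers(query, k):
        node = index.get(km)
        if node is None:
            continue
        matched += 1
        # A k-mer specific to a node also supports every ancestor of that node.
        for anc in ancestors(node):
            hits[anc] += 1
    if matched == 0:
        return None  # no shared k-mers: treat as non-homologous
    # With a strict majority, supported nodes lie on a single root-to-leaf path.
    supported = [n for n, c in hits.items() if c / matched > min_fraction]
    return max(supported, key=lambda n: len(ancestors(n)))


refs = [("MVLSAAWGKV", "Hb_alpha"), ("MVHLTAWGKV", "Hb_beta")]
index = build_index(refs, k=3)
query = "MVLQQAWGKV"  # conserved core plus a single alpha-specific k-mer
print(closest_reference(query, refs, k=3))  # Hb_alpha
print(place(query, index, k=3))             # Hb
```

With these toy references, the query is assigned to Hb_alpha by the closest-sequence baseline but only to the ancestral node Hb by the tree-aware placement, mirroring the over-specific assignments described in the abstract. A query with no indexed k-mers returns None, a crude stand-in for rejecting non-homologous family-level assignments.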
