4.7 Article

pLM-BLAST: distant homology detection based on direct comparison of sequence representations from protein language models

This study explores whether protein language models can be used to detect protein homology through sequence comparison, and introduces the pLM-BLAST tool for this purpose. Benchmarks show that pLM-BLAST matches the accuracy of HHsearch while running faster, and that it can compute local alignments, which points to its potential for uncovering previously unknown homologous relationships and improving protein annotation.
Motivation: The detection of homology through sequence comparison is a typical first step in the study of protein function and evolution. In this work, we explore the applicability of protein language models to this task.

Results: We introduce pLM-BLAST, a tool inspired by BLAST, that detects distant homology by comparing single-sequence representations (embeddings) derived from a protein language model, ProtT5. Our benchmarks reveal that pLM-BLAST maintains a level of accuracy on par with HHsearch for both highly similar sequences (with >50% identity) and markedly divergent sequences (with <30% identity), while being significantly faster. Additionally, pLM-BLAST stands out among other embedding-based tools due to its ability to compute local alignments. We show that these local alignments, produced by pLM-BLAST, often connect highly divergent proteins, thereby highlighting its potential to uncover previously undiscovered homologous relationships and improve protein annotation.

Availability and implementation: pLM-BLAST is accessible via the MPI Bioinformatics Toolkit as a web server for searching precomputed databases (https://toolkit.tuebingen.mpg.de/tools/plmblast). It is also available as a standalone tool for building custom databases and performing batch searches (https://github.com/labstructbioinf/pLM-BLAST).
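The idea of computing a local alignment directly from per-residue embeddings can be illustrated with a minimal sketch. This is not the pLM-BLAST implementation: the function names `cosine_matrix` and `local_align`, the `offset` and `gap` values, and the one-hot toy vectors are all assumptions made for illustration. In practice the per-residue vectors would come from a protein language model such as ProtT5, and the scoring and traceback used by the tool may differ.

```python
import math


def cosine_matrix(emb_a, emb_b):
    """Pairwise cosine similarity between two lists of per-residue vectors."""
    def unit(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    ua = [unit(v) for v in emb_a]
    ub = [unit(v) for v in emb_b]
    return [[sum(x * y for x, y in zip(u, w)) for w in ub] for u in ua]


def local_align(sim, offset=0.5, gap=0.5):
    """Smith-Waterman-style local alignment over a similarity matrix.

    `offset` (hypothetical value) is subtracted from every similarity so that
    unrelated residue pairs score negative; `gap` is a linear gap penalty.
    Returns the best score and the aligned (index_a, index_b) pairs.
    """
    n, m = len(sim), len(sim[0])
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = H[i - 1][j - 1] + sim[i - 1][j - 1] - offset
            H[i][j] = max(0.0, diag, H[i - 1][j] - gap, H[i][j - 1] - gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    pairs = []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + sim[i - 1][j - 1] - offset
        if H[i][j] == diag:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best, pairs


# Toy stand-ins for per-residue embeddings (one-hot vectors, not real ProtT5 output).
e1, e2, e3, e4 = [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]
seq_a = [e4, e1, e2, e3]
seq_b = [e1, e2, e3, e4, e4]

score, pairs = local_align(cosine_matrix(seq_a, seq_b))
print(score, pairs)  # 1.5 [(1, 0), (2, 1), (3, 2)]
```

In the toy data, residues 1 to 3 of the first sequence share their vectors with residues 0 to 2 of the second, so the local alignment recovers exactly that segment and ignores the unrelated flanking positions.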
