Article

Combining evolutionary information extracted from frequency profiles with sequence-based kernels for protein remote homology detection

Journal

BIOINFORMATICS
Volume 30, Issue 4, Pages 472-479

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btt709

Funding

  1. National Natural Science Foundation of China [61300112, 61370165, 61173075, 61272383, 61370010]
  2. Natural Science Foundation of Guangdong Province [S2012040007390, S2013010014475]
  3. Scientific Research Innovation Foundation in Harbin Institute of Technology [HIT.NSRIF.2013103]
  4. Shanghai Key Laboratory of Intelligent Information Processing, China [IIPL-2012-002]

Motivation: Protein remote homology detection has attracted a great deal of interest because of its importance in both basic research (such as molecular evolution and protein attribute prediction) and practical applications (such as the timely modeling of 3D structures of proteins targeted for drug development). Profile-based approaches are promising in this regard. A key step toward further improving protein remote homology detection is finding an optimal way to extract evolutionary information into the profiles.

Results: Here, we propose a novel approach, the so-called profile-based protein representation, which extracts evolutionary information via frequency profiles. These profiles can be calculated from the multiple sequence alignments generated by PSI-BLAST. Three top-performing sequence-based kernels (SVM-Ngram, SVM-pairwise and SVM-LA) were combined with the profile-based protein representation. Various tests were conducted on a SCOP benchmark dataset that contains 54 families and 23 superfamilies. The results showed that the new approach is promising and clearly improves the performance of the three kernels. Furthermore, the approach can provide useful insights for studying the features of proteins in various families. The approach can also be easily combined with existing sequence-based methods to improve their performance.
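
To make the idea of a frequency profile concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the abstract does not specify how the profile is turned into a protein representation, so the conversion step here (replacing each residue with the most frequent amino acid in its alignment column) is an assumption made for illustration. The function names `frequency_profile` and `profile_based_sequence` and the toy alignment are hypothetical; in practice the alignment would come from PSI-BLAST output rather than a hard-coded list.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def frequency_profile(alignment):
    """Return one dict per alignment column mapping amino acid -> frequency.

    `alignment` is a list of equal-length aligned sequences; gap characters
    and other non-standard symbols are ignored when counting.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values())
        profile.append(
            {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}
        )
    return profile


def profile_based_sequence(profile, query):
    """Rewrite `query` using the most frequent residue in each column.

    ASSUMPTION: this max-frequency rule is an illustrative choice, not a
    detail stated in the abstract. Columns with no counted residues keep
    the original query residue.
    """
    converted = []
    for column, original in zip(profile, query):
        best = max(AMINO_ACIDS, key=lambda aa: column[aa])
        converted.append(best if column[best] > 0 else original)
    return "".join(converted)


# Hypothetical toy alignment: the query followed by three aligned homologs.
toy_alignment = ["MRILG", "MKVLA", "MKVLA", "MKVLA"]
profile = frequency_profile(toy_alignment)
converted = profile_based_sequence(profile, toy_alignment[0])
print(converted)
```

A sequence rewritten this way is still an ordinary amino-acid string, so it could in principle be passed unchanged to a sequence-based kernel, which is consistent with the abstract's remark that the representation combines easily with existing sequence-based methods.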
