Article; Proceedings Paper

DeepFam: deep learning based alignment-free method for protein family modeling and prediction

Journal

BIOINFORMATICS
Volume 34, Issue 13, Pages 254-262

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/bty275

Keywords

-

Funding

  1. Collaborative Genome Program for Fostering New Post-Genome industry through the National Research Foundation of Korea (NRF) - Ministry of Science ICT and Future Planning [NRF-2014M3C9A3063541]
  2. Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF) - Ministry of Science, ICT [NRF-2017M3C4A7065887]
  3. Institute for Information & communications Technology Promotion (IITP) - the Korea government (MSIP) [B0717-16-0098]
  4. Institute for Information & Communication Technology Planning & Evaluation (IITP), Republic of Korea [2016-0-00132-003] Funding Source: Korea Institute of Science & Technology Information (KISTI), National Science & Technology Information Service (NTIS)
  5. National Research Foundation of Korea [2017M3C4A7065887, 21A20151113068] Funding Source: Korea Institute of Science & Technology Information (KISTI), National Science & Technology Information Service (NTIS)


Motivation: Next-generation sequencing technologies generate a large number of newly sequenced proteins, and assigning biochemical functions to these proteins is an important task. However, biological experiments are too expensive to characterize so many protein sequences, so protein function prediction is done primarily by computational modeling methods, such as profile Hidden Markov Models (pHMMs) and k-mer based methods. Nevertheless, existing methods have limitations: k-mer based methods are not accurate enough to assign protein functions, and pHMMs are not fast enough to handle the large number of protein sequences from numerous genome projects. Therefore, a more accurate and faster protein function prediction method is needed.

Results: In this paper, we introduce DeepFam, an alignment-free method that can extract functional information directly from sequences without the need for multiple sequence alignments. In extensive experiments using the Clusters of Orthologous Groups (COGs) and G protein-coupled receptor (GPCR) datasets, DeepFam achieved better accuracy and runtime for predicting protein functions than state-of-the-art methods, both alignment-free and alignment-based. Additionally, we showed that DeepFam is capable of capturing conserved regions to model protein families. In fact, DeepFam was able to detect conserved regions documented in the Prosite database while predicting protein functions. Our deep learning method will be useful in characterizing the functions of the ever-increasing number of protein sequences.
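To make the term "alignment-free" concrete, the sketch below shows the kind of k-mer featurization that the abstract names as a baseline. It is not DeepFam and not code from the paper: the function names `kmer_counts` and `kmer_vector`, the choice of k, and the handling of non-standard residues are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kmer_counts(sequence, k=3):
    """Count overlapping k-mers, skipping any k-mer with a non-standard residue.

    Hypothetical helper for illustration; not part of DeepFam.
    """
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if all(res in AMINO_ACIDS for res in kmer):
            counts[kmer] += 1
    return counts


def kmer_vector(sequence, k=2):
    """Return a fixed-length vector of k-mer frequencies over all 20**k k-mers.

    The vector has the same length for every sequence, so sequences of
    different lengths can be compared without aligning them.
    """
    counts = kmer_counts(sequence, k)
    total = sum(counts.values())
    vocab = ("".join(p) for p in product(AMINO_ACIDS, repeat=k))
    return [counts[kmer] / total if total else 0.0 for kmer in vocab]


if __name__ == "__main__":
    vec = kmer_vector("MKVLAAGMKV", k=2)
    print(len(vec))  # one entry per possible 2-mer
```

Such a vector could be passed to any standard classifier. A representation like this discards the position of each k-mer within the sequence, which is one plausible reason for the limited accuracy the abstract attributes to k-mer based methods.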
