Article

Learning generative models for protein fold families

Journal

Proteins: Structure, Function, and Bioinformatics
Volume 79, Issue 4, Pages 1061-1078

Publisher

WILEY
DOI: 10.1002/prot.22934

Keywords

protein sequence; probabilistic graphical models; Markov random fields; regularization; generative model

Funding

  1. NSF [IIS-0905193, CCF-1019104]
  2. Gordon and Betty Moore Foundation
  3. Division of Information & Intelligent Systems
  4. Directorate for Computer & Information Science & Engineering [0905313, 0905193]; Funding Source: National Science Foundation


We introduce a new approach to learning statistical models from multiple sequence alignments (MSAs) of proteins. Our method, called GREMLIN (Generative REgularized ModeLs of proteINs), learns an undirected probabilistic graphical model of the amino acid composition within the MSA. The resulting model encodes both the position-specific conservation statistics and the correlated mutation statistics between sequential and long-range pairs of residues. Existing techniques for learning graphical models from MSAs either make strong and often inappropriate assumptions about the conditional independencies within the MSA (e.g., Hidden Markov Models) or use suboptimal algorithms to learn the parameters of the model. In contrast, GREMLIN makes no a priori assumptions about the conditional independencies within the MSA. We formulate and solve a convex optimization problem, which guarantees that we find a globally optimal model at convergence. The resulting model is also generative, allowing for the design of new protein sequences that have the same statistical properties as those in the MSA. We perform a detailed analysis of covariation statistics on the extensively studied WW and PDZ domains and show that our method outperforms an existing algorithm for learning undirected probabilistic graphical models from MSAs. We then apply our approach to 71 additional families from the PFAM database and demonstrate that the resulting models significantly outperform Hidden Markov Models in terms of predictive accuracy.
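The abstract says the model captures position-specific conservation and pairwise correlated mutations in an undirected graphical model fitted by convex optimization, but it does not spell out the objective. As an illustration only, the sketch below assumes a pairwise Markov random field over alignment columns (single-site potentials h, pairwise couplings J) scored with an L1-regularized negative pseudo-log-likelihood, a standard convex objective for this model class. The function names, the parameter layout, and the choice of penalty are hypothetical and are not taken from the paper's implementation.

```python
import math


def zero_params(L, q):
    """All-zero parameters for L alignment columns and an alphabet of size q.

    Hypothetical layout (an assumption, not the paper's):
      h[i][a]         single-site potential for residue a at column i
      J[(i, j)][a][b] coupling between residue a at column i and b at column j, i < j
    """
    h = [[0.0] * q for _ in range(L)]
    J = {(i, j): [[0.0] * q for _ in range(q)]
         for i in range(L) for j in range(i + 1, L)}
    return h, J


def pair(J, i, j, a, b):
    """Coupling for residue a at column i and residue b at column j (stored only for i < j)."""
    if i < j:
        return J[(i, j)][a][b]
    return J[(j, i)][b][a]


def conditional_log_prob(seq, i, h, J, q):
    """log P(x_i = seq[i] | all other columns of seq) under the pairwise MRF."""
    scores = []
    for a in range(q):
        s = h[i][a]
        for j, b in enumerate(seq):
            if j != i:
                s += pair(J, i, j, a, b)
        scores.append(s)
    m = max(scores)  # subtract the max before exponentiating for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return scores[seq[i]] - log_z


def objective(msa, h, J, q, lam):
    """L1-regularized negative pseudo-log-likelihood; convex in (h, J)."""
    nll = -sum(conditional_log_prob(seq, i, h, J, q)
               for seq in msa for i in range(len(seq)))
    penalty = lam * sum(abs(v) for block in J.values() for row in block for v in row)
    return nll + penalty


if __name__ == "__main__":
    msa = [[0, 1, 1], [0, 0, 1]]  # toy alignment: residues encoded as integers 0..q-1
    h, J = zero_params(L=3, q=2)
    print(objective(msa, h, J, q=2, lam=0.01))
```

Each conditional term is a linear function of the parameters minus a log-sum-exp of linear functions, so its negative is convex, and adding the L1 penalty keeps the whole objective convex; any standard convex solver can then reach a global optimum. With all-zero parameters every conditional is uniform, so the toy example above evaluates to 6·ln 2.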
