Article

Learning protein fitness models from evolutionary and assay-labeled data

Journal

NATURE BIOTECHNOLOGY
Volume 40, Issue 7, Pages 1114-+

Publisher

NATURE PORTFOLIO
DOI: 10.1038/s41587-021-01146-5

Keywords

-

Funding

  1. US Department of Energy, Office of Biological and Environmental Research, Genomic Science Program Lawrence Livermore National Laboratory's Secure Biosystems Design Scientific Focus Area [SCW1710]
  2. Chan Zuckerberg Investigator program
  3. C3.ai
  4. National Library of Medicine of the National Institutes of Health [T32LM012417]
  5. National Science Foundation [DGE 2146752]

This study proposes a simple machine learning algorithm that combines evolutionary and experimental data for improved protein fitness prediction. The authors find that ridge regression on site-specific amino acid features, combined with a probability density feature from modeling the evolutionary data, performs well on this task.
Machine learning-based models of protein fitness typically learn from either unlabeled, evolutionarily related sequences or variant sequences with experimentally measured labels. For regimes where only limited experimental data are available, recent work has suggested methods for combining both sources of information. Toward that goal, we propose a simple combination approach that is competitive with, and on average outperforms, more sophisticated methods. Our approach uses ridge regression on site-specific amino acid features combined with one probability density feature from modeling the evolutionary data. Within this approach, we find that a variational autoencoder-based probability density model shows the best overall performance, although any evolutionary density model can be used. Moreover, our analysis highlights the importance of systematic evaluations and sufficient baselines.
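
The combination described in the abstract can be illustrated with a minimal sketch: one-hot (site-specific) amino acid features are concatenated with a single evolutionary density score per sequence, and a ridge regression is fit on the result. This is not the authors' implementation. The function names, the toy sequences, the placeholder density scores, the fitness labels, and the single shared penalty `alpha` are all assumptions made for illustration; in practice the density score would come from a model trained on evolutionarily related sequences, such as a variational autoencoder.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq):
    """Site-specific features: one indicator per (position, residue) pair."""
    x = np.zeros(len(seq) * len(AMINO_ACIDS))
    for pos, aa in enumerate(seq):
        x[pos * len(AMINO_ACIDS) + AA_INDEX[aa]] = 1.0
    return x


def build_features(seqs, density_scores):
    """Append one evolutionary density feature to the one-hot features."""
    onehot = np.stack([one_hot(s) for s in seqs])
    density = np.asarray(density_scores, dtype=float).reshape(-1, 1)
    return np.hstack([onehot, density])


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept.

    Assumption: one penalty `alpha` is shared by all features.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    w = np.linalg.solve(gram, Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def predict(X, w, b):
    return X @ w + b


# Hypothetical toy data: equal-length variants, made-up density scores
# (standing in for log-likelihoods from an evolutionary model), and
# made-up assay labels.
seqs = ["ACDA", "ACDC", "AADA", "CCDA", "ACAA", "ADDA"]
density = [-1.0, -2.5, -1.8, -3.0, -2.2, -2.0]
fitness = np.array([1.0, 0.4, 0.7, 0.2, 0.5, 0.6])

X = build_features(seqs, density)
w, b = fit_ridge(X, fitness, alpha=1.0)
preds = predict(X, w, b)
```

Because the density score enters as one extra column, any evolutionary density model can be swapped in without changing the regression step, which matches the abstract's remark that the approach is not tied to a particular density model.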
