4.7 Article

Improving Enzyme Optimum Temperature Prediction with Resampling Strategies and Ensemble Learning

Journal

JOURNAL OF CHEMICAL INFORMATION AND MODELING
Volume 60, Issue 8, Pages 4098-4107

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/acs.jcim.0c00489

Funding

  1. National Science Foundation [CBET-1552355]
  2. U.S. Department of Energy, Office of Energy Efficiency and Renewable Energy
  3. U.S. Department of Energy, Advanced Manufacturing Office (AMO)
  4. AMO [DE-AC36-08GO28308]
  5. BETO [DE-AC36-08GO28308]
  6. National Renewable Energy Laboratory
  7. NSF
  8. U.S. Department of Energy, Bioenergy Technologies Office (BETO)

Accurate prediction of the optimal catalytic temperature (Topt) of enzymes is vital in biotechnology, as enzymes with high Topt values are desired for enhanced reaction rates. Recently, a machine learning method (temperature optima for microorganisms and enzymes, TOME) for predicting Topt was developed. TOME was trained on a normally distributed data set with a median Topt of 37 °C and less than 5% of Topt values above 85 °C, limiting the method's predictive capabilities for thermostable enzymes. Due to the distribution of the training data, the mean squared error on Topt values greater than 85 °C is nearly an order of magnitude higher than the error on values between 30 and 50 °C. In this study, we apply ensemble learning and resampling strategies that tackle the data imbalance to significantly decrease the error on high Topt values (>85 °C) by 60% and increase the overall R² value from 0.527 to 0.632. The revised method, temperature optima for enzymes with resampling (TOMER), and the resampling strategies applied in this work are freely available to other researchers as Python packages on GitHub.
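
To make the idea of resampling an imbalanced regression data set concrete, the following is a minimal sketch of plain random oversampling of the rare high-Topt region. It is an illustration only and is not taken from the TOMER package: the function name `oversample_rare`, its parameters, and the toy data are all hypothetical, and the paper's own resampling strategies may differ.

```python
import random


def oversample_rare(features, targets, threshold=85.0, ratio=1.0, seed=0):
    """Randomly duplicate samples whose target exceeds `threshold`.

    Hypothetical helper (not the TOMER API). After resampling, the rare
    group holds at least `ratio` times as many samples as the common group.
    """
    rng = random.Random(seed)
    rare = [i for i, t in enumerate(targets) if t > threshold]
    common = [i for i, t in enumerate(targets) if t <= threshold]
    if not rare:
        # Nothing to oversample; return copies of the inputs unchanged.
        return list(features), list(targets)
    wanted = int(ratio * len(common))
    extra = max(0, wanted - len(rare))
    # Keep every original sample, then add random duplicates of rare ones.
    picked = common + rare + [rng.choice(rare) for _ in range(extra)]
    rng.shuffle(picked)
    return [features[i] for i in picked], [targets[i] for i in picked]


# Toy data (made up): one feature per enzyme, Topt in degrees Celsius.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.9]]
y = [30.0, 37.0, 42.0, 55.0, 37.0, 90.0]

X_res, y_res = oversample_rare(X, y, threshold=85.0, ratio=1.0)
n_rare = sum(1 for t in y_res if t > 85.0)
print(len(y_res), n_rare)
```

A regressor trained on `X_res, y_res` sees the high-Topt sample as often as all lower-Topt samples combined, which is the general motivation for resampling here. Exact duplication can encourage overfitting to the few rare samples, so this sketch should be read as the simplest possible baseline rather than a recommended strategy.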
