Article, Data Paper

SoluProtMut(DB): A manually curated database of protein solubility changes upon mutations

Publisher

ELSEVIER
DOI: 10.1016/j.csbj.2022.11.009

Keywords

Mutational database; Protein engineering; Soluble expression; Protein yield; Machine learning; Protein aggregation

Funding

  1. Czech Ministry of Education, Youth and Sports [INBIO CZ.02.1.01/0.0/0.0/16_026/0008451, CZ.02.1.01/0.0/0.0/17_043/0009632, 2.1.01/0.0/0.0/15_003/0000469, ESFRI RECETOX RI LM2018121, ESFRI ELIXIR LM2018131]
  2. Technology Agency of the Czech Republic [FW03010208]
  3. Brno University of Technology [FIT-S-20-6293]
  4. CETOCOEN EXCELLENCE Teaming 2 project - Horizon2020 of the European Union [857560]
  5. National Institute for Cancer Research - EU - Next Generation EU [LX22NPO5102]
  6. Czech Science Foundation [20-15915Y]

Protein solubility is crucial for protein production and manufacturing yields. Understanding the structural determinants of solubility and the effects of mutations can help connect human diseases with protein aggregation. The SoluProtMut(DB) database contains extensive data on protein solubility changes upon mutations, serving as a valuable resource for researchers designing improved protein variants and developing machine learning tools. The database includes previously published datasets and additional data from recent studies, and it has been curated for machine learning applications.
Protein solubility is an attractive engineering target, primarily because of its relation to yields in protein production and manufacturing. Moreover, better knowledge of how mutations affect protein solubility could help link several serious human diseases to protein aggregation. However, the structural determinants of protein solubility remain poorly understood, and the available data have mostly been scattered across the literature. Here, we present SoluProtMut(DB), the first database containing data on protein solubility changes upon mutations. The database holds 33,000 measurements of 17,000 protein variants in 103 different proteins. It can serve as an essential source of information for researchers designing improved protein variants or developing machine learning tools to predict the effects of mutations on solubility. The database comprises all previously published solubility datasets and thousands of new data points from recent publications, including deep mutational scanning experiments. Moreover, it includes many of the available experimental conditions known to affect protein solubility. The datasets have been manually curated with substantial corrections, improving their suitability for machine learning applications. The database is available at loschmidt.chemi.muni.cz/soluprotmutdb. © 2022 The Author(s). Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
