Article; Data Paper

SoluProtMut(DB): A manually curated database of protein solubility changes upon mutations

Journal

COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL
Volume 20, Pages 6339-6347

Publisher

ELSEVIER
DOI: 10.1016/j.csbj.2022.11.009

Keywords

Mutational database; Protein engineering; Soluble expression; Protein yield; Machine learning; Protein aggregation

Funding

  1. Czech Ministry of Education, Youth and Sports [INBIO CZ.02.1.01/0.0/0.0/16_026/0008451, CZ.02.1.01/0.0/0.0/17_043/0009632, 2.1.01/0.0/0.0/15_003/0000469, ESFRI RECETOX RI LM2018121, ESFRI ELIXIR LM2018131]
  2. Technology Agency of the Czech Republic [FW03010208]
  3. Brno University of Technology [FIT-S-20-6293]
  4. CETOCOEN EXCELLENCE Teaming 2 project - Horizon2020 of the European Union [857560]
  5. National Institute for Cancer Research - EU - Next Generation EU [LX22NPO5102]
  6. Czech Science Foundation [20-15915Y]

Protein solubility is crucial for protein production and manufacturing yields. Understanding the structural determinants of solubility and the effects of mutations can help connect human diseases with protein aggregation. The SoluProtMut(DB) database contains extensive data on protein solubility changes upon mutations, serving as a valuable resource for researchers designing improved protein variants and developing machine learning tools. The database includes previously published datasets and additional data from recent studies, and it has been curated for machine learning applications.
Protein solubility is an attractive engineering target, primarily because of its relation to yields in protein production and manufacturing. Moreover, better knowledge of mutational effects on protein solubility could connect several serious human diseases with protein aggregation. However, our understanding of the structural determinants of protein solubility is limited, and the available data have mostly been scattered across the literature. Here, we present SoluProtMut(DB), the first database containing data on protein solubility changes upon mutations. The database holds 33,000 measurements of 17,000 protein variants in 103 different proteins. It can serve as an essential source of information for researchers designing improved protein variants or developing machine learning tools to predict the effects of mutations on solubility. The database comprises all the previously published solubility datasets and thousands of new data points from recent publications, including deep mutational scanning experiments. Moreover, it includes many of the available experimental conditions known to affect protein solubility. The datasets have been manually curated with substantial corrections, improving their suitability for machine learning applications. The database is available at loschmidt.chemi.muni.cz/soluprotmutdb.

(C) 2022 The Author(s). Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
