4.7 Article

Precise Prediction of Promoter Strength Based on a De Novo Synthetic Promoter Library Coupled with Machine Learning

Journal

ACS SYNTHETIC BIOLOGY
Volume 11, Issue 1, Pages 92-102

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/acssynbio.1c00117

Keywords

promoter; evolution; machine learning; XgBoost model; Pearson correlation coefficient

Funding

  1. National Key R&D Program of China [2019YFA0905500]
  2. National Natural Science Foundation of China [21877053, 31900066]
  3. Tianjin Synthetic Biotechnology Innovation Capacity Improvement Project [TSBICIP-KJGG-015]
  4. Fundamental Research Funds for the Central Universities [JUSRP51705A, JUSRP12056]
  5. Natural Science Foundation of Jiangsu Province [BK20210752]
  6. China Postdoctoral Science Foundation [2021M690533]


By constructing and characterizing a mutant library of Trc promoters, a synthetic promoter library was established with a wide range of intensities. Using this library, machine learning models were built and optimized, with the XgBoost model exhibiting optimal performance in predicting the strength of artificially designed promoter sequences. This work provides a powerful platform for predictably tuning promoters to achieve optimal transcriptional strength.
Promoters are among the most critical regulatory elements controlling metabolic pathways. However, fast and accurate prediction of promoter strength remains challenging, which makes promoter construction and characterization time- and labor-intensive. This difficulty stems from the lack of a large promoter library with gradient strengths, a broad dynamic range, and clear sequence profiles that could be used to train an artificial intelligence model for promoter strength prediction. To overcome this challenge, we constructed and characterized a mutant library of Trc promoters (Ptrc) using 83 rounds of mutation-construction-screening-characterization engineering cycles. After excluding invalid mutation sites, we established a synthetic promoter library of 3665 different variants, displaying an intensity range of more than two orders of magnitude. The strongest variant was approximately 69-fold stronger than the original Ptrc and 1.52-fold stronger than a 1 mM isopropyl-β-D-thiogalactoside-driven PT7 promoter, with an approximately 454-fold difference between the strongest and weakest expression levels. Using this synthetic promoter library, different machine learning models were built and optimized to explore the relationships between promoter sequences and transcriptional strength. Finally, our XgBoost model exhibited optimal performance, and we utilized this approach to precisely predict the strength of artificially designed promoter sequences (R² = 0.88, mean absolute error = 0.15, and Pearson correlation coefficient = 0.94). Our work provides a powerful platform that enables the predictable tuning of promoters to achieve optimal transcriptional strength.
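The three figures of merit quoted in the abstract (R², mean absolute error, and Pearson correlation coefficient) measure different things, and a short sketch may help in reading them. The Python below is an illustration only, not the authors' evaluation code: the function name `regression_metrics` and the toy values are hypothetical, and the abstract does not state how the measured strengths were scaled before scoring.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, mean absolute error, and Pearson r for paired values.

    Hypothetical helper for illustration; not taken from the paper.
    """
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")

    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n

    # Residual and total sums of squares for R^2 = 1 - SS_res / SS_tot.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    # Spread of the predictions, needed for the Pearson denominator.
    ss_pred = sum((p - mean_p) ** 2 for p in y_pred)
    if ss_tot == 0 or ss_pred == 0:
        raise ValueError("both sequences must have non-zero variance")

    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": mae,
        "pearson": cov / math.sqrt(ss_tot * ss_pred),
    }


# Toy values (made up): predictions that track the measurements perfectly
# but are shifted by a constant offset.
measured = [1.0, 2.0, 3.0]
predicted = [2.0, 3.0, 4.0]
print(regression_metrics(measured, predicted))
```

In this toy case the Pearson coefficient is perfect while R² is negative, because Pearson ignores a constant offset and R² does not. Reporting all three values together, as the abstract does, therefore says more than any one of them alone.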
