Article

Machine Learning Enables Prediction of Pyrrolysyl-tRNA Synthetase Substrate Specificity

Journal

ACS Synthetic Biology
Volume 12, Issue 8, Pages 2403-2417

Publisher

American Chemical Society
DOI: 10.1021/acssynbio.3c00225

Keywords

machine learning; pyrrolysyl-tRNA synthetase; noncanonical amino acids; enzyme engineering; substrate specificity

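This study developed machine learning models to predict the substrate specificity of pyrrolysyl-tRNA synthetase (PylRS) mutants toward novel noncanonical amino acids (NCAAs). In experimental validation, the best model reached a three-class accuracy of 0.69 and a binary accuracy of 0.86. The work expands the substrate scope of three PylRS variants and provides a framework for building similar models for other PylRS variants.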
Knowledge about the substrate scope of a given enzyme is informative for elucidating biochemical pathways and for expanding applications of the enzyme. However, no general methods are available to accurately predict the substrate specificity of an enzyme. Pyrrolysyl-tRNA synthetase (PylRS) is a powerful tool for incorporating various noncanonical amino acids (NCAAs) into proteins, which has enabled us to probe, image, rationally engineer, and evolve protein structure and function. However, incorporating a new NCAA typically requires selecting large libraries of PylRS with randomized mutations at the active site, and this process requires multiple rounds of selection for each new substrate. Therefore, a single aminoacyl-tRNA synthetase with broad substrate promiscuity would be ideal to facilitate widespread application of the genetic NCAA incorporation technique. Herein, machine learning models were developed to predict which novel NCAAs are accepted as substrates by three PylRS mutants and could therefore be incorporated into proteins. The models were built from a training set of 285 unique enzyme-substrate pairs, covering three PylRS mutants (IFRS, BtaRS, and MFRS) tested against 95 NCAAs. The best BaggingTree (BT) model was then used to virtually screen an NCAA library containing 1474 phenylalanine, tyrosine, tryptophan, and alanine analogues, and 156 NCAAs were predicted to be accepted by at least one of the three PylRS mutants. Next, 27 NCAAs, comprising 24 predicted positive and 3 predicted negative substrates, were tested experimentally. Of the 24 predicted positives, 20 showed weak or strong activity and were accepted by at least one PylRS mutant; 11 of these NCAAs had never been reported to be incorporated into proteins before. The three predicted negative substrates showed no activity. Based on these experimental results, the BT model achieved a three-class classification accuracy of 0.69 and a binary classification accuracy of 0.86. This study expanded the substrate scope of three PylRS variants and provided a framework for developing machine learning models to predict the substrate specificity of other PylRS variants.
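The abstract names a BaggingTree (BT) classifier trained on enzyme-substrate pairs but does not specify its features, label definitions, or hyperparameters. The following is a minimal, from-scratch Python sketch of bagged classification trees under stated assumptions, not the authors' implementation. Each pair is featurized as a one-hot code for the mutant (IFRS, BtaRS, or MFRS) plus hypothetical numeric substrate descriptors. Labels are assumed to be 0 (no activity), 1 (weak), and 2 (strong). The tree depth, number of trees, and toy data are illustrative. Two elements follow the abstract: the screening rule (keep an NCAA if at least one mutant is predicted to accept it) and the treatment of weak or strong activity as "accepted" for binary accuracy.

```python
import random
from collections import Counter

# The three PylRS mutants named in the abstract.
MUTANTS = ["IFRS", "BtaRS", "MFRS"]
# Assumed label scheme (not given in the abstract): 0 = no activity, 1 = weak, 2 = strong.


def encode(mutant, descriptors):
    """Hypothetical featurization: one-hot mutant identity + numeric substrate descriptors."""
    return [1.0 if m == mutant else 0.0 for m in MUTANTS] + list(descriptors)


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_tree(X, y, depth=0, max_depth=3):
    """Greedy Gini tree; leaves are labels, internal nodes are (feature, threshold, left, right)."""
    if depth == max_depth or len(set(y)) == 1:
        return majority(y)
    best = None
    for j in range(len(X[0])):
        # Skip the largest value so both children are non-empty.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t, left, right)
    if best is None:  # every feature is constant in this node
        return majority(y)
    _, j, t, left, right = best
    return (j, t,
            fit_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
            fit_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth))


def predict_tree(node, x):
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node


def fit_bagged_trees(X, y, n_trees=25, max_depth=3, seed=0):
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        # Bootstrap resample: draw len(X) rows with replacement.
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        trees.append(fit_tree([X[i] for i in idx], [y[i] for i in idx], max_depth=max_depth))
    return trees


def predict_bagged(trees, x):
    return majority([predict_tree(t, x) for t in trees])  # majority vote over the ensemble


def accepted_by_any(trees, descriptors):
    """Screening rule from the abstract: keep an NCAA if any mutant is predicted to accept it."""
    return any(predict_bagged(trees, encode(m, descriptors)) > 0 for m in MUTANTS)


def accuracies(y_true, y_pred):
    """Three-class accuracy, and binary accuracy after merging weak and strong into 'accepted'."""
    three = sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)
    binary = sum((a > 0) == (b > 0) for a, b in zip(y_true, y_pred)) / len(y_true)
    return three, binary


# Toy training data (invented for illustration): one descriptor that drives activity.
TOY = [(0.1, 0), (0.15, 0), (0.2, 0), (0.45, 1), (0.5, 1), (0.55, 1), (0.8, 2), (0.85, 2), (0.9, 2)]
X = [encode(m, [d]) for m in MUTANTS for d, _ in TOY]
y = [label for m in MUTANTS for _, label in TOY]
trees = fit_bagged_trees(X, y)
```

Bagging fits each tree to a bootstrap resample and combines them by majority vote, which reduces the variance of individual trees. A real model would more likely use a library implementation and chemically meaningful descriptors.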
