Article

Informed training set design enables efficient machine learning-assisted directed protein evolution

Journal

CELL SYSTEMS
Volume 12, Issue 11, Pages 1026-+

Publisher

CELL PRESS
DOI: 10.1016/j.cels.2021.07.008

Funding

  1. NSF Division of Chemical, Bioengineering, Environmental and Transport Systems [CBET 1937902]
  2. Amgen Chem-Bio-Engineering Award (CBEA)

The study investigates and optimizes a path-independent machine learning-assisted directed evolution protocol, finding that reducing inclusion of minimally informative protein variants in training data is crucial for improving the outcome of the evolution process.
Directed evolution of proteins often involves a greedy optimization in which the mutation in the highest-fitness variant identified in each round of single-site mutagenesis is fixed. The efficiency of such a single-step greedy walk depends on the order in which beneficial mutations are identified; that is, the process is path dependent. Here, we investigate and optimize a path-independent machine learning-assisted directed evolution (MLDE) protocol that allows in silico screening of full combinatorial libraries. In particular, we evaluate the importance of different protein encoding strategies, training procedures, models, and training set design strategies on MLDE outcome, finding the most important consideration to be the implementation of strategies that reduce inclusion of minimally informative "holes" (protein variants with zero or extremely low fitness) in training data. When applied to an epistatic, hole-filled, four-site combinatorial fitness landscape, our optimized protocol achieved the global fitness maximum up to 81-fold more frequently than single-step greedy optimization. A record of this paper's transparent peer review process is included in the supplemental information.
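
To make the path-dependence argument concrete, the following is a minimal sketch of a single-step greedy walk on a toy landscape. Everything in it is an illustrative assumption rather than the authors' code or data: the two-site, two-letter landscape `TOY_LANDSCAPE`, its fitness values, and the helper names `greedy_walk` and `exhaustive_best` are all hypothetical. The exhaustive search stands in for ranking a full combinatorial library; in MLDE that ranking would come from a trained model's predictions, not from measured fitness.

```python
from itertools import product

# Hypothetical two-site landscape with epistasis: each single mutant of "AA"
# is a zero-fitness "hole", but the double mutant "BB" is the global maximum.
TOY_LANDSCAPE = {"AA": 1.0, "AB": 0.0, "BA": 0.0, "BB": 5.0}
ALPHABET = "AB"


def greedy_walk(landscape, start, alphabet):
    """Single-step greedy walk.

    Each round screens every single-site substitution at the positions that
    are not yet fixed, keeps the highest-fitness variant, and fixes the
    position that variant was drawn from.
    """
    current = start
    unfixed = set(range(len(start)))
    while unfixed:
        best_variant, best_pos = current, None
        for pos in sorted(unfixed):
            for residue in alphabet:
                candidate = current[:pos] + residue + current[pos + 1:]
                if best_pos is None or landscape[candidate] > landscape[best_variant]:
                    best_variant, best_pos = candidate, pos
        current = best_variant
        unfixed.remove(best_pos)
    return current


def exhaustive_best(landscape, length, alphabet):
    """Rank the full combinatorial library and return the top variant."""
    library = ("".join(combo) for combo in product(alphabet, repeat=length))
    return max(library, key=lambda variant: landscape[variant])


greedy_result = greedy_walk(TOY_LANDSCAPE, "AA", ALPHABET)
global_best = exhaustive_best(TOY_LANDSCAPE, 2, ALPHABET)
print("greedy walk ends at:", greedy_result)
print("global maximum is:", global_best)
```

On this toy landscape the greedy walk never leaves the starting variant, because every single-site step lands in a hole, while a search over the full library finds the double mutant. This is only a caricature of the four-site landscape studied in the paper, but it shows why evaluating combinations jointly can succeed where stepwise fixation cannot.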
