Accurate and efficient protein sequence design through learning concise local environment of residues

ProDESIGN-LE is an accurate and efficient approach to protein sequence design. It uses a transformer to learn the correlation between residue local environments and amino acid types, yielding designed sequences whose residues fit well with their local environments. Experimental results show that ProDESIGN-LE can design sequences that express as soluble proteins and whose predicted structures are similar to the target structures.
Motivation: Computational protein sequence design has been widely applied in rational protein engineering, and improving design accuracy and efficiency is highly desirable.

Results: Here, we present ProDESIGN-LE, an accurate and efficient approach to protein sequence design. ProDESIGN-LE adopts a concise but informative representation of each residue's local environment and trains a transformer to learn the correlation between the local environments of residues and their amino acid types. For a target backbone structure, ProDESIGN-LE uses the transformer to assign an appropriate residue type to each position based on its local environment within this structure, eventually producing a designed sequence in which all residues fit well with their local environments. We applied ProDESIGN-LE to design sequences for 68 naturally occurring and 129 hallucinated proteins, taking less than 20 s per protein on average. The predicted structures of the designed proteins closely resemble the target structures, with a state-of-the-art average TM-score exceeding 0.80. We further validated ProDESIGN-LE experimentally by designing five sequences for an enzyme, chloramphenicol O-acetyltransferase type III (CAT III), and recombinantly expressing the proteins in Escherichia coli. Of these proteins, three exhibited excellent solubility, and one yielded monomeric species with circular dichroism spectra consistent with the natural CAT III protein.

Availability and implementation: The source code of ProDESIGN-LE is available at .
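The position-by-position assignment described in the abstract can be illustrated with a minimal sketch. This is not the ProDESIGN-LE implementation. The trained transformer is replaced by a hypothetical rule-based stand-in (`toy_predictor`). The local environment is reduced to the types and distances of residues within a cutoff of a single coordinate per residue, and the update schedule (random starting sequence, a few shuffled passes, sampling from the predicted distribution) is assumed for illustration. All function names, the 10 Å cutoff, and the number of rounds are assumptions.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def local_environment(coords, sequence, i, cutoff=10.0):
    """Collect (residue type, distance) for every other residue within `cutoff` of position i."""
    env = []
    for j, (xyz, aa) in enumerate(zip(coords, sequence)):
        if j == i:
            continue
        d = math.dist(coords[i], xyz)
        if d <= cutoff:
            env.append((aa, d))
    return env


def toy_predictor(env):
    """Hypothetical stand-in for the trained transformer (assumption, not the paper's model).

    Returns a probability for each amino acid in AMINO_ACIDS. Crowded
    environments favor hydrophobic residues; sparse ones favor the rest.
    """
    hydrophobic = set("AFILMVW")
    buried = len(env) >= 4
    weights = [3.0 if (aa in hydrophobic) == buried else 1.0 for aa in AMINO_ACIDS]
    total = sum(weights)
    return [w / total for w in weights]


def design_sequence(coords, predictor=toy_predictor, n_rounds=3, seed=0):
    """Start from a random sequence and repeatedly re-assign each position
    from the predictor's distribution given its current local environment."""
    rng = random.Random(seed)
    sequence = [rng.choice(AMINO_ACIDS) for _ in coords]
    for _ in range(n_rounds):
        order = list(range(len(coords)))
        rng.shuffle(order)  # visit positions in random order each round
        for i in order:
            probs = predictor(local_environment(coords, sequence, i))
            sequence[i] = rng.choices(AMINO_ACIDS, weights=probs, k=1)[0]
    return "".join(sequence)


# Toy backbone: 12 points spaced 3.8 apart along a line (illustrative only).
toy_coords = [(3.8 * i, 0.0, 0.0) for i in range(12)]
designed = design_sequence(toy_coords)
print(designed)
```

Because each update depends on the residue types currently assigned to neighboring positions, the sketch makes several passes over the sequence, so that early assignments can be revised once their neighbors have changed. In a real setting, `predictor` would be replaced by a trained model and the environment by a richer structural representation.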
