4.8 Review

Machine Learning for Designing Next-Generation mRNA Therapeutics

Journal

ACCOUNTS OF CHEMICAL RESEARCH
Volume 55, Issue 1, Pages 24-34

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/acs.accounts.1c00621

Funding

  1. NIH [R01GM120379, R01HG009892]
  2. University of Washington eScience Institute
  3. Washington Research Foundation

Over the past two years, mRNA therapeutics and vaccines have moved rapidly from concept to real-world impact. While some aspects have been optimized for decades, the selection and design of the noncoding sequences that control translation efficiency have received less attention. Model-driven design is proposed as a promising alternative to borrowing untranslated regions from highly expressed human genes, one that provides unprecedented control over untranslated region function.
Over just the last 2 years, mRNA therapeutics and vaccines have undergone a rapid transition from an intriguing concept to real-world impact. However, whereas some aspects of mRNA therapeutics, such as the use of chemical modifications to increase stability and reduce immunogenicity, have been extensively optimized for over two decades, other aspects, particularly the selection and design of the noncoding leader and trailer sequences which control translation efficiency and stability, have received comparably less attention. In practice, such 5' and 3' untranslated regions (UTRs) are often borrowed from highly expressed human genes with few or no modifications, as is the case for the Pfizer/BioNTech Covid vaccine. Focusing on the 5'UTR, we here argue that model-driven design is a promising alternative that provides unprecedented control over 5'UTR function. We review recent work that combines synthetic biology with machine learning to build quantitative models that relate ribosome loading, and thus translation efficiency, to the 5'UTR sequence. We first introduce an experimental approach that uses polysome profiling and high-throughput sequencing to quantify ribosome loading for hundreds of thousands of 5'UTRs in parallel. We apply this approach to measure ribosome loading in synthetic RNA libraries with a random sequence inserted into the 5'UTR. We then review Optimus 5-Prime, a convolutional neural network model trained on the experimental data. We highlight that very accurate models of biological regulation can be learned from synthetic data sets with degenerate 5'UTRs. We validate model predictions not only on held-out data sets from our random library but also on a large library of over 30 000 human 5'UTR fragments and using translation reporter data collected independently by other groups. Both the experiment and model are compatible with commonly used chemically modified nucleosides, in particular, pseudouridine (Ψ) and 1-methyl-pseudouridine (m1Ψ).
We find that, in general, 5'UTRs have very similar impacts when combined with different protein-coding sequences and even in the context of different chemical modifications. We demonstrate that Optimus 5-Prime can be combined with design algorithms to generate de novo sequences with precisely defined translation efficiencies. We emphasize recent developments in design algorithms that rely on activation maximization and generative modeling to improve both the fitness and diversity of designed sequences. Compared with prior approaches such as genetic algorithms, we show that these approaches are not only faster but also less likely to get stuck in local sequence optima. Finally, we discuss how the approach reviewed here can be generalized to other gene regions and applications.
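
The idea of pairing a sequence-to-function predictor with a design algorithm can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `toy_score` is a hypothetical stand-in that simply penalizes upstream AUG triplets and is not Optimus 5-Prime, and `design` is a plain random-mutation hill climb, not the activation maximization or generative methods the abstract describes. The sketch only shows the general loop of proposing a 5'UTR, scoring it, and keeping changes that move the score toward a chosen target.

```python
import random

BASES = "ACGU"


def toy_score(utr: str) -> float:
    """Hypothetical stand-in for a trained predictor (NOT Optimus 5-Prime).

    Starts at 1.0 and subtracts 0.2 for every AUG triplet in the sequence,
    with a floor of 0.0. Real models are learned from experimental data.
    """
    n_aug = sum(1 for i in range(len(utr) - 2) if utr[i:i + 3] == "AUG")
    return max(0.0, 1.0 - 0.2 * n_aug)


def design(target: float, length: int = 50, steps: int = 2000, seed: int = 0):
    """Random-mutation hill climb toward a target score.

    Returns (sequence, absolute error). Mutations that do not increase the
    error are kept (neutral moves allowed); all others are reverted.
    """
    rng = random.Random(seed)
    seq = [rng.choice(BASES) for _ in range(length)]
    best_err = abs(toy_score("".join(seq)) - target)
    for _ in range(steps):
        if best_err < 1e-9:
            break  # target reached within tolerance
        pos = rng.randrange(length)
        old = seq[pos]
        seq[pos] = rng.choice(BASES)  # propose a single-base change
        err = abs(toy_score("".join(seq)) - target)
        if err <= best_err:
            best_err = err  # accept the mutation
        else:
            seq[pos] = old  # revert the mutation
    return "".join(seq), best_err


if __name__ == "__main__":
    utr, err = design(target=0.6)
    print(utr, toy_score(utr), err)
```

A hill climb of this kind accepts only non-worsening single-base changes, so it can stall on a plateau or in a local optimum. That is the kind of limitation the abstract attributes to earlier search-based approaches and that gradient-based or generative design methods aim to reduce.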
