4.4 Review

Chemical language models for molecular design

Related references

Note: Only some of the references are listed.
Editorial Material Chemistry, Medicinal

Chemical language models for applications in medicinal chemistry

Atsushi Yoshimori et al.

FUTURE MEDICINAL CHEMISTRY (2023)

Article Biochemistry & Molecular Biology

DeepCubist: Molecular Generator for Designing Peptidomimetics based on Complex three-dimensional scaffolds

Kohei Umedera et al.

Summary: Mimicking bioactive conformations of peptide segments involved in protein-protein interaction interfaces is a promising strategy for designing PPI inhibitors. DeepCubist is a molecular generator based on 3D scaffolds that can be used to design peptidomimetics for pharmaceutical targets engaging in PPIs.

JOURNAL OF COMPUTER-AIDED MOLECULAR DESIGN (2023)

Article Multidisciplinary Sciences

Leveraging molecular structure and bioactivity with chemical language models for de novo drug design

Michael Moret et al.

Summary: Generative chemical language models (CLMs) can be used to generate new molecular structures from a textual representation. Hybrid CLMs can additionally leverage the bioactivity information available for the training compounds. In this study, a virtual compound library was created using a generative CLM and refined using a CLM-based classifier for bioactivity prediction. A new PI3K gamma ligand with sub-micromolar activity was identified, highlighting the potential of hybrid CLMs for molecular design.

NATURE COMMUNICATIONS (2023)

Article Biochemistry & Molecular Biology

cMolGPT: A Conditional Generative Pre-Trained Transformer for Target-Specific De Novo Molecular Generation

Ye Wang et al.

Summary: Deep generative models have been applied to the generation of novel compounds in small-molecule drug design and have gained significant attention. In this study, a Generative Pre-Trained Transformer (GPT)-inspired model called cMolGPT is proposed for de novo target-specific molecular design. The results demonstrate that cMolGPT is capable of generating drug-like and active compounds, closely matching the chemical space of real target-specific molecules and covering a considerable portion of novel compounds. Therefore, cMolGPT is a valuable tool for de novo molecule design and has the potential to accelerate the molecular optimization cycle time.

MOLECULES (2023)

Article Multidisciplinary Sciences

Designing highly potent compounds using a chemical language model

Hengwei Chen et al.

Summary: This study presents a new methodology for predicting potent compounds by using a chemical language model with a conditional transformer architecture. The model is capable of predicting known potent compounds from different activity classes and generating highly potent compounds that are structurally distinct from the input molecules. It also produces novel candidate compounds not included in the test sets.

SCIENTIFIC REPORTS (2023)

Article Biochemistry & Molecular Biology

Motif2Mol: Prediction of New Active Compounds Based on Sequence Motifs of Ligand Binding Sites in Proteins Using a Biochemical Language Model

Atsushi Yoshimori et al.

Summary: In the field of drug design, very few studies have attempted the prediction of new active compounds from protein sequence data. This is mainly due to the challenging nature of this prediction task, as global protein sequence similarity has strong evolutionary and structural implications but is not directly related to ligand binding. However, the application of deep language models adapted from natural language processing provides new opportunities to attempt such predictions by linking amino acid sequences and chemical structures through textual molecular representations. In this study, a biochemical language model with a transformer architecture, named Motif2Mol, was introduced for the prediction of new active compounds based on sequence motifs of ligand binding sites. In a proof-of-concept application on inhibitors of more than 200 human kinases, Motif2Mol exhibited promising learning characteristics and an unprecedented ability to consistently reproduce known inhibitors of different kinases.

BIOMOLECULES (2023)

Article Multidisciplinary Sciences

Sequence-based drug design as a concept in computational drug design

Lifan Chen et al.

Summary: Drug development based on target proteins has been successful. Researchers propose a sequence-to-drug concept for computational drug design based on protein sequence information and validate it through differentiable learning, demonstrating its importance in drug design.

NATURE COMMUNICATIONS (2023)

Article Multidisciplinary Sciences

Meta-learning for transformer-based prediction of potent compounds

Hengwei Chen et al.

Summary: The study explores meta-learning for generative design in drug discovery and demonstrates that meta-learning models consistently outperform other transformers in predicting potent compounds with limited fine-tuning data.

SCIENTIFIC REPORTS (2023)

Article Multidisciplinary Sciences

Molecule generation using transformers and policy gradient reinforcement learning

Eyal Mazuz et al.

Summary: In this paper, the authors propose a transformer-based architecture called Taiga for generating molecules with desired properties. They use a two-stage approach, treating the problem as a language modeling task and then optimizing molecular properties using reinforcement learning. Evaluation results show that Taiga outperforms state-of-the-art baselines, with improvements in QED ranging from 2% to over 20%.

SCIENTIFIC REPORTS (2023)

Article Chemistry, Medicinal

MolGPT: Molecular Generation Using a Transformer-Decoder Model

Viraj Bagal et al.

Summary: The application of deep learning techniques in inverse molecular design for drug design has gained significant attention. In this study, the authors train a transformer-decoder model, MolGPT, on the next-token prediction task using masked self-attention, and demonstrate that it performs on par with other modern machine learning frameworks for generating valid, unique, and novel druglike molecules. They also show that the model can be trained conditionally to control multiple properties of the generated molecules.

JOURNAL OF CHEMICAL INFORMATION AND MODELING (2022)

Article Chemistry, Multidisciplinary

Attention Map-Guided Visual Explanations for Deep Neural Networks

Junkang An et al.

Summary: This paper focuses on attention-map-guided visual explanations for deep neural networks to improve their interpretability in areas such as healthcare, finance, and defense. Experimental results show that the proposed method outperforms other methods significantly.

APPLIED SCIENCES-BASEL (2022)

Article Biochemistry & Molecular Biology

DeepAS - Chemical language model for the extension of active analogue series

Atsushi Yoshimori et al.

Summary: In this study, a chemical language model based on deep learning is introduced for analogue design. The model predicts preferred R-groups for new analogues based on ordered R-group sequences, taking into account the potency gradient and detectable SAR trends, providing a new concept for analogue design.

BIOORGANIC & MEDICINAL CHEMISTRY (2022)

Article Chemistry, Multidisciplinary

DeepAC - conditional transformer-based chemical language model for the prediction of activity cliffs formed by bioactive compounds

Hengwei Chen et al.

Summary: Activity cliffs (ACs) are pairs of structurally similar or analogous active small molecules with significant differences in potency. They are of high interest in medicinal chemistry as they reveal structure-activity relationship (SAR) determinants for compound optimization. In the field of molecular machine learning, ACs serve as test cases for predicting non-linear SARs between compound pairs. This study develops and evaluates chemical language models for AC prediction, demonstrating the accuracy of a conditional transformer called DeepAC in predicting ACs with minimal training data compared to other machine learning methods. DeepAC bridges the gap between predictive modeling and compound design, making it valuable for practical applications.

DIGITAL DISCOVERY (2022)

Article Multidisciplinary Sciences

AlphaDrug: protein target specific de novo molecular generation

Hao Qian et al.

Summary: Traditional drug discovery is a time-consuming and expensive process due to the complexity of the molecular search space. Researchers have turned to machine learning methods for help, but most existing methods either focus on virtual screening or unconditional molecular generation. In this paper, the authors propose a protein target-oriented de novo drug design method called AlphaDrug, which can automatically generate molecular drug candidates that can dock well with the target protein. They use a modified transformer network and a Monte Carlo tree search (MCTS) algorithm for conditional molecular generation. Experimental results demonstrate the effectiveness of their methods in diverse protein targets, suggesting that AlphaDrug is a promising solution for target-specific de novo drug design.

PNAS NEXUS (2022)

Article Biochemical Research Methods

MolTrans: Molecular Interaction Transformer for drug-target interaction prediction

Kexin Huang et al.

Summary: The MolTrans model improves the accuracy and interpretability of drug-target interaction prediction through a knowledge-inspired sub-structural pattern mining algorithm and an augmented transformer encoder, which better capture semantic relations among sub-structures extracted from massive unlabeled biomedical data.

BIOINFORMATICS (2021)

Article Multidisciplinary Sciences

Transformer neural network for protein-specific de novo drug generation as a machine translation problem

Daria Grechishnikova

Summary: This study proposes a method that generates novel molecules with predicted ability to bind a target protein solely based on its amino acid sequence. By utilizing the Transformer neural network architecture, the model is able to produce realistic diverse compounds with structural novelty, showing promising results in drug discovery.

SCIENTIFIC REPORTS (2021)

Article Multidisciplinary Sciences

Discovery of novel chemical reactions by deep generative recurrent neural network

William Bort et al.

Summary: Creative artificial intelligence can generate novel molecular structures and chemical reactions. Coupling generation with reaction space cartography makes it possible to focus on desired reaction classes. Autoencoders can be trained to encode reactions into a latent space and decode them for analysis and experimentation.

SCIENTIFIC REPORTS (2021)

Article Chemistry, Medicinal

Generative Models for De Novo Drug Design

Xiaochu Tong et al.

Summary: Generative models from the field of artificial intelligence have made remarkable progress in drug design, spanning a variety of model types and applications. They can be used to expand compound libraries, to design compounds with specific properties, and, through publicly available tools, to generate molecules directly.

JOURNAL OF MEDICINAL CHEMISTRY (2021)

Article Computer Science, Artificial Intelligence

Chemical language models enable navigation in sparsely populated chemical space

Michael A. Skinnider et al.

Summary: The researchers found that robust generative models can be learned from fewer examples than previously thought. They also identified reliable metrics for evaluating the quality of generated molecules.

NATURE MACHINE INTELLIGENCE (2021)

Article Multidisciplinary Sciences

State-of-the-art augmented NLP transformer models for direct and single-step retrosynthesis

Igor V. Tetko et al.

NATURE COMMUNICATIONS (2020)

Article Chemistry, Multidisciplinary

SyntaLinker: automatic fragment linking with deep conditional transformer neural networks

Yuyao Yang et al.

CHEMICAL SCIENCE (2020)

Article Chemistry, Multidisciplinary

Molecular Transformer: A Model for Uncertainty-Calibrated Chemical Reaction Prediction

Philippe Schwaller et al.

ACS CENTRAL SCIENCE (2019)

Article Chemistry, Multidisciplinary

Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks

Marwin H. S. Segler et al.

ACS CENTRAL SCIENCE (2018)

Article Chemistry, Multidisciplinary

Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules

Rafael Gomez-Bombarelli et al.

ACS CENTRAL SCIENCE (2018)

Article Computer Science, Artificial Intelligence

A Primer on Neural Network Models for Natural Language Processing

Yoav Goldberg

JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH (2016)

Review Multidisciplinary Sciences

Advances in natural language processing

Julia Hirschberg et al.

SCIENCE (2015)