
Motif2Mol: Prediction of New Active Compounds Based on Sequence Motifs of Ligand Binding Sites in Proteins Using a Biochemical Language Model

BIOMOLECULES (2023)

Journal

BIOMOLECULES

Volume 13, Issue 5

Publisher

MDPI

DOI: 10.3390/biom13050833

Keywords

proteins; ligand binding sites; active compounds; molecular design; sequence motifs; molecular string representation; machine translation; transformer architecture; kinase inhibitors


Abstract


In drug design, the prediction of new active compounds from protein sequence data has so far been attempted in only a few studies. This prediction task is inherently challenging because global protein sequence similarity has strong evolutionary and structural implications but is often only vaguely related to ligand binding. Deep language models adapted from natural language processing offer new opportunities to attempt such predictions via machine translation, directly relating amino acid sequences and chemical structures to each other through textual molecular representations. Herein, we introduce a biochemical language model with a transformer architecture for the prediction of new active compounds from sequence motifs of ligand binding sites. In a proof-of-concept application on inhibitors of more than 200 human kinases, the Motif2Mol model revealed promising learning characteristics and an unprecedented ability to consistently reproduce known inhibitors of different kinases.
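The abstract frames compound prediction as machine translation between two text sequences: a binding-site motif as the source and a molecular string as the target. The Python below is a minimal sketch of how such a training pair might be tokenized. It assumes one token per amino acid residue for the motif and a regular-expression SMILES tokenizer of the kind commonly used in molecular transformer models; neither choice is confirmed as the actual Motif2Mol preprocessing. The motif string, the `<s>`/`</s>` boundary markers, and the helper names `tokenize_motif`, `tokenize_smiles`, and `make_pair` are hypothetical.

```python
import re

# Assumed SMILES tokenizer: a regex of the kind widely used in molecular
# transformer work. Bracket atoms, two-letter halogens, bonds, branches,
# ring closures, and aromatic atoms each become one token.
SMILES_REGEX = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_motif(motif: str) -> list[str]:
    """Hypothetical motif tokenizer: one token per amino acid letter."""
    return list(motif)


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into tokens; reject characters the regex skips."""
    tokens = SMILES_REGEX.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable SMILES: {smiles!r}")
    return tokens


def make_pair(motif: str, smiles: str) -> tuple[list[str], list[str]]:
    """Build a (source, target) pair for a sequence-to-sequence model.

    The <s> and </s> boundary markers are illustrative placeholders.
    """
    src = ["<s>"] + tokenize_motif(motif) + ["</s>"]
    tgt = ["<s>"] + tokenize_smiles(smiles) + ["</s>"]
    return src, tgt


if __name__ == "__main__":
    # "GAGGFG" is a made-up motif; the SMILES is paracetamol.
    src, tgt = make_pair("GAGGFG", "CC(=O)Nc1ccc(O)cc1")
    print(src)
    print(tgt)
```

With this framing, a transformer encoder reads the motif tokens and the decoder generates SMILES tokens one at a time. Checking that the joined tokens reproduce the input guards against characters the regex silently drops.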
