4.4 Review

Chemical language models for molecular design

Journal

MOLECULAR INFORMATICS
Volume -, Issue -, Pages -

Publisher

WILEY-V C H VERLAG GMBH
DOI: 10.1002/minf.202300288

Keywords

drug design; language models; recurrent neural networks; encoder-decoder frameworks; transformers; attention mechanisms

This article discusses the opportunities and methods of applying chemical language models (CLMs) in drug discovery. CLMs can be developed using recurrent neural networks or transformer architectures, and attention mechanisms are used to improve predictive performance. CLMs can be used for constrained generative modeling and the prediction of chemical reactions or drug-target interactions. Since CLMs can learn mappings of different types of sequences and are applicable to any compound or target data presented in a sequential format and tokenized, they have a wide range of applications.
In drug discovery, chemical language models (CLMs) originating from natural language processing offer new opportunities for molecular design. CLMs have been developed using recurrent neural network (RNN) or transformer architectures. For the predictive performance of RNN-based encoder-decoder frameworks and transformers, attention mechanisms play a central role. Emerging application areas for CLMs include, among others, constrained generative modeling and the prediction of chemical reactions or drug-target interactions. Since CLMs are applicable to any compound or target data that can be presented in a sequential format and tokenized, mappings of different types of sequences can be learned. For example, active compounds can be predicted from protein sequence motifs. Novel off-the-beaten-path applications can also be considered. For example, analogue series from medicinal chemistry can be perceived and represented as chemical sequences and extended with new compounds using CLMs. Herein, methodological features of CLMs and different applications are discussed.
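To illustrate what "presented in a sequential format and tokenized" can mean in practice, the following is a minimal sketch that splits SMILES strings into tokens and maps them to integer indices. It is not taken from the article: the regular expression, the function names (`tokenize`, `build_vocab`, `encode`) and the example molecules are all assumptions chosen for illustration, and published CLMs typically use more elaborate token patterns.

```python
import re

# Assumed, simplified token pattern (not from the article):
# bracket atoms such as [C@@H], the two-letter halogens Br and Cl,
# two-digit ring closures such as %12, then any single character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    return SMILES_TOKEN.findall(smiles)


def build_vocab(corpus):
    """Assign an integer index to every distinct token in the corpus."""
    tokens = sorted({tok for smi in corpus for tok in tokenize(smi)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def encode(smiles, vocab):
    """Convert a SMILES string into the index sequence a model would consume."""
    return [vocab[tok] for tok in tokenize(smiles)]


if __name__ == "__main__":
    # Hypothetical example inputs: aspirin, a dihalide, and alanine.
    corpus = ["CC(=O)Oc1ccccc1C(=O)O", "ClCCBr", "C[C@@H](N)C(=O)O"]
    vocab = build_vocab(corpus)
    for smi in corpus:
        print(smi, tokenize(smi), encode(smi, vocab))
```

The same pattern of tokenizing and indexing would apply to other sequential inputs, such as protein sequences split into residue symbols, which is what allows sequence-to-sequence mappings between different data types.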
