Article

Language models for the prediction of SARS-CoV-2 inhibitors

Journal

Publisher

SAGE PUBLICATIONS LTD
DOI: 10.1177/10943420221121804

Keywords

COVID-19; drug design; machine learning; language model; pre-training; fine-tuning; genetic algorithm

Funding

  1. Exascale Computing Project [17SC-20-SC]
  2. DOE CARES through the Advanced Scientific Computing Research (ASCR) program


The article presents results from using deep learning language models to generate and score drug candidates, reducing pre-training time while increasing dataset size. By fine-tuning the language model, the authors identified inhibitors for important protein targets of the novel coronavirus; a genetic algorithm was used to find optimal candidates.
The COVID-19 pandemic highlights the need for computational tools to automate and accelerate drug design for novel protein targets. We leverage deep learning language models to generate and score drug candidates based on predicted protein binding affinity. We pre-trained a deep learning language model (BERT) on ~9.6 billion molecules and achieved peak performance of 603 petaflops in mixed precision. Our work reduces pre-training time from days to hours, compared to previous efforts with this architecture, while also increasing the dataset size by nearly an order of magnitude. For scoring, we fine-tuned the language model using an assembled set of thousands of protein targets with binding affinity data and searched for inhibitors of specific protein targets, SARS-CoV-2 Mpro and PLpro. We utilized a genetic algorithm approach for finding optimal candidates using the generation and scoring capabilities of the language model. Our generalizable models accelerate the identification of inhibitors for emerging therapeutic targets.
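The candidate-search loop described in the abstract can be sketched as a simple genetic algorithm. This is an illustrative sketch only: the `generate_candidate` and `score_candidate` functions below are toy stand-ins (random strings over a SMILES-like alphabet and a character-count score) for the paper's pre-trained generative language model and fine-tuned binding-affinity scorer, which are not given in the abstract.

```python
import random

# Toy SMILES-like character set; purely illustrative, not valid chemistry.
ALPHABET = "CNOcno()=#123"

def generate_candidate(length=12):
    """Stand-in for sampling a candidate molecule from the language model."""
    return "".join(random.choice(ALPHABET) for _ in range(length))

def score_candidate(candidate):
    """Stand-in for the fine-tuned binding-affinity score (higher is better)."""
    return candidate.count("C") + candidate.count("c")

def mutate(candidate, rate=0.1):
    """Point mutations over the string representation of a candidate."""
    return "".join(
        random.choice(ALPHABET) if random.random() < rate else ch
        for ch in candidate
    )

def genetic_search(pop_size=50, generations=20, elite_frac=0.2):
    """Keep the top-scoring candidates each generation and mutate them."""
    population = [generate_candidate() for _ in range(pop_size)]
    n_elite = max(1, int(elite_frac * pop_size))
    for _ in range(generations):
        population.sort(key=score_candidate, reverse=True)
        elite = population[:n_elite]
        # Refill the population by mutating randomly chosen elite members.
        population = elite + [
            mutate(random.choice(elite)) for _ in range(pop_size - n_elite)
        ]
    return max(population, key=score_candidate)

best = genetic_search()
print(best, score_candidate(best))
```

In the actual workflow, generation and scoring would both call the trained models, so the same loop optimizes predicted binding affinity directly rather than a toy objective.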

Authors

