Article

Perplexity-Based Molecule Ranking and Bias Estimation of Chemical Language Models

JOURNAL OF CHEMICAL INFORMATION AND MODELING (2022)

Journal

JOURNAL OF CHEMICAL INFORMATION AND MODELING

Volume 62, Issue 5, Pages 1199-1206

Publisher

AMER CHEMICAL SOC

DOI: 10.1021/acs.jcim.2c00079

Funding

Swiss National Science Foundation [205321_182176]
RETHINK initiative at ETH Zurich

Automated Summary

Chemical language models (CLMs) are useful for designing molecules with desired properties. This study applies the perplexity metric to evaluate how well generated molecules match the design objectives and to rank the most promising designs. Perplexity scoring also helps identify and remove undesired biases introduced during model training.

Abstract

Chemical language models (CLMs) can be employed to design molecules with desired properties. CLMs generate new chemical structures in the form of textual representations, such as simplified molecular input line entry system (SMILES) strings. However, the quality of these de novo generated molecules is difficult to assess a priori. In this study, we apply the perplexity metric to determine the degree to which the molecules generated by a CLM match the desired design objectives. This model-intrinsic score allows the most promising molecular designs to be identified and ranked based on the probabilities learned by the CLM. Using perplexity to compare greedy (beam search) with explorative (multinomial sampling) methods for SMILES generation reveals certain advantages of multinomial sampling. Additionally, perplexity scoring is used to identify undesired model biases introduced during model training, and it enables the development of a new ranking system that removes those biases.
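To make the scoring idea concrete, the following is a minimal Python sketch of perplexity-based ranking under the standard definition PPL = exp(-(1/N) * sum_i ln p(t_i | t_<i)), where p(t_i | t_<i) is the probability the CLM assigned to the i-th token of a SMILES string given the preceding tokens. It assumes these per-token probabilities have already been extracted from a trained model (not shown). The function names `perplexity` and `rank_by_perplexity` and the example probabilities are hypothetical illustrations, not the authors' code, and the bias-correcting ranking described in the paper is not reproduced here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def perplexity(token_probs: Sequence[float]) -> float:
    """Perplexity of one generated sequence.

    token_probs: probability the model assigned to each token it actually
    produced, in order (hypothetical input; a real CLM would supply these).
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    if any(p <= 0.0 or p > 1.0 for p in token_probs):
        raise ValueError("token probabilities must lie in (0, 1]")
    # Mean negative log-likelihood per token (natural log).
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    # exp(mean NLL in nats) equals 2 ** (mean NLL in bits): the base does not matter.
    return math.exp(mean_nll)


def rank_by_perplexity(
    scored: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Sort SMILES strings from lowest perplexity (most probable under the
    model) to highest."""
    ranked = [(smiles, perplexity(probs)) for smiles, probs in scored.items()]
    return sorted(ranked, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical per-token probabilities, assuming character-level tokens
    # plus an end-of-sequence token; values are made up for illustration.
    samples = {
        "CCO": [0.9, 0.8, 0.7, 0.95],
        "c1ccccc1": [0.6, 0.5, 0.7, 0.6, 0.5, 0.6, 0.7, 0.6, 0.9],
    }
    for smiles, ppl in rank_by_perplexity(samples):
        print(f"{smiles}\t{ppl:.3f}")
```

Because perplexity is normalized by the number of tokens, SMILES strings of different lengths can be placed on a common scale, and the same scoring applies regardless of whether the strings came from beam search or multinomial sampling. Real CLMs often use multi-character tokens (for example `Cl` or `Br`), so the token count depends on the tokenizer actually used.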
