Article

Perplexity-Based Molecule Ranking and Bias Estimation of Chemical Language Models

JOURNAL OF CHEMICAL INFORMATION AND MODELING (2022)

Journal

JOURNAL OF CHEMICAL INFORMATION AND MODELING

Volume 62, Issue 5, Pages 1199-1206

Publisher

AMER CHEMICAL SOC

DOI: 10.1021/acs.jcim.2c00079

Categories

Chemistry, Medicinal; Chemistry, Multidisciplinary; Computer Science, Information Systems; Computer Science, Interdisciplinary Applications

Funding

Swiss National Science Foundation [205321_182176]
RETHINK initiative at ETH Zurich

Summary

Chemical language models (CLMs) are useful for designing molecules with desired properties. This study applies the perplexity metric to assess how well generated molecules match the design objectives and to rank the most promising designs. Perplexity scoring also helps identify undesired biases introduced during model training and supports a ranking system that removes them.

Abstract

Chemical language models (CLMs) can be employed to design molecules with desired properties. CLMs generate new chemical structures in the form of textual representations, such as the simplified molecular input line entry system (SMILES) strings. However, the quality of these de novo generated molecules is difficult to assess a priori. In this study, we apply the perplexity metric to determine the degree to which the molecules generated by a CLM match the desired design objectives. This model-intrinsic score allows identifying and ranking the most promising molecular designs based on the probabilities learned by the CLM. Using perplexity to compare greedy (beam search) with explorative (multinomial sampling) methods for SMILES generation, certain advantages of multinomial sampling become apparent. Additionally, perplexity scoring is performed to identify undesired model biases introduced during model training and allows the development of a new ranking system to remove those undesired biases.
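
As a rough illustration of the scoring idea in the abstract, the sketch below computes perplexity with the standard language-modeling definition (the exponential of the mean negative log-likelihood per token) and ranks candidate SMILES strings by it, with lower values ranked first. It assumes the paper uses this standard definition, which the abstract does not spell out. The function names and the per-token probabilities are hypothetical and are not taken from the paper or from any trained CLM.

```python
import math
from typing import Dict, List


def perplexity(token_probs: List[float]) -> float:
    """Standard perplexity: exp of the mean negative log-likelihood per token.

    `token_probs` holds the probability the model assigned to each token
    of one generated SMILES string (hypothetical input format).
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    if any(p <= 0.0 or p > 1.0 for p in token_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


def rank_by_perplexity(candidates: Dict[str, List[float]]) -> List[str]:
    """Return SMILES strings sorted from lowest to highest perplexity."""
    return sorted(candidates, key=lambda smiles: perplexity(candidates[smiles]))


# Made-up per-token probabilities, for illustration only.
candidates = {
    "c1ccccc1": [0.5] * 8,   # perplexity 2.0
    "CCO": [0.9, 0.8, 0.9],  # perplexity about 1.16
    "CC(=O)O": [0.7] * 7,    # perplexity about 1.43
}

ranked = rank_by_perplexity(candidates)
for smiles in ranked:
    print(f"{smiles}\t{perplexity(candidates[smiles]):.3f}")
```

Because the score is normalized by the number of tokens, strings of different lengths can be compared directly. In practice the per-token probabilities would come from the CLM's output distribution at each generation step.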
