Journal
JOURNAL OF CHEMICAL INFORMATION AND MODELING
Volume 62, Issue 5, Pages 1199-1206
Publisher
AMER CHEMICAL SOC
DOI: 10.1021/acs.jcim.2c00079
Funding
- Swiss National Science Foundation [205321_182176]
- RETHINK initiative at ETH Zurich
Chemical language models (CLMs) are useful for designing molecules with desired properties. This study applies the perplexity metric to evaluate how well the generated molecules match the design objectives and to rank the most promising designs. Perplexity scoring also helps identify and remove undesired biases introduced during model training.
Chemical language models (CLMs) can be employed to design molecules with desired properties. CLMs generate new chemical structures in the form of textual representations, such as simplified molecular input line entry system (SMILES) strings. However, the quality of these de novo generated molecules is difficult to assess a priori. In this study, we apply the perplexity metric to determine the degree to which the molecules generated by a CLM match the desired design objectives. This model-intrinsic score makes it possible to identify and rank the most promising molecular designs based on the probabilities learned by the CLM. Using perplexity to compare greedy (beam search) with explorative (multinomial sampling) methods for SMILES generation, certain advantages of multinomial sampling become apparent. Additionally, perplexity scoring is used to identify undesired biases introduced during model training, and it enables the development of a new ranking system to remove those biases.
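As a rough illustration of the ranking idea described in the abstract, the sketch below computes perplexity using its standard language-modeling definition: the exponential of the negative mean log-probability of the tokens in a sequence. It is not taken from the paper's code. The function name `perplexity`, the `candidates` dictionary, and all probability values are hypothetical; in practice the per-token probabilities would come from the trained CLM.

```python
import math


def perplexity(token_probs):
    """Perplexity of one sequence, given the probability the model
    assigned to each of its tokens (standard definition, assumed here)."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Mean negative log-likelihood per token, then exponentiate.
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


# Hypothetical per-token probabilities for three generated SMILES strings.
# These numbers are invented for illustration only.
candidates = {
    "c1ccccc1": [0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6],
    "CCO": [0.9, 0.8, 0.9],
    "CC(=O)O": [0.9, 0.9, 0.3, 0.5, 0.8, 0.9, 0.7],
}

# Lower perplexity means the model considers the string more likely,
# so the ranking is sorted in ascending order.
ranked = sorted(candidates, key=lambda smiles: perplexity(candidates[smiles]))

for smiles in ranked:
    print(f"{smiles}\t{perplexity(candidates[smiles]):.3f}")
```

Because the score is normalized by the number of tokens, strings of different lengths can be compared on the same scale, which a raw sequence probability would not allow.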