4.7 Article

Perplexity-Based Molecule Ranking and Bias Estimation of Chemical Language Models

Journal

JOURNAL OF CHEMICAL INFORMATION AND MODELING
Volume 62, Issue 5, Pages 1199-1206

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/acs.jcim.2c00079

Funding

  1. Swiss National Science Foundation [205321_182176]
  2. RETHINK initiative at ETH Zurich

Chemical language models (CLMs) are useful for designing molecules with desired properties. This study introduces the perplexity metric to evaluate how closely the generated molecules match the design objectives and to rank the most promising designs. Perplexity scoring also helps identify and remove undesired biases introduced during model training.
Chemical language models (CLMs) can be employed to design molecules with desired properties. CLMs generate new chemical structures in the form of textual representations, such as the simplified molecular input line entry system (SMILES) strings. However, the quality of these de novo generated molecules is difficult to assess a priori. In this study, we apply the perplexity metric to determine the degree to which the molecules generated by a CLM match the desired design objectives. This model-intrinsic score allows identifying and ranking the most promising molecular designs based on the probabilities learned by the CLM. Using perplexity to compare greedy (beam search) with explorative (multinomial sampling) methods for SMILES generation, certain advantages of multinomial sampling become apparent. Additionally, perplexity scoring is performed to identify undesired model biases introduced during model training and allows the development of a new ranking system to remove those undesired biases.
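To make the ranking idea concrete, the following is a minimal sketch of perplexity-based scoring using the standard language-model definition: the exponential of the negative mean log-probability of the tokens in a sequence. It is an illustration and not the authors' implementation. The function name `perplexity`, the candidate SMILES strings, and the per-token probabilities are all hypothetical; in practice the probabilities would come from the trained CLM.

```python
import math


def perplexity(token_probs):
    """Perplexity of one sequence from its per-token conditional probabilities.

    Standard definition: exp(-(1/N) * sum(log p_i)). Lower values mean the
    model assigns higher probability to the sequence.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    mean_log_prob = sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(-mean_log_prob)


# Hypothetical values for illustration only: each SMILES string maps to the
# probabilities a CLM might have assigned to its successive tokens.
candidates = {
    "CCO": [0.9, 0.8, 0.9],
    "CCN": [0.9, 0.8, 0.3],
    "C#N": [0.6, 0.2, 0.5],
}

# Rank designs from lowest perplexity (most consistent with what the model
# has learned) to highest.
ranked = sorted(candidates, key=lambda smiles: perplexity(candidates[smiles]))

for smiles in ranked:
    print(f"{smiles}\t{perplexity(candidates[smiles]):.3f}")
```

Because the score is averaged over the number of tokens, it does not automatically favor shorter strings, which is what makes it usable for comparing molecules of different sizes.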
