4.6 Article

Reconstruction of lossless molecular representations from fingerprints

Journal

JOURNAL OF CHEMINFORMATICS
Volume 15, Issue 1, Pages -

Publisher

BMC
DOI: 10.1186/s13321-023-00693-0

Keywords

Fingerprints; SMILES; SELFIES; Neural Machine Translation

This study reconstructs two unique molecular representations, SMILES and SELFIES, from a set of structural fingerprints. Doing so restores the connectivity information lost during fingerprint transformation and overcomes the major limitation that has kept structural fingerprints out of NLP models.
The simplified molecular-input line-entry system (SMILES) is the most prevalent molecular representation used in AI-based chemical applications. However, there are innate limitations associated with the internal structure of SMILES representations. In this context, this study exploits the resolution and robustness of unique molecular representations, i.e., SMILES and SELFIES (SELF-referencIng Embedded strings), reconstructed from a set of structural fingerprints, which are proposed and used herein as vital representational tools for chemical and natural language processing (NLP) applications. This is achieved by restoring, with high accuracy, the connectivity information lost during fingerprint transformation. Notably, the results reveal that reversing the seemingly irreversible molecule-to-fingerprint conversion is feasible. More specifically, four structural fingerprints (extended connectivity, topological torsion, atom pairs, and atomic environments) can be used as inputs and outputs of chemical NLP applications. Therefore, this comprehensive study addresses the major limitation of structural fingerprints that precludes their use in NLP models. Our findings will facilitate the development of text- or fingerprint-based chemoinformatic models for generative and translational tasks.
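
To make the phrase "connectivity information lost during fingerprint transformation" concrete, the sketch below builds a deliberately simplified, presence-only atom-pair feature set in plain Python. It is a toy under stated assumptions, not the fingerprints or the reconstruction method used in the paper: the function names, the hydrogen-suppressed graph encoding, and the feature definition (element, element, topological distance) are all hypothetical, and real atom-pair fingerprints typically encode richer atom types and counts. Under these assumptions, two different carbon skeletons can map to the same feature set, which illustrates why the conversion looks irreversible.

```python
from collections import deque
from itertools import combinations


def shortest_path_lengths(n_atoms, bonds):
    """All-pairs topological distances (in bonds), via BFS from each atom."""
    neighbors = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    dist = {}
    for start in range(n_atoms):
        seen = {start: 0}
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for nxt in neighbors[cur]:
                if nxt not in seen:
                    seen[nxt] = seen[cur] + 1
                    queue.append(nxt)
        dist[start] = seen
    return dist


def toy_atom_pair_fingerprint(elements, bonds):
    """Hypothetical presence-only fingerprint: the set of
    (element, element, distance) triples found in the molecule.
    Which atoms are bonded to which is not stored in the result."""
    dist = shortest_path_lengths(len(elements), bonds)
    features = set()
    for i, j in combinations(range(len(elements)), 2):
        if j in dist[i]:  # skip atom pairs in disconnected fragments
            a, b = sorted((elements[i], elements[j]))
            features.add((a, b, dist[i][j]))
    return frozenset(features)


# Hydrogen-suppressed toy graphs: (element list, bond list).
butane = (["C"] * 4, [(0, 1), (1, 2), (2, 3)])              # CCCC
isopentane = (["C"] * 5, [(0, 1), (1, 2), (1, 3), (3, 4)])  # CC(C)CC
ethane = (["C"] * 2, [(0, 1)])                              # CC

fp_butane = toy_atom_pair_fingerprint(*butane)
fp_isopentane = toy_atom_pair_fingerprint(*isopentane)
fp_ethane = toy_atom_pair_fingerprint(*ethane)

# Different molecules, identical toy fingerprint: the branching is lost.
print(fp_butane == fp_isopentane)
# A molecule with a shorter longest path is still distinguishable.
print(fp_butane == fp_ethane)
```

In this toy, butane and isopentane share a fingerprint because both contain carbon pairs at distances 1, 2, and 3 and nothing else is recorded. The fingerprints studied in the paper carry more information than this, which is consistent with the abstract's claim that the original structure can be recovered from a set of them with high accuracy.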
