Journal
DIGITAL DISCOVERY
Volume 2, Issue 4, Pages 897-908Publisher
ROYAL SOC CHEMISTRY
DOI: 10.1039/d3dd00044c
Keywords
-
Ask authors/readers for more resources
String-based molecular representations are important in cheminformatics, and SELFIES is a robust representation that has been improved and extended. The updated version of selfies library, version 2.1.1, is described in this paper.
String-based molecular representations play a crucial role in cheminformatics applications, and with the growing success of deep learning in chemistry, have been readily adopted into machine learning pipelines. However, traditional string-based representations such as SMILES are often prone to syntactic and semantic errors when produced by generative models. To address these problems, a novel representation, SELF-referencing embedded strings (SELFIES), was proposed that is inherently 100% robust, alongside an accompanying open-source implementation called selfies. Since then, we have generalized SELFIES to support a wider range of molecules and semantic constraints, and streamlined its underlying grammar. We have implemented this updated representation in subsequent versions of selfies, where we have also made major advances with respect to design, efficiency, and supported features. Hence, we present the current status of selfies (version 2.1.1) in this manuscript. Our library, selfies, is available at GitHub (https://github.com/aspuru-guzik-group/selfies). We describe the current state of the SELFIES library (version 2.1.1), and, in particular, the advances and improvements we have made in its underlying algorithms, design, and API.
Authors
I am an author on this paper
Click your name to claim this paper and add it to your profile.
Reviews
Recommended
No Data Available