Data mining the Cambridge Structural Database for hydrate-anhydrate pairs with SMILES strings

Journal

CRYSTENGCOMM
Volume 22, Issue 43, Pages 7290-7297

Publisher

ROYAL SOC CHEMISTRY
DOI: 10.1039/d0ce00273a

Funding

  1. National Science Foundation [DMR-1609541]
  2. Henry Luce Foundation (JEW)

Many organic molecules can crystallize in either hydrated or anhydrous forms. Predicting the formation of hydrates and their relative stability with respect to water-free alternative phases are significant challenges. Here we use the Cambridge Structural Database (CSD) and data informatics to identify and analyze hydrate-anhydrate structure pairs. A search method was developed based on matching Simplified Molecular-Input Line-Entry System (SMILES) strings and implemented through the CSD Python Application Programming Interface. Of the >23 000 molecular hydrates containing no metal ions, ~1400 were found to have at least one corresponding anhydrous form, yielding just over 2000 unique pairs in the CSD. Hydrates with and without a reported anhydrate showed a similar distribution in their water stoichiometries. Lattice symmetry and packing fraction comparisons are reported for the paired hydrates and anhydrates. Structure pairs with one organic component and multiple organic components showed some subtle differences. The details and limitations of the method are outlined in a way that can encourage and guide other types of CSD searches using SMILES.
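The idea of pairing hydrates with anhydrates by SMILES matching can be illustrated with a minimal sketch. This is not the authors' implementation and it does not call the CSD Python API. It assumes each entry already carries a canonical, dot-separated SMILES string in which water appears as the component `O`. The identifiers (`HYP001` and so on), the example strings, and the helper names `organic_key`, `is_hydrate`, and `find_pairs` are all hypothetical.

```python
from collections import defaultdict

WATER = "O"  # assumption: water is written as the bare component "O"


def split_components(smiles):
    """Split a dot-separated SMILES string into its molecular components."""
    return [part for part in smiles.split(".") if part]


def organic_key(smiles):
    """Return the non-water components as a sorted tuple (multiplicity kept)."""
    return tuple(sorted(c for c in split_components(smiles) if c != WATER))


def is_hydrate(smiles):
    """An entry counts as a hydrate if at least one component is water."""
    return WATER in split_components(smiles)


def find_pairs(entries):
    """Pair every hydrate with every anhydrate sharing the same organic key."""
    hydrates = defaultdict(list)
    anhydrates = defaultdict(list)
    for identifier, smiles in entries.items():
        key = organic_key(smiles)
        if not key:
            continue  # skip entries that contain nothing but water
        bucket = hydrates if is_hydrate(smiles) else anhydrates
        bucket[key].append(identifier)

    pairs = []
    for key, hydrate_ids in hydrates.items():
        for hydrate_id in hydrate_ids:
            for anhydrate_id in anhydrates.get(key, []):
                pairs.append((hydrate_id, anhydrate_id))
    return sorted(pairs)


# Hypothetical entries; these are not real CSD refcodes or structures.
entries = {
    "HYP001": "CC(=O)O.O",      # monohydrate of a one-component structure
    "HYP002": "CC(=O)O",        # matching anhydrate
    "HYP003": "Oc1ccccc1.O.O",  # dihydrate with no anhydrate in the set
    "HYP004": "CC(=O)O",        # second anhydrate (e.g. a polymorph)
}

pairs = find_pairs(entries)
print(pairs)
```

Because one hydrate can match several anhydrous entries, the number of pairs can exceed the number of paired hydrates, which is consistent with ~1400 hydrates yielding just over 2000 pairs. Plain string comparison only works if both SMILES strings were generated by the same canonicalization scheme, so a real search would need to confirm that first.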
