Article

An open-source framework for fast-yet-accurate calculation of quantum mechanical features

Journal

PHYSICAL CHEMISTRY CHEMICAL PHYSICS
Volume 24, Issue 17, Pages 10599-10610

Publisher

ROYAL SOC CHEMISTRY
DOI: 10.1039/d2cp01165d

Funding

  1. Hanna Leek and Kristina oheln from the Separation Sciences Lab at AstraZeneca

We present an open-source framework called kallisto for efficient and accurate calculation of quantum mechanical features of atoms and molecules. The method demonstrates strong predictive power in the calculation of molecular polarizabilities and retention times, with significantly lower computational costs compared to traditional approaches. We also introduce a versatile van der Waals radius model based on atomic static polarizabilities, which can be efficiently applied to proteins. The study further shows the improvement in predictive power when combining molecular fingerprints with physicochemical descriptors, and validates the physical nature of the applied features in machine-learning models. The kallisto framework is recommended as a robust and cost-effective featurizer for future state-of-the-art machine-learning studies.
We present the open-source framework kallisto, which enables the efficient and robust calculation of quantum mechanical features for atoms and molecules. For a benchmark set of 49 experimental molecular polarizabilities, the predictive power of the presented method competes with second-order perturbation theory in a converged atomic-orbital basis set at a fraction of its computational cost. The calculation of isotropic molecular polarizabilities is robust for a data set of more than 80 000 molecules. We furthermore present a generally applicable van der Waals radius model that is rooted in atomic static polarizabilities. Efficiency tests show that such radii can be calculated even for small- to medium-sized proteins, the largest system (SARS-CoV-2 spike protein) having 42 539 atoms. Following the work of Domingo-Almenara et al. [Domingo-Almenara et al., Nat. Commun., 2019, 10, 5811], we present computational predictions of retention times for different chromatographic methods and describe how physicochemical features improve the predictive power of machine-learning models that otherwise rely only on two-dimensional features such as molecular fingerprints. Additionally, we developed an internal benchmark set of experimental supercritical fluid chromatography retention times. For those methods, improvements of up to 10.6% are obtained when combining molecular fingerprints with physicochemical descriptors. Shapley additive explanation values furthermore show that the physical nature of the applied features is retained within the final machine-learning models. We generally recommend the kallisto framework as a robust, low-cost, and physically motivated featurizer for upcoming state-of-the-art machine-learning studies.
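The combination of molecular fingerprints with physicochemical descriptors mentioned in the abstract can be illustrated with a minimal sketch. It does not use the kallisto API and is not the authors' pipeline: the function names `combine_features` and `relative_improvement`, the toy data, and the standardization step are all assumptions made for illustration. The abstract also does not state which error metric underlies the reported improvement, so the percentage formula here is just one plausible reading.

```python
import numpy as np


def combine_features(fingerprints, descriptors):
    """Concatenate fingerprint bits with standardized descriptors (hypothetical helper)."""
    fingerprints = np.asarray(fingerprints, dtype=float)
    descriptors = np.asarray(descriptors, dtype=float)
    if fingerprints.shape[0] != descriptors.shape[0]:
        raise ValueError("both arrays need one row per molecule")
    # Assumption: descriptors are standardized column-wise so that their scale
    # is comparable to the 0/1 fingerprint bits.
    mean = descriptors.mean(axis=0)
    std = descriptors.std(axis=0)
    std[std == 0.0] = 1.0  # constant columns become zero after centering
    scaled = (descriptors - mean) / std
    return np.hstack([fingerprints, scaled])


def relative_improvement(baseline_error, combined_error):
    """Percentage reduction of an error metric relative to the baseline model."""
    return 100.0 * (baseline_error - combined_error) / baseline_error


# Toy data (made up): 3 molecules, 4 fingerprint bits, 2 descriptors each.
fp = [[1, 0, 0, 1], [0, 1, 0, 1], [1, 1, 0, 0]]
desc = [[10.0, 1.5], [20.0, 1.7], [30.0, 1.9]]
X = combine_features(fp, desc)
print(X.shape)
```

The resulting matrix `X` has one row per molecule and can be passed to any regression model in place of the fingerprint-only matrix; comparing the two models' errors with `relative_improvement` gives a figure of the kind quoted in the abstract.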
