4.7 Article

MIM-ML: A Novel Quantum Chemical Fragment-Based Random Forest Model for Accurate Prediction of NMR Chemical Shifts of Nucleic Acids

Journal

JOURNAL OF CHEMICAL THEORY AND COMPUTATION
Volume 19, Issue 19, Pages 6632-6642

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/acs.jctc.3c00563



This study developed a random forest machine learning model for predicting chemical shifts of nucleic acids. The model showed excellent performance in predicting chemical shifts despite the presence of nonstandard structures, and both structural and electronic descriptors were found to be critical for reliable predictions.
We developed a random forest machine learning (ML) model for predicting ¹H and ¹³C NMR chemical shifts of nucleic acids. The model is trained entirely to reproduce chemical shifts computed previously for 10 nucleic acids with a Molecules-in-Molecules (MIM) fragment-based density functional theory (DFT) protocol that includes microsolvation effects. It uses structural descriptors together with electronic descriptors from an inexpensive, low-level semiempirical calculation (GFN2-xTB), and it is trained on a relatively small number of DFT chemical shifts (2080 ¹H and 1780 ¹³C chemical shifts across the 10 nucleic acids). The trained model is then used to predict chemical shifts for 8 new nucleic acids, ranging in size from 600 to 900 atoms, and the predictions are compared directly with experimental data. Although no experimental data were used in training, the performance of the model on the test set is excellent (mean absolute deviation of 0.34 ppm for ¹H and 2.52 ppm for ¹³C chemical shifts), even though the test set includes some nonstandard structures. A simple analysis suggests that both structural and electronic descriptors are critical for reliable predictions. This is the first attempt to combine ML with fragment-based DFT calculations to predict experimental chemical shifts accurately, making the MIM-ML model a valuable tool for NMR predictions of nucleic acids.
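
The accuracy figures in the abstract are mean absolute deviations (MAD) between predicted and experimental chemical shifts. As a minimal sketch of how such a figure is computed, assuming paired lists of shifts in ppm, the Python below averages the absolute differences. The function name `mean_absolute_deviation` and the numbers are hypothetical and chosen only for illustration; they are not data or code from the study.

```python
from statistics import fmean


def mean_absolute_deviation(predicted, reference):
    """Return the mean of |predicted - reference| over paired shifts (ppm)."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if len(predicted) == 0:
        raise ValueError("at least one pair of shifts is required")
    # Build the list of absolute errors, then average it.
    errors = [abs(p - r) for p, r in zip(predicted, reference)]
    return fmean(errors)


# Made-up values for illustration only; not taken from the paper.
predicted_ppm = [7.5, 8.0, 5.5, 4.0]
experimental_ppm = [7.25, 8.5, 5.0, 4.25]

mad = mean_absolute_deviation(predicted_ppm, experimental_ppm)
print(f"MAD = {mad:.3f} ppm")  # prints "MAD = 0.375 ppm"
```

In practice the two nuclei would be evaluated separately, since their shift ranges differ by roughly an order of magnitude, which is why the abstract reports one MAD per nucleus.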
