Article

Genetic Optimization of Training Sets for Improved Machine Learning Models of Molecular Properties

Journal

JOURNAL OF PHYSICAL CHEMISTRY LETTERS
Volume 8, Issue 7, Pages 1351-1359

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/acs.jpclett.7b00038

Funding

  1. NCCR MARVEL - Swiss National Science Foundation
  2. Swiss National Science Foundation [PP00P2_138932]

The training of molecular models of quantum mechanical properties based on statistical machine learning requires large data sets that exemplify the map from chemical structure to molecular property. Intelligent a priori selection of training examples is often difficult or impossible to achieve, as prior knowledge may be unavailable. Ordinarily, representative selection of training molecules from such data sets is achieved through random sampling. We use genetic algorithms to optimize the composition of training sets drawn from tens of thousands of small organic molecules. The resulting machine learning models are considerably more accurate: in the limit of small training sets, mean absolute errors for out-of-sample predictions are reduced by up to approximately 75%. We discuss and present optimized training sets consisting of 10 molecular classes for all molecular properties studied. We show that these classes can be used to design improved training sets for the generation of machine learning models of the same properties in similar but unrelated molecular sets.
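
The idea of evolving a training set, rather than drawing it at random, can be illustrated with a small sketch. This is not the authors' implementation: the toy one-dimensional data, the Gaussian-kernel ridge regression used as a stand-in model, the function names `krr_mae` and `evolve_training_set`, and all hyperparameters are assumptions made purely for illustration. Each individual in the population is a set of training indices, and its fitness is the mean absolute error on every point left out of that set.

```python
import numpy as np

def krr_mae(X, y, train_idx, sigma=1.0, lam=1e-3):
    """MAE of a Gaussian-kernel ridge model, evaluated on all points outside train_idx."""
    test_mask = np.ones(len(X), dtype=bool)
    test_mask[train_idx] = False
    Xtr, ytr = X[train_idx], y[train_idx]
    Xte, yte = X[test_mask], y[test_mask]

    def kernel(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    K = kernel(Xtr, Xtr) + lam * np.eye(len(train_idx))
    alpha = np.linalg.solve(K, ytr)
    return float(np.abs(kernel(Xte, Xtr) @ alpha - yte).mean())

def evolve_training_set(X, y, n_train=10, pop_size=20, n_gen=30, p_mut=0.1, seed=0):
    """Toy genetic algorithm over index subsets (hypothetical settings, not from the paper)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # Initial population: random training sets, i.e. the usual random-sampling baseline.
    pop = [rng.choice(n, size=n_train, replace=False) for _ in range(pop_size)]
    history = []
    for _ in range(n_gen):
        scores = np.array([krr_mae(X, y, ind) for ind in pop])
        order = np.argsort(scores)
        history.append(float(scores[order[0]]))
        # Elitism: the better half survives unchanged.
        elite = [pop[i] for i in order[: pop_size // 2]]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.choice(len(elite), size=2, replace=False)
            # Crossover: draw the child's indices from the union of both parents.
            genes = np.union1d(elite[a], elite[b])
            child = rng.choice(genes, size=n_train, replace=False)
            # Mutation: swap an index for one not currently in the child.
            for k in range(n_train):
                if rng.random() < p_mut:
                    candidates = np.setdiff1d(np.arange(n), child)
                    child[k] = rng.choice(candidates)
            children.append(child)
        pop = elite + children
    scores = np.array([krr_mae(X, y, ind) for ind in pop])
    best = int(np.argmin(scores))
    return pop[best], float(scores[best]), history

# Synthetic stand-in for a molecular data set: 200 points of a smooth 1D function.
data_rng = np.random.default_rng(1)
X = data_rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(2.0 * X[:, 0])
best_idx, best_mae, history = evolve_training_set(X, y)
```

Because the best half of each generation is carried over unchanged, the best error in `history` never increases from one generation to the next. In this sketch the fitness is measured on the same left-out points that would be used to report the error, which gives an optimistic estimate; a careful study would score the final training set on a separate held-out set.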
