A performance comparison of modern statistical techniques for molecular descriptor selection and retention prediction in chromatographic QSRR studies

CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS (2005)

Journal

CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS

Volume 76, Issue 2, Pages 185-196

Publisher

ELSEVIER SCIENCE BV

DOI: 10.1016/j.chemolab.2004.11.001

Keywords

CART; bagging; random forests; gradient boosting; genetic algorithms; QSRR; retention prediction

Categories

Automation & Control Systems; Chemistry, Analytical; Computer Science, Artificial Intelligence; Instruments & Instrumentation; Mathematics, Interdisciplinary Applications; Statistics & Probability

Abstract

As datasets become larger, the problem of selecting variables for prediction becomes harder: the task is to determine which subset of variables produces optimal predictions. The example studied aims to predict the chromatographic retention of 83 basic drugs on a Unisphere PBD column at pH 11.7 using 1272 molecular descriptors. The goal of this paper is to compare the relative performance of recently developed data mining methods, specifically classification and regression trees (CART), stochastic gradient boosting for tree-based models (Treeboost), and random forests (RF), with common statistical techniques in chemometrics: genetic algorithms on multiple linear regression (GA-MLR), uninformative variable elimination partial least squares (UVE-PLS), and SIMPLS. The comparison is made primarily on predictive performance, but also on the variables found to be most important for the predictions. The results of this study indicate that, individually, GA-MLR (R² = 0.93) outperformed all other models. Further analysis found that a combination approach of GA-MLR and Treeboost (R² = 0.98) further improved these results. (c) 2004 Elsevier B.V. All rights reserved.
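The abstract reports model quality as R² values. As a minimal sketch of how such a figure is commonly computed, the function below assumes the conventional coefficient of determination, 1 - SS_res / SS_tot. The abstract does not say which R² variant (fitted, cross-validated, or test-set) the authors used, and the function name `r_squared` and its arguments are illustrative, not taken from the paper.

```python
def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot.

    observed  : measured values (e.g. retention values), a sequence of floats
    predicted : model predictions for the same compounds, same length
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    mean_obs = sum(observed) / len(observed)

    # Residual sum of squares: what the model fails to explain.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))

    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)

    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")

    return 1.0 - ss_res / ss_tot
```

Under this definition a model that reproduces every observation scores 1, and a model that always predicts the mean of the observations scores 0, which gives a rough scale for reading the values quoted in the abstract.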
