4.7 Article

The importance of outlier detection and training set selection for reliable environmental QSAR predictions

Journal

CHEMOSPHERE
Volume 63, Issue 1, Pages 99-108

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.chemosphere.2005.07.002

Keywords

prediction outlier diagnostics; quantitative structure-activity relationships; partial least squares (PLS); Pseudokirschneriella subcapitata; Daphnia magna; Lepomis macrochirus

Empirical QSAR models are valid only within the domain in which they were trained and validated. Applying a model to substances outside its domain can lead to grossly erroneous predictions. Partial least squares (PLS) regression provides prediction diagnostics that can be used to decide whether a substance lies within the model domain, i.e. whether the model prediction can be trusted. QSAR models for four different environmental end-points are used to demonstrate the importance of appropriate training set selection and to show how outlier diagnostics can increase the reliability of QSAR predictions. All models showed consistent results: when prediction outlier diagnostics were used to detect and remove outliers in the prediction data, test set prediction errors were very similar in magnitude to training set estimation errors. Test set prediction errors for substances classified as outliers were much larger. The difference in the number of outliers between models with randomly and systematically selected training sets illustrates the need for representative training data. (c) 2005 Elsevier Ltd. All rights reserved.
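
The idea of a PLS prediction outlier diagnostic can be illustrated with a minimal sketch. The abstract does not say which diagnostics or cut-offs the authors used, so everything below is an assumption made for illustration: a one-component PLS1 model on mean-centered (unscaled) descriptors, a normalized X-residual distance (`dmodx`), a score-based distance (`t2`), the cut-off values, and the toy data. The function names `fit_pls1` and `diagnose` are hypothetical.

```python
import math

def fit_pls1(X, y):
    """Fit a one-component PLS1 model on mean-centered data (illustrative)."""
    n, k = len(X), len(X[0])
    x_mean = [sum(r[j] for r in X) / n for j in range(k)]
    y_mean = sum(y) / n
    Xc = [[r[j] - x_mean[j] for j in range(k)] for r in X]
    yc = [v - y_mean for v in y]
    # Weight vector: unit vector along X'y (direction of maximal covariance with y).
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(k)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Training scores, X loadings and y loading.
    t = [sum(a * b for a, b in zip(r, w)) for r in Xc]
    tt = sum(v * v for v in t)
    p = [sum(Xc[i][j] * t[i] for i in range(n)) / tt for j in range(k)]
    q = sum(yc[i] * t[i] for i in range(n)) / tt
    # Pooled residual standard deviation of the training X block
    # (degrees of freedom for one component on centered data).
    rss = sum((Xc[i][j] - t[i] * p[j]) ** 2 for i in range(n) for j in range(k))
    s0 = math.sqrt(rss / ((n - 2) * (k - 1)))
    return {"x_mean": x_mean, "y_mean": y_mean, "w": w, "p": p, "q": q,
            "s0": s0, "t_var": tt / (n - 1)}

def diagnose(model, x, dmodx_cutoff=2.0, t2_cutoff=9.0):
    """Predict y for one substance and flag it if it falls outside the model domain.

    The cut-offs are hypothetical defaults, not values from the paper.
    """
    k = len(x)
    xc = [v - m for v, m in zip(x, model["x_mean"])]
    t = sum(a * b for a, b in zip(xc, model["w"]))
    y_hat = model["y_mean"] + model["q"] * t
    # Part of x that the model cannot reconstruct from its score.
    resid = [xc[j] - t * model["p"][j] for j in range(k)]
    dmodx = math.sqrt(sum(r * r for r in resid) / (k - 1)) / model["s0"]
    # Squared score distance from the training center, in units of score variance.
    t2 = t * t / model["t_var"]
    return {"y_hat": y_hat, "dmodx": dmodx, "t2": t2,
            "outlier": dmodx > dmodx_cutoff or t2 > t2_cutoff}

# Toy training set: three correlated descriptors and one response (made-up numbers).
X_train = [[1.0, 2.0, 0.5], [2.0, 4.1, 1.0], [3.0, 5.9, 1.6],
           [4.0, 8.2, 1.9], [5.0, 9.8, 2.6], [6.0, 12.1, 3.0]]
y_train = [1.1, 2.0, 2.9, 4.2, 5.0, 6.1]

model = fit_pls1(X_train, y_train)
inside = diagnose(model, [3.5, 7.0, 1.75])   # follows the training correlation pattern
outside = diagnose(model, [3.5, 1.0, 6.0])   # breaks the correlation pattern
print(inside)
print(outside)
```

In this sketch the second substance has an unremarkable score but a large X-residual distance, which is why a residual-based check is used alongside the score-based one. A real application would normally scale the descriptors, use several components chosen by cross-validation, and derive cut-offs from the training-set distribution.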