Article

Separation of data on the training and test set for modelling: a case study for modelling of five colour properties of a white pigment

Journal

CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS
Volume 65, Issue 2, Pages 221-229

Publisher

ELSEVIER SCIENCE BV
DOI: 10.1016/S0169-7439(02)00110-7

Keywords

modelling; white pigment; training set; artificial neural network (ANN)

To evaluate how the choice of training-set data influences the predictive ability of linear and nonlinear models, various methods of sample selection were tested. The study was carried out by modelling five colour properties of a titanium dioxide white pigment: whiteness (W_10), lightness (L* and L_p*), and hue (b* and b_p*). In all variations of data selection and modelling, the same set of 132 white pigment samples, produced over a 6-month period, was employed. Four modelling techniques were used: standard multiple linear regression (MLR), a radial basis function (RBF) model, and two artificial neural network (ANN) learning strategies, error backpropagation (EBP ANN) and counterpropagation (CP ANN). For each of the four modelling techniques, four different sample selections were made using the following methods: time-equidistant sampling of the pigments produced during the 6-month period (time dependent, for short), random selection (RS), sampling from Kohonen self-organised top-maps (KOH), and the Kennard-Stone maximal distance approach. Each time, exactly 66 samples were chosen for training and 66 for testing. The 66 testing objects were further divided into a test set and a control set. Only 13 objects were present in all four testing sets. These 13 objects were set aside as the final control set, while the remaining 53 objects from each division were used to test the generated models at the very end of the model-generation part of the work. Each sample (white pigment) in the study is characterised by 17 independent and five dependent variables. The best 80 models (five pigment properties, each modelled by four modelling methods, each trained on the sets of objects obtained by four division methods) were tested and the results reported.
The differences in prediction quality between models obtained by different modelling techniques were found to be statistically significant (at α = 0.05), whereas the differences between division methods were not. Error backpropagation was established as the best modelling method. However, several exceptions to these general observations are present and are discussed. (C) 2002 Elsevier Science B.V. All rights reserved.
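
As a rough illustration of the division step described in the abstract, the sketch below splits 132 samples into 66 training and 66 testing objects by three of the four named methods and then finds the objects common to all testing sets. It is not the authors' implementation. The data are synthetic random numbers standing in for the 17 independent variables, the rows are assumed to be in production order and evenly spaced in time, plain Euclidean distance is assumed for Kennard-Stone, and the Kohonen-map selection is left out. All function names are hypothetical.

```python
import numpy as np


def time_equidistant_split(n_samples):
    """Assumes rows are in production order: every second sample trains."""
    idx = np.arange(n_samples)
    return idx[::2], idx[1::2]


def random_split(n_samples, n_train, rng):
    """Random selection: shuffle the indices and cut at n_train."""
    perm = rng.permutation(n_samples)
    return np.sort(perm[:n_train]), np.sort(perm[n_train:])


def kennard_stone_split(X, n_train):
    """Kennard-Stone maximal distance selection (Euclidean distance assumed).

    Start with the two most distant samples, then repeatedly add the sample
    whose nearest already-selected neighbour is farthest away.
    """
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_train:
        # Distance from each remaining sample to its nearest selected sample.
        min_d = dist[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(min_d))]
        selected.append(pick)
        remaining.remove(pick)
    return np.array(sorted(selected)), np.array(sorted(remaining))


rng = np.random.default_rng(0)
X = rng.normal(size=(132, 17))  # synthetic stand-in, not the pigment data

splits = {
    "time": time_equidistant_split(132),
    "random": random_split(132, 66, rng),
    "kennard_stone": kennard_stone_split(X, 66),
}

# Objects present in every testing set: candidates for a final control set.
common_test = set(range(132))
for train, test in splits.values():
    common_test &= set(test.tolist())

print(len(common_test), "objects are shared by all testing sets")
```

With real data the size of the shared set depends on the samples and on the selection methods used; the abstract reports 13 shared objects across its four divisions, a figure this synthetic example is not expected to reproduce.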
