Article

Cost-efficient unsupervised sample selection for multivariate calibration

Journal

Publisher

ELSEVIER
DOI: 10.1016/j.chemolab.2021.104352

Keywords

Sample selection; Unsupervised learning; Multivariate calibration; PLS regression; VC dimension; NIR

Funding

  1. Research Foundation-Flanders (FWO Brussels, Belgium) [1SA0721N]

This study aims to select the most informative calibration samples in an unsupervised way based on spectral measurements by providing guidelines for addressing challenges in PLSR model building. Recommendations include calculating a sample size exceeding the model complexity by a factor of 12, performing selection in a PCA score space with a sufficient number of principal components, and using methods such as Kennard-Stone.
Indirect quantification of chemical composition through spectral measurements requires multivariate calibration models. The reference analyses on the calibration samples are typically a major cost factor in building these models. The aim of this study was therefore to select the most informative calibration samples in an unsupervised way, based on the spectral measurements alone. To this end, guidelines for partial least squares regression (PLSR) model building have been developed. The recommendations are to use a sample size that exceeds the model complexity by a factor of 12, to perform the selection in the PCA score space spanned by a sufficiently large number of principal components, and to use a method such as Kennard-Stone, Puchwein, clustering, or D-optimal design. We provide the data and methodology used in the present study for future use.
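
To make the recommendations concrete, here is a minimal sketch of one possible workflow: project the spectra onto PCA scores, then pick calibration samples with the Kennard-Stone rule. It is not the authors' code. The helper names (`pca_scores`, `kennard_stone`), the synthetic data, the choice of 10 principal components, and the reading of "model complexity" as the number of PLS latent variables are all assumptions made for illustration.

```python
import numpy as np


def pca_scores(X, n_components):
    """Project mean-centred spectra onto their first principal components."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]


def kennard_stone(T, n_select):
    """Return indices of n_select rows of T chosen by the Kennard-Stone rule."""
    T = np.asarray(T, dtype=float)
    if not 2 <= n_select <= len(T):
        raise ValueError("n_select must be between 2 and the number of samples")
    # Full pairwise Euclidean distance matrix (fine for a few thousand samples).
    diff = T[:, None, :] - T[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=2))
    # Start with the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(D), D.shape)
    selected = [int(i), int(j)]
    # Distance from every sample to its nearest already-selected sample.
    min_dist = np.minimum(D[i], D[j])
    while len(selected) < n_select:
        min_dist[selected] = -1.0  # selected samples can never be picked again
        k = int(np.argmax(min_dist))  # farthest from the current selection
        selected.append(k)
        min_dist = np.minimum(min_dist, D[k])
    return selected


# Synthetic stand-in for a spectral matrix (200 samples x 50 wavelengths).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))

n_latent = 3               # assumed PLSR model complexity
n_select = 12 * n_latent   # factor-of-12 guideline from the abstract
T = pca_scores(X, n_components=10)  # "sufficiently large" is assumed to be 10 here
calibration_idx = kennard_stone(T, n_select)
```

Only the samples in `calibration_idx` would then be sent for reference analysis. The other methods named in the abstract (Puchwein, clustering, D-optimal design) could be applied to the same score matrix `T` in place of `kennard_stone`.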
