Article

Concept and role of extreme objects in PCA/SIMCA

Journal

JOURNAL OF CHEMOMETRICS
Volume 28, Issue 5, Pages 429-438

Publisher

WILEY
DOI: 10.1002/cem.2506

Keywords

PCA; SIMCA; robust methods; outliers; extreme samples; chi-squared distribution; scores and orthogonal distances; tolerance areas; thresholds

To construct a reliable decision area in the soft independent modeling by class analogy (SIMCA) method, the calibration data must be analyzed to reveal objects of special types, such as extremes and outliers. This requires a thorough statistical analysis of the score distances and orthogonal distances. The distance values should be treated like any other data acquired in an experiment, and their distributions estimated by a data-driven method, such as the method of moments or a similar technique. The scaled chi-squared distribution appears to be the first candidate for such an assessment. It makes it possible to construct a two-level decision area, with an extreme threshold and an outlier threshold, both for a regular data set and in the presence of outliers. We suggest applying classical principal component analysis (PCA), followed by enhanced robust estimators for both the scaling factor and the number of degrees of freedom. A special diagnostic tool, called the extreme plot, is proposed for the analysis of calibration objects. Extreme objects play an important role in data analysis and are an inherent feature of any data set. The advocated dual data-driven PCA/SIMCA (DD-SIMCA) approach has performed well in the analysis of simulated and real-world data, for both regular and contaminated cases. DD-SIMCA has also been compared with robust principal component analysis (ROBPCA), which is a fully robust method. Copyright (c) 2013 John Wiley & Sons, Ltd.

A novel, semi-robust, data-driven technique (DD-SIMCA) is proposed in the PCA/SIMCA context. DD-SIMCA is a dual method of estimation: classical for regular data and robust for contaminated data. The method ties the extreme and outlier thresholds directly to their respective significance levels. When combined with the new diagnostic tool called the extreme plot, DD-SIMCA performs well in comparison with the ROBPCA method in the analysis of both regular and contaminated data sets.
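
The abstract names the ingredients (a scaled chi-squared model for the distances, a method-of-moments fit, and two thresholds) but gives no formulas. The sketch below is one plausible reading, not the authors' implementation. It rests on the following assumptions, none of which is stated in the abstract: each distance d follows (d0/N) times a chi-squared variable with N degrees of freedom, so that d0 is the mean of d and N = 2·d0²/var(d); the two scaled distances are summed into one full distance; the extreme threshold is the (1 − alpha) quantile and the outlier threshold is the (1 − gamma)^(1/I) quantile for I calibration objects. All function names and default levels are hypothetical, only classical (non-robust) moments are used, and the chi-squared quantile is the Wilson-Hilferty approximation so that the example needs only the standard library.

```python
import random
from statistics import NormalDist, fmean, pvariance


def fit_scaled_chi2(distances):
    """Method-of-moments fit of d ~ (d0 / dof) * chi2(dof).

    Assumed model: E[d] = d0 and Var[d] = 2 * d0**2 / dof.
    Returns (d0, dof). Classical moments only, no robust estimators.
    """
    d0 = fmean(distances)
    dof = 2.0 * d0 ** 2 / pvariance(distances, mu=d0)
    return d0, dof


def chi2_quantile(p, dof):
    """Approximate chi-squared quantile (Wilson-Hilferty), stdlib only."""
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * dof)
    return dof * (1.0 - a + z * a ** 0.5) ** 3


def decision_thresholds(n_objects, dof_total, alpha=0.05, gamma=0.01):
    """Two-level decision area: (extreme threshold, outlier threshold).

    alpha and gamma are illustrative significance levels (assumptions).
    """
    extreme = chi2_quantile(1.0 - alpha, dof_total)
    outlier = chi2_quantile((1.0 - gamma) ** (1.0 / n_objects), dof_total)
    return extreme, outlier


def classify(h, v, h_fit, v_fit, n_objects, alpha=0.05, gamma=0.01):
    """Label one object from its score distance h and orthogonal distance v."""
    (h0, nh), (v0, nv) = h_fit, v_fit
    # Assumed full distance: sum of the two scaled distances.
    full = nh * h / h0 + nv * v / v0
    extreme, outlier = decision_thresholds(n_objects, nh + nv, alpha, gamma)
    if full > outlier:
        return "outlier"
    if full > extreme:
        return "extreme"
    return "regular"


if __name__ == "__main__":
    rng = random.Random(0)
    n = 200
    # Simulated distances only; these are not data from the paper.
    h = [(1.0 / 3.0) * rng.gammavariate(1.5, 2.0) for _ in range(n)]
    v = [(0.5 / 4.0) * rng.gammavariate(2.0, 2.0) for _ in range(n)]
    h_fit, v_fit = fit_scaled_chi2(h), fit_scaled_chi2(v)
    labels = [classify(hi, vi, h_fit, v_fit, n) for hi, vi in zip(h, v)]
    for name in ("regular", "extreme", "outlier"):
        print(name, labels.count(name))
```

On regular data, a fraction of roughly alpha of the calibration objects is expected to fall beyond the extreme threshold, which is consistent with the abstract's remark that extreme objects are present in any data set. For contaminated data the abstract calls for robust estimators of the scaling factor and degrees of freedom, which this sketch does not attempt.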
