Article

Principal component analysis for compositional data with outliers

Journal

ENVIRONMETRICS
Volume 20, Issue 6, Pages 621-632

Publisher

WILEY
DOI: 10.1002/env.966

Keywords

robust statistics; compositional data; isometric logratio transformation; principal component analysis

Funding

  1. Council of Czech Government MSM [6198959214]

Compositional data (almost all data in geochemistry) are closed data, that is, they usually sum to a constant (e.g. weight percent, wt.%) and carry only relative information. Thus, the covariance structure of compositional data is strongly biased, and the results of many multivariate techniques become doubtful without a proper transformation of the data. The centred logratio transformation (clr) is often used to open closed data. However, clr-transformed data do not have full rank and therefore cannot be used with robust multivariate techniques such as principal component analysis (PCA). Here we propose to use the isometric logratio transformation (ilr) instead. The ilr transformation, however, has the disadvantage that the resulting new variables are no longer directly interpretable in terms of the originally entered variables. We therefore propose a technique by which the resulting scores and loadings of a robust PCA on ilr-transformed data can be back-transformed and interpreted. The procedure is demonstrated using a real data set from regional geochemistry and compared to results from non-transformed and non-robust versions of PCA. It turns out that the procedure using ilr-transformed data and robust PCA delivers superior results to all other approaches. The examples demonstrate that, due to the compositional nature of geochemical data, PCA should not be carried out without an appropriate transformation. Furthermore, a robust approach is preferable if the data set contains outliers. Copyright (C) 2009 John Wiley & Sons, Ltd.
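
The workflow summarised in the abstract (ilr transformation, PCA in ilr space, back-transformation of the loadings to clr space) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `clr`, `ilr_basis` and `pca_on_ilr` are hypothetical, the Helmert-type orthonormal basis is one possible choice among many, and the classical mean and covariance are used only as a default. A robust analysis would pass a robust location and covariance estimate (for example from a minimum covariance determinant estimator) through the `location` and `cov` arguments.

```python
import numpy as np

def clr(x):
    """Centred logratio transform; rows of x are strictly positive compositions."""
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)

def ilr_basis(D):
    """Orthonormal D x (D-1) matrix V whose columns each sum to zero.

    Assumption: a Helmert-type basis; any orthonormal basis of the
    clr hyperplane would serve the same purpose.
    """
    V = np.zeros((D, D - 1))
    for j in range(1, D):
        V[:j, j - 1] = 1.0 / j      # first j parts share weight 1/j
        V[j, j - 1] = -1.0          # contrasted against part j+1
        V[:, j - 1] *= np.sqrt(j / (j + 1.0))  # scale column to unit length
    return V

def pca_on_ilr(x, location=None, cov=None):
    """PCA on ilr coordinates with loadings back-transformed to clr space.

    location, cov: optional estimates in ilr space. Supplying robust
    estimates here is what would make the PCA robust; the defaults
    below are the classical (non-robust) mean and covariance.
    """
    D = x.shape[1]
    V = ilr_basis(D)
    z = clr(x) @ V                              # ilr coordinates, full rank D-1
    if location is None:
        location = z.mean(axis=0)
    if cov is None:
        cov = np.cov(z, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)        # eigh returns ascending order
    order = np.argsort(eigval)[::-1]            # sort by decreasing variance
    eigval, eigvec = eigval[order], eigvec[:, order]
    scores = (z - location) @ eigvec            # scores are the same in ilr and clr space
    loadings_clr = V @ eigvec                   # one row per original part
    return scores, loadings_clr, eigval
```

Because `V` has orthonormal columns that sum to zero, the back-transformed loadings also sum to zero down each column, and each row corresponds to one original part, so they can be read as clr loadings.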
