Article

Robust principal component analysis and outlier detection with ecological data

Journal

ENVIRONMETRICS
Volume 15, Issue 2, Pages 129-139

Publisher

WILEY
DOI: 10.1002/env.628

Keywords

principal component analysis; outliers; robust statistics; water chemistry; environmental data; multivariate analysis

Ecological studies frequently involve large numbers of variables and observations, and these are often subject to various errors. Data that are not representative of the study population tend to bias the interpretation and conclusions of an ecological study. Because ecological data are multivariate, atypical observations are very difficult to identify with univariate or bivariate plots, which calls for robust statistical methods. Our study compares a standard multivariate method, based on the Mahalanobis distance, with a robust method based on the minimum volume ellipsoid as a means of determining whether data sets contain outliers. We evaluate both methods using simulations under varying data conditions and show that the minimum volume ellipsoid approach is superior in detecting outliers where they are present. We show that, as the sample size parameter, h, used in the robust approach increases, the accuracy and precision of the associated estimate of the number of outliers decrease, particularly as the number of outliers increases. Conversely, where no outliers are present, large values of the parameter provide the most accurate results. In addition to the simulation results, we apply robust principal component analysis to a data set of lake-water chemistry variables to illustrate the additional insight available. We suggest that ecologists consider that their data may contain atypical points. Following checks of normality, bivariate linearity and other traditional aspects, we advocate that ecologists examine their data sets using robust multivariate methods. Points identified as atypical should be carefully evaluated against background information to determine their suitability for inclusion in further multivariate analyses and whether additional factors explain their unusual characteristics. Copyright © 2004 John Wiley & Sons, Ltd.
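
The contrast between the two approaches can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the data are synthetic and bivariate (80 points from a standard normal plus 20 planted outliers near (10, 10)), the minimum volume ellipsoid is approximated by a simple random-subset search, and the helper names (mean_cov, sq_distances, mve_resampling, flag_outliers) as well as the 0.975 chi-square cutoff are hypothetical choices made for this sketch. The parameter h is the number of points the ellipse must cover.

```python
import math
import random


def mean_cov(points):
    """Sample mean and 2x2 covariance (sxx, sxy, syy) of bivariate points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def sq_distances(points, center, cov):
    """Squared Mahalanobis distance of every point from center under cov."""
    (mx, my), (sxx, sxy, syy) = center, cov
    det = sxx * syy - sxy * sxy
    out = []
    for x, y in points:
        dx, dy = x - mx, y - my
        out.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return out


def chi2_quantile_2df(q):
    """Chi-square quantile for 2 degrees of freedom (closed form)."""
    return -2.0 * math.log(1.0 - q)


def mve_resampling(points, h, trials=500, seed=0):
    """Approximate the minimum volume ellipsoid by random-subset search.

    Each trial fits an ellipse to 3 random points, inflates it until it
    covers h points, and the smallest such ellipse is kept.
    """
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        center, cov = mean_cov(rng.sample(points, 3))
        det = cov[0] * cov[2] - cov[1] ** 2
        if det <= 1e-12:  # skip (nearly) collinear subsets
            continue
        m2 = sorted(sq_distances(points, center, cov))[h - 1]
        area = m2 * math.sqrt(det)  # proportional to the ellipse area
        if best is None or area < best[0]:
            # Rescale so the scatter is comparable to a covariance matrix.
            scale = m2 / chi2_quantile_2df(0.5)
            best = (area, center, tuple(scale * c for c in cov))
    return best[1], best[2]


def flag_outliers(points, center, cov, q=0.975):
    """Indices of points whose squared distance exceeds the chi-square cutoff."""
    cutoff = chi2_quantile_2df(q)
    d2 = sq_distances(points, center, cov)
    return [i for i, d in enumerate(d2) if d > cutoff]


# Synthetic data: indices 0-79 are clean, 80-99 are planted outliers.
rng = random.Random(1)
clean = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(80)]
planted = [(rng.gauss(10, 0.3), rng.gauss(10, 0.3)) for _ in range(20)]
data = clean + planted
h = (len(data) + 2 + 1) // 2  # roughly half the sample, for p = 2 variables

classical = flag_outliers(data, *mean_cov(data))
robust = flag_outliers(data, *mve_resampling(data, h))
print("planted outliers flagged, classical:", sum(i >= 80 for i in classical))
print("planted outliers flagged, robust:", sum(i >= 80 for i in robust))
```

In this setup the planted cluster pulls the classical mean and covariance toward itself, so its Mahalanobis distances stay below the cutoff (the masking effect), while distances computed from the robust center and scatter separate the cluster clearly. Raising h forces the ellipse to cover more points and makes it easier for outliers to influence the fit, which is consistent with the sensitivity to h described in the abstract.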
