Article

Outlier labeling with boxplot procedures

Journal

JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
Volume 100, Issue 470, Pages 642-652

Publisher

AMER STATISTICAL ASSOC
DOI: 10.1198/016214504000001466

Keywords

exploratory data analysis; order statistics; outlier region; tolerance limits

In this article we focus on the detection of possible outliers based on the widely used boxplot procedures. The outliers in a set of data are defined to be a subset of observations that appear to be inconsistent with the remaining observations. We identify the outliers by constructing a boxplot with its lower fence (LF) and upper fence (UF) either (a) satisfying the requirement that if the given sample is outlier-free, then the probability that one or more of the sample data would fall outside the region (LF, UF) is equal to a prescribed small value alpha, or (b) taken to be the tolerance limits, derived from an outlier-free random sample, within which a specified large proportion of the sampled population would be asserted to fall with a given large probability gamma. Exact expressions that can be routinely used to evaluate the constants needed in the construction of the boxplot's outlier region for samples taken from the family of location-scale distributions are obtained for both procedures. This article shows that the commonly constructed boxplot is in general inappropriate for detecting outliers in the normal and especially the exponential samples. We recommend that the graphical boxplot be constructed based on the knowledge of the underlying distribution of the dataset and by controlling the risk of labeling regular observations as outliers.
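
The quantity controlled in procedure (a), the probability that an outlier-free sample has at least one observation outside (LF, UF), can be estimated by simulation for the conventional boxplot. The sketch below is illustrative only and does not reproduce the article's exact expressions. It assumes the usual fence multiplier k = 1.5 and the "inclusive" quartile rule of Python's statistics.quantiles. The function names tukey_fences and some_outside_rate are hypothetical.

```python
import random
import statistics


def tukey_fences(sample, k=1.5):
    """Return (LF, UF) = (Q1 - k*IQR, Q3 + k*IQR) for the sample.

    Assumption: quartiles use the 'inclusive' interpolation rule;
    other quartile definitions give slightly different fences.
    """
    q1, _, q3 = statistics.quantiles(sample, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def some_outside_rate(draw, n, k=1.5, trials=2000, seed=0):
    """Monte Carlo estimate of P(at least one point outside (LF, UF))
    for outlier-free samples of size n, each value produced by draw(rng)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [draw(rng) for _ in range(n)]
        lf, uf = tukey_fences(sample, k)
        if any(x < lf or x > uf for x in sample):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    def normal(rng):
        return rng.gauss(0.0, 1.0)        # standard normal

    def expo(rng):
        return rng.expovariate(1.0)       # unit-rate exponential

    for n in (20, 100):
        print(n, some_outside_rate(normal, n), some_outside_rate(expo, n))
```

Comparing the printed rates with a chosen small alpha shows how far the conventional fences are from procedure (a) for a given distribution and sample size. Under these assumptions, one could search over k until the estimated rate matches alpha, as a simulation stand-in for the exact constants the article derives.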
