Article

A practical outlier detection approach for mixed-attribute data

Journal

EXPERT SYSTEMS WITH APPLICATIONS
Volume 42, Issue 22, Pages 8637-8649

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2015.07.018

Keywords

Data mining; Outlier detection; Mixed-attribute data; Mixture model; Bivariate beta

Funding

  1. Natural Sciences and Engineering Research Council of Canada (NSERC) [402495-2011]

Outlier detection in mixed-attribute space is a challenging problem for which only a few approaches have been proposed. Existing methods, however, lack an automatic mechanism to formally discriminate between outliers and inliers. A common approach to outlier identification is to estimate an outlier score for each object and then provide a ranked list of points, expecting outliers to come first. A major problem with this approach is deciding where to stop reading the ranked list, that is, how many points should be chosen as outliers. Other methods, instead of ranking outliers, implement various strategies that depend on user-specified thresholds to discriminate outliers from inliers, and ad hoc threshold values are often used. Such an unprincipled approach cannot be objective or consistent. To alleviate these problems, we propose a principled approach based on the bivariate beta mixture model to identify outliers in mixed-attribute data. The proposed approach automatically discriminates outliers from inliers and can be applied to both mixed-type attribute and single-type (numerical or categorical) attribute data without any feature transformation. Our experimental study demonstrates the suitability of the proposed approach in comparison to mainstream methods. © 2015 Elsevier Ltd. All rights reserved.