Journal
EXPERT SYSTEMS WITH APPLICATIONS
Volume 42, Issue 22, Pages 8637-8649
Publisher
PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2015.07.018
Keywords
Data mining; Outlier detection; Mixed-attribute data; Mixture model; Bivariate beta
Funding
- Natural Sciences and Engineering Research Council of Canada (NSERC) [402495-2011]
Abstract
Outlier detection in mixed-attribute space is a challenging problem for which only a few approaches have been proposed. These existing methods, however, lack an automatic mechanism to formally discriminate between outliers and inliers. A common approach to outlier identification is to estimate an outlier score for each object and then provide a ranked list of points, expecting outliers to come first. A major problem with such an approach is deciding where to stop reading the ranked list: how many points should be chosen as outliers? Other methods, instead of ranking outliers, implement various strategies that depend on user-specified thresholds to discriminate outliers from inliers, and ad hoc threshold values are often used. With such an unprincipled approach it is impossible to be objective or consistent. To alleviate these problems, we propose a principled approach based on the bivariate beta mixture model to identify outliers in mixed-attribute data. The proposed approach automatically discriminates outliers from inliers, and it can be applied to both mixed-type attribute and single-type (numerical or categorical) attribute data without any feature transformation. Our experimental study demonstrates the suitability of the proposed approach in comparison to mainstream methods. (C) 2015 Elsevier Ltd. All rights reserved.