Article

An Efficient Approach for Outlier Detection with Imperfect Data Labels

Journal

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
Volume 26, Issue 7, Pages 1602-1616

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TKDE.2013.108

Keywords

Outlier detection; data of uncertainty

Funding

  1. US National Science Foundation [IIS-0905215, CNS-1115234, IIS-0914934, DBI-0960443, OISE-1129076]
  2. US Department of Army [W911NF-12-1-0066]
  3. Google Mobile Program
  4. KAU grant
  5. Natural Science Foundation of China [61070033, 61203280, 61202270]
  6. Guangdong Natural Science Funds for Distinguished Young Scholar [S2013050014133]
  7. Natural Science Foundation of Guangdong province [9251009001000005, S2011040004187, S2012040007078]
  8. Specialized Research Fund for the Doctoral Program of Higher Education [20124420120004]
  9. Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry, Overseas Outstanding Doctoral Fund [405120095]
  10. Australian Research Council Discovery Grant [DP1096218, DP130102691]
  11. ARC Linkage Grant [LP100200774, LP120100566]
  12. Directorate for Computer and Information Science and Engineering
  13. Division Of Computer and Network Systems [1115234] Funding Source: National Science Foundation
  14. Australian Research Council [LP120100566, LP100200774] Funding Source: Australian Research Council

The task of outlier detection is to identify data objects that are markedly different from or inconsistent with the normal set of data. Most existing solutions build a model from the normal data and flag as outliers the objects that fit this model poorly. However, many applications also provide a limited number of negative examples (outliers) in addition to the normal data, and the data may be corrupted so that its labels are imperfect. These conditions make outlier detection far more difficult than in the traditional setting. This paper presents a novel outlier detection approach that addresses data with imperfect labels and incorporates limited abnormal examples into learning. To deal with imperfect labels, we introduce likelihood values for each input example, which denote its degree of membership in the normal and abnormal classes, respectively. Our proposed approach works in two steps. In the first step, we generate a pseudo training dataset by computing the likelihood values of each example based on its local behavior. We present a kernel k-means clustering method and a kernel LOF-based (local outlier factor) method to compute the likelihood values. In the second step, we incorporate the generated likelihood values and the limited abnormal examples into an SVDD-based (support vector data description) learning framework to build a more accurate classifier for global outlier detection. By integrating local and global outlier detection, our proposed method explicitly handles data with imperfect labels and improves outlier detection performance. Extensive experiments on real-life datasets demonstrate that our proposed approaches achieve a better tradeoff between detection rate and false alarm rate than state-of-the-art outlier detection approaches.
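To make the first step more concrete, the following is a minimal Python sketch of how per-example likelihood values might be derived from distances in a kernel feature space. It is not the authors' implementation. It simplifies kernel k-means to a single cluster (k = 1), and the RBF kernel, the `gamma` value, the function names, and the linear mapping from distance to membership degree are all assumptions made for illustration. The second step (SVDD-based learning) is not covered here.

```python
import math


def rbf_kernel(a, b, gamma=0.5):
    """RBF kernel k(a, b) = exp(-gamma * ||a - b||^2). gamma is an assumed value."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


def kernel_distances_to_center(points, gamma=0.5):
    """Squared feature-space distance of each point to the mean of all points.

    Uses the kernel identity
    ||phi(x) - mu||^2 = k(x, x) - (2/n) * sum_i k(x, x_i)
                        + (1/n^2) * sum_ij k(x_i, x_j).
    This is the single-cluster (k = 1) case of kernel k-means.
    """
    n = len(points)
    gram = [[rbf_kernel(p, q, gamma) for q in points] for p in points]
    mean_all = sum(sum(row) for row in gram) / (n * n)
    return [gram[i][i] - 2.0 * sum(gram[i]) / n + mean_all for i in range(n)]


def likelihood_values(points, gamma=0.5):
    """Return a (normal, abnormal) membership pair in [0, 1] for each point.

    The linear mapping from distance to membership is a hypothetical choice
    for illustration, not the formula used in the paper.
    """
    dists = kernel_distances_to_center(points, gamma)
    d_max = max(dists)
    if d_max <= 0.0:
        # All points coincide in feature space: treat every point as normal.
        return [(1.0, 0.0) for _ in points]
    memberships = []
    for d in dists:
        normal = min(1.0, max(0.0, 1.0 - d / d_max))
        memberships.append((normal, 1.0 - normal))
    return memberships


# Four tightly grouped points and one distant point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (3.0, 3.0)]
memberships = likelihood_values(points)
```

In this toy input the distant point receives the lowest normal-class membership, while the four grouped points receive values close to 1. In a fuller pipeline, such pairs would serve as the soft labels of the pseudo training dataset that the second step consumes.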
