
Fast distributed outlier detection in mixed-attribute data sets

DATA MINING AND KNOWLEDGE DISCOVERY (2006)

Journal

DATA MINING AND KNOWLEDGE DISCOVERY

Volume 12, Issue 2-3, Pages 203-228

Publisher

SPRINGER

DOI: 10.1007/s10618-005-0014-6

Keywords

outlier detection; anomaly detection; distributed data mining; mining dynamic data; mixed-attribute data sets; data streams

Abstract

Efficiently detecting outliers or anomalies is an important problem in many areas of science, medicine and information technology. Applications range from data cleaning to clinical diagnosis, from detecting anomalous defects in materials to fraud and intrusion detection. Over the past decade, researchers in data mining and statistics have addressed the problem of outlier detection using both parametric and non-parametric approaches in a centralized setting. However, there are still several challenges that must be addressed. First, most approaches to date have focused on detecting outliers in a continuous attribute space. However, almost all real-world data sets contain a mixture of categorical and continuous attributes. Categorical attributes are typically ignored or incorrectly modeled by existing approaches, resulting in a significant loss of information. Second, there have not been any general-purpose distributed outlier detection algorithms. Most distributed detection algorithms are designed with a specific domain (e.g. sensor networks) in mind. Third, the data sets being analyzed may be streaming or otherwise dynamic in nature. Such data sets are prone to concept drift, and models of the data must be dynamic as well. To address these challenges, we present a tunable algorithm for distributed outlier detection in dynamic mixed-attribute data sets.
