Energy-based anomaly detection for mixed data

KNOWLEDGE AND INFORMATION SYSTEMS (2018)

Journal

KNOWLEDGE AND INFORMATION SYSTEMS

Volume 57, Issue 2, Pages 413-435

Publisher

SPRINGER LONDON LTD

DOI: 10.1007/s10115-018-1168-z

Keywords

Mixed data; Mixed-variate restricted Boltzmann machine; Deep belief net; Multilevel anomaly detection

Funding

Telstra-Deakin Centre of Excellence in Big Data and Machine Learning

Abstract

Anomalies are data points that deviate significantly from the norm. Thus, anomaly detection amounts to finding data points located far away from their neighbors, i.e., those lying in low-density regions. Classic anomaly detection methods are largely designed for a single data type, such as continuous or discrete. However, real-world data is increasingly heterogeneous, where a data point can have both discrete and continuous attributes. Mixed data poses multiple challenges, including (a) capturing the inter-type correlation structures and (b) measuring deviation from the norm under multiple types. These challenges are exacerbated under (c) high-dimensional regimes. In this paper, we propose a new scalable unsupervised anomaly detection method for mixed data based on the Mixed-variate Restricted Boltzmann Machine (Mv.RBM). The Mv.RBM is a principled probabilistic method that estimates the density of mixed data. We propose to use the free energy derived from the Mv.RBM as the anomaly score, as it is identical to the negative log-density of the data up to an additive constant. We then extend this method to detect anomalies across multiple levels of data abstraction, an effective approach to deal with high-dimensional settings. The extension is dubbed MIXMAD, which stands for MIXed data Multilevel Anomaly Detection. In MIXMAD, we sequentially construct an ensemble of mixed-data Deep Belief Nets (DBNs) with varying depths. Each DBN is an energy-based detector at a predefined abstraction level. Predictions across the ensemble are finally combined via a simple rank aggregation method. The proposed methods are evaluated on a comprehensive suite of synthetic and real high-dimensional datasets. The results demonstrate that for anomaly detection, (a) a proper handling of mixed types is necessary, (b) free energy is a powerful anomaly scoring method, (c) multilevel abstraction of data is important for high-dimensional data, and (d) empirically, Mv.RBM and MIXMAD are superior to popular unsupervised detection methods for both homogeneous and mixed data.
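To make the scoring idea in the abstract concrete, here is a minimal sketch of how free energy could serve as an anomaly score, and how scores from several detectors might be combined by rank. It is not the authors' implementation. It assumes a simplified mixed RBM with only binary and unit-variance Gaussian visible units, already-trained parameters, and plain rank averaging as the aggregation rule; the abstract does not specify any of these details. The function names `free_energy` and `average_rank` and all parameter names are hypothetical.

```python
import numpy as np


def free_energy(v_bin, v_cont, W_bin, W_cont, a_bin, a_cont, b):
    """Free energy of a simplified mixed RBM (assumed form, not the paper's code).

    Assumes binary visible units v_bin with shape (n, d_bin), unit-variance
    Gaussian visible units v_cont with shape (n, d_cont), and K binary hidden
    units. W_bin is (d_bin, K), W_cont is (d_cont, K), b is (K,).
    Higher free energy means lower model density, i.e. more anomalous.
    """
    # Total input to each hidden unit from both visible types.
    hidden_input = b + v_bin @ W_bin + v_cont @ W_cont
    # softplus(x) = log(1 + exp(x)), computed stably via logaddexp.
    softplus = np.logaddexp(0.0, hidden_input)
    # Type-specific visible terms: linear for binary, quadratic for Gaussian.
    visible_term = -(v_bin @ a_bin) + 0.5 * np.sum((v_cont - a_cont) ** 2, axis=1)
    return visible_term - softplus.sum(axis=1)


def average_rank(scores_per_detector):
    """Combine detectors by averaging per-detector ranks (assumed aggregation rule).

    scores_per_detector has shape (n_detectors, n_points); rank 0 is the least
    anomalous point under a given detector. Ties are broken arbitrarily.
    """
    scores = np.asarray(scores_per_detector, dtype=float)
    # argsort of argsort converts scores to 0-based ranks within each row.
    ranks = scores.argsort(axis=1).argsort(axis=1)
    return ranks.mean(axis=0)
```

In a multilevel setting such as the one the abstract describes, each row passed to `average_rank` would hold the free-energy scores produced by one detector of a given depth; ranking before averaging sidesteps the fact that free energies from different models are not on a common scale.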
