4.7 Article

Expected similarity estimation for large-scale batch and streaming anomaly detection

Journal

Machine Learning
Volume 105, Issue 3, Pages 305-333

Publisher

Springer
DOI: 10.1007/s10994-016-5567-7

Keywords

Anomaly detection; Large-scale data; Kernel methods; Hilbert space embedding; Mean map


We present a novel algorithm for anomaly detection on very large datasets and data streams. The method, named EXPected Similarity Estimation (EXPoSE), is kernel-based and able to efficiently compute the similarity between new data points and the distribution of regular data. The estimator is formulated as an inner product with a reproducing kernel Hilbert space embedding and makes no assumption about the type or shape of the underlying data distribution. We show that offline (batch) learning with EXPoSE can be done in linear time and online (incremental) learning takes constant time per instance and model update. Furthermore, EXPoSE can make predictions in constant time, while it requires only constant memory. In addition, we propose different methodologies for concept drift adaptation on evolving data streams. On several real datasets we demonstrate that our approach can compete with state-of-the-art algorithms for anomaly detection while being an order of magnitude faster than most other approaches.
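
The idea of scoring a point by its inner product with a mean embedding of the regular data can be illustrated with a small sketch. The abstract does not say how the embedding is represented, so this example assumes a Gaussian kernel approximated with random Fourier features; the class name ExposeSketch, its parameters, and its defaults are hypothetical and are not taken from the paper. Under that assumption, each update and each score costs time proportional to the number of random features, independent of how many points have been seen.

```python
import math
import random


class ExposeSketch:
    """Illustrative EXPoSE-style scorer (hypothetical names and defaults).

    Assumption: a Gaussian kernel approximated by random Fourier features,
    so the mean embedding is a fixed-length vector.
    """

    def __init__(self, dim, n_features=256, bandwidth=1.0, seed=0):
        rng = random.Random(seed)
        # Random directions w ~ N(0, 1/bandwidth^2) and phases b ~ U[0, 2*pi).
        self.weights = [[rng.gauss(0.0, 1.0 / bandwidth) for _ in range(dim)]
                        for _ in range(n_features)]
        self.phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
        self.scale = math.sqrt(2.0 / n_features)
        self.mean_map = [0.0] * n_features  # running mean of feature vectors
        self.count = 0

    def features(self, x):
        # phi(x), where <phi(x), phi(y)> approximates
        # exp(-||x - y||^2 / (2 * bandwidth^2)).
        return [self.scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(self.weights, self.phases)]

    def update(self, x):
        # Incremental mean update; cost depends only on n_features.
        self.count += 1
        phi = self.features(x)
        self.mean_map = [m + (p - m) / self.count
                         for m, p in zip(self.mean_map, phi)]

    def score(self, x):
        # Expected similarity: inner product of phi(x) with the mean map.
        # Lower scores suggest more anomalous points.
        return sum(p * m for p, m in zip(self.features(x), self.mean_map))


if __name__ == "__main__":
    data_rng = random.Random(1)
    model = ExposeSketch(dim=2)
    for _ in range(500):
        model.update([data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)])
    print("near the data:", model.score([0.0, 0.0]))
    print("far from data:", model.score([8.0, 8.0]))
```

In this sketch the model state is a single vector of length n_features plus a counter, which is one way to obtain constant memory and constant-time updates and predictions. A point near the bulk of the regular data should receive a clearly higher score than a distant one.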

