Understanding episode mining techniques: Benchmarking on diverse, realistic, artificial data

INTELLIGENT DATA ANALYSIS (2014)

Journal

INTELLIGENT DATA ANALYSIS

Volume 18, Issue 5, Pages 761-791

Publisher

IOS PRESS

DOI: 10.3233/IDA-140668

Keywords

Pattern mining; episode mining; data generation; quality evaluation

Abstract

Frequent episode mining has been proposed as a data mining task for recovering sequential patterns from temporal data sequences, and several approaches have been introduced over the last fifteen years. These techniques have, however, never been compared against each other in a large-scale comparison, mainly because existing real-life data is prevented from entering the public domain by non-disclosure agreements. We perform such a comparison for the first time. To get around the problem of proprietary data, we employ a data generator that is based on a number of real-life observations and is capable of generating data that mimics the real-life data at our disposal. Artificial data offers the additional advantage that the underlying patterns are known, which is typically not the case for real-life data. Thus, we can evaluate for the first time the ability of mining approaches to recover patterns that are embedded in noise. Our experiments indicate that temporal constraints affect the effectiveness of episode mining more than occurrence semantics do. They also indicate that recovering underlying patterns when several phenomena are present at the same time is rather difficult, and that there is a need to develop better significance measures and techniques for dealing with sets of episodes.
