4.3 Article

Understanding episode mining techniques: Benchmarking on diverse, realistic, artificial data

Journal

INTELLIGENT DATA ANALYSIS
Volume 18, Issue 5, Pages 761-791

Publisher

IOS PRESS
DOI: 10.3233/IDA-140668

Keywords

Pattern mining; episode mining; data generation; quality evaluation

Ask authors/readers for more resources

Frequent episode mining has been proposed as a data mining task for recovering sequential patterns from temporal data sequences and several approaches have been introduced over the last fifteen years. These techniques have however never been compared against each other in a large scale comparison, mainly because the existing real life data is prevented from entering the public domain by non-disclosure agreements. We perform such a comparison for the first time. To get around the problem of proprietary data, we employ a data generator based on a number of real life observations and capable of generating data that mimics real life data at our disposal. Artificial data offers the additional advantage that the underlying patterns are known, which is typically not the case for real life data. Thus, we can evaluate for the first time the ability of mining approaches to recover patterns that are embedded in noise. Our experiments indicate that temporal constraints are more important in affecting the effectiveness of episode mining than occurrence semantics. They also indicate that recovering underlying patterns when several phenomena are present at the same time is rather difficult and that there is need to develop better significance measures and techniques for dealing with sets of episodes.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.3
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available