Article

A new sampling technique for association rule mining

Journal

JOURNAL OF INFORMATION SCIENCE
Volume 35, Issue 3, Pages 358-376

Publisher

SAGE PUBLICATIONS LTD
DOI: 10.1177/0165551508100382

Keywords

sampling; parameterized sampling; data reduction; data mining; association rule mining; information retrieval

Association Rule Mining (ARM) is a data mining technique for extracting hidden knowledge from datasets, knowledge that an organization's decision makers can use to improve overall profit. However, performing ARM requires repeated passes over the entire database, so for large databases the input/output overhead of scanning is very significant. A popular way to speed up ARM is to apply the mining algorithm to a sample instead of the entire database. In this paper, a parameterized sampling algorithm for ARM is presented. The algorithm extracts sample datasets based on three parameters: transaction frequency, transaction length, and transaction frequency-length. To evaluate its performance and accuracy, it is compared against a two-phase sampling-based algorithm using real and synthetic datasets. The experimental results show that the proposed sampling algorithm outperforms the two-phase sampling algorithm in some cases and achieves up to 98% accuracy.
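
The general idea of mining a sample rather than the full database can be illustrated with a minimal Python sketch. It is not the algorithm from the paper: the abstract does not define the three parameters, so the weighting functions below (`frequency`, `length`, `frequency_length`) are assumed interpretations, and the helper names and the Jaccard-style `sample_accuracy` measure are hypothetical choices made only for illustration.

```python
import random
from collections import Counter
from itertools import combinations


def itemset_support(transactions, size=2):
    """Fraction of transactions that contain each itemset of the given size."""
    n = len(transactions)
    if n == 0:
        return {}
    counts = Counter()
    for t in transactions:
        for itemset in combinations(sorted(set(t)), size):
            counts[itemset] += 1
    return {itemset: c / n for itemset, c in counts.items()}


def make_weights(transactions):
    """Assumed weighting schemes; the paper's actual definitions may differ."""
    freq = Counter(frozenset(t) for t in transactions)
    return {
        # how often an identical transaction occurs in the database
        "frequency": lambda t: freq[frozenset(t)],
        # number of distinct items in the transaction
        "length": lambda t: len(set(t)),
        # product of the two (an assumption, not taken from the paper)
        "frequency_length": lambda t: freq[frozenset(t)] * len(set(t)),
    }


def weighted_sample(transactions, k, weight, seed=0):
    """Draw k transactions with replacement, proportionally to weight(t)."""
    rng = random.Random(seed)
    weights = [weight(t) for t in transactions]
    return rng.choices(transactions, weights=weights, k=k)


def frequent(support, min_support):
    """Itemsets whose support reaches the threshold."""
    return {itemset for itemset, s in support.items() if s >= min_support}


def sample_accuracy(full, sample, min_support, size=2):
    """Hypothetical accuracy: Jaccard overlap of frequent itemsets."""
    truth = frequent(itemset_support(full, size), min_support)
    found = frequent(itemset_support(sample, size), min_support)
    if not truth and not found:
        return 1.0
    return len(truth & found) / len(truth | found)


if __name__ == "__main__":
    db = [["a", "b"], ["a", "b"], ["a", "b", "c"], ["c"]]
    weights = make_weights(db)
    for name, weight in weights.items():
        sample = weighted_sample(db, k=3, weight=weight, seed=1)
        print(name, sample_accuracy(db, sample, min_support=0.5))
```

Sampling is done with replacement here only to keep the sketch short. A real evaluation would also compare run time, as the paper does against the two-phase sampling algorithm.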
