Article; Proceedings Paper

Identifying the components

DATA MINING AND KNOWLEDGE DISCOVERY (2009)

Journal

DATA MINING AND KNOWLEDGE DISCOVERY

Volume 19, Issue 2, Pages 176-193

Publisher

SPRINGER

DOI: 10.1007/s10618-009-0137-2

Keywords

MDL; Database components; Clusters


Abstract

Most, if not all, databases are mixtures of samples from different distributions. Transactional data is no exception. For the prototypical example, supermarket basket analysis, one also expects a mixture of different buying patterns. Households of retired people buy different collections of items than households with young children. Models that take such underlying distributions into account are in general superior to those that do not. In this paper we introduce two MDL-based algorithms that follow orthogonal approaches to identify the components in a transaction database. The first follows a model-based approach, while the second is data-driven. Both are parameter-free: the number of components and the components themselves are chosen such that the combined complexity of data and models is minimised. Further, neither prior knowledge on the distributions nor a distance metric on the data is required. Experiments with both methods show that highly characteristic components are identified.
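The selection principle stated in the abstract, choosing the partition that minimises the combined complexity of data and models, can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not specify the encoding, so the sketch assumes a deliberately simple stand-in in which each component is modelled by independent item frequencies. The names `component_bits` and `total_bits`, the smoothing constants, and the toy baskets are all hypothetical.

```python
from math import log2


def component_bits(transactions, items):
    """Two-part code length, in bits, of one component (assumed encoding)."""
    n = len(transactions)
    data_bits = 0.0
    for item in items:
        count = sum(1 for t in transactions if item in t)
        p = (count + 0.5) / (n + 1.0)  # smoothed item frequency, never 0 or 1
        # Bits to encode presence or absence of this item in every transaction.
        data_bits -= count * log2(p) + (n - count) * log2(1.0 - p)
    # Assumed model cost: one frequency parameter per item.
    model_bits = 0.5 * len(items) * log2(n + 1)
    return model_bits + data_bits


def total_bits(partition, items):
    """Total length of a partition: assignment cost plus every component."""
    n = sum(len(c) for c in partition)
    k = len(partition)
    assign_bits = n * log2(k) if k > 1 else 0.0  # which component each row uses
    return assign_bits + sum(component_bits(c, items) for c in partition)


items = ["a", "b", "c", "d"]

# Toy mixture: two buying patterns with disjoint baskets.
retired = [{"a", "b"}] * 20
families = [{"c", "d"}] * 20
mixed_single = total_bits([retired + families], items)
mixed_split = total_bits([retired, families], items)

# Toy homogeneous database: a single buying pattern.
uniform = [{"a", "b"}] * 40
uniform_single = total_bits([uniform], items)
uniform_split = total_bits([uniform[:20], uniform[20:]], items)

print(mixed_split < mixed_single)      # splitting the mixture shortens the code
print(uniform_single < uniform_split)  # splitting uniform data lengthens it
```

Under these assumptions the number of components is not a parameter: a split is kept only when it lowers the total encoded size, so the mixed toy database favours two components while the homogeneous one favours a single component.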
