Finding the most interesting correlations in a database: how hard can it be?

INFORMATION SYSTEMS (2005)

Journal

INFORMATION SYSTEMS

Volume 30, Issue 1, Pages 21-46

Publisher

PERGAMON-ELSEVIER SCIENCE LTD

DOI: 10.1016/j.is.2003.08.004

Abstract

This paper addresses some of the foundational issues associated with discovering the best few correlations from a database. Specifically, we consider the computational complexity of various definitions of the top-k correlation problem, where the goal is to discover the few sets of events whose co-occurrence exhibits the smallest degree of independence. Our results show that many rigorous definitions of correlation lead to intractable and strongly inapproximable problems. Proof of this inapproximability is significant, since similar problems studied by the computer science theory community have resisted such analysis. One goal of the paper (and of future research) is to develop alternative correlation metrics whose use will both allow efficient search and produce results that are satisfactory for users. © 2003 Elsevier Ltd. All rights reserved.
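
To make the problem statement concrete, the following is a minimal brute-force sketch of a top-k correlation search. It is not taken from the paper: the function name `top_k_correlated`, the `max_size` cap, and the scoring rule (absolute difference between the observed joint frequency and the product of the marginal frequencies) are all assumptions chosen for illustration, and the paper's own definitions of correlation may differ.

```python
from itertools import combinations
import heapq


def top_k_correlated(transactions, k, max_size=2):
    """Score every event set of size 2..max_size and return the k highest.

    The score is an illustrative stand-in for "degree of dependence":
    |P(all events co-occur) - product of individual P(event)|.
    """
    rows = [frozenset(t) for t in transactions]
    n = len(rows)
    events = sorted(set().union(*rows))

    # Marginal frequency of each individual event.
    marginal = {e: sum(e in r for r in rows) / n for e in events}

    scored = []
    for size in range(2, max_size + 1):
        # The number of candidate sets grows combinatorially with size.
        for combo in combinations(events, size):
            joint = sum(all(e in r for e in combo) for r in rows) / n
            expected = 1.0
            for e in combo:
                expected *= marginal[e]
            scored.append((abs(joint - expected), combo))

    # Keep only the k sets that deviate most from independence.
    return heapq.nlargest(k, scored)


if __name__ == "__main__":
    data = [{"a", "b"}, {"a", "b"}, {"a", "b"}, {"c"}, {"c", "d"}, {"d"}]
    for score, combo in top_k_correlated(data, k=2):
        print(combo, round(score, 3))
```

The sketch enumerates every candidate set, so its cost grows rapidly with the number of events and with `max_size`. That is only an illustration of why exhaustive search is unattractive; it does not reproduce the hardness results stated in the abstract.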
