Article; Proceedings Paper

Mining data to find subsets of high activity

JOURNAL OF STATISTICAL PLANNING AND INFERENCE (2004)

Journal

JOURNAL OF STATISTICAL PLANNING AND INFERENCE

Volume 122, Issue 1-2, Pages 23-41

Publisher

ELSEVIER SCIENCE BV

DOI: 10.1016/j.jspi.2003.06.014

Keywords

ARF; data mining; recursive partitioning; classification tree


Abstract

Many data mining problems in biometrics research are concerned with trying to identify the characteristics of a subset of cases that responds substantially differently from the rest of the cases. For example, when studying the relationship between a response variable Y and a set of predictor variables, it is often of interest to determine what ranges of values of the predictor variables are associated with a high likelihood of Y = 1 (if Y is a Bernoulli variable) or with high values of Y (if Y is a continuous variable). We describe a criterion (H) and a recursive partitioning method (ARF) that directly address this question. A computational algorithm that makes ARF feasible for use even with very large datasets is presented. The basic version of ARF can be generalized to the case of multiple response variables, Y_1, ..., Y_t, and to other settings. We illustrate the effectiveness of ARF by mining a structure activity database, a hospital database, and some other real and simulated datasets. We conclude by proposing a basic paradigm for data mining. © 2003 Published by Elsevier B.V.
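To make the question in the abstract concrete, here is a minimal Python sketch of a single search step: for one predictor, find the one-sided range with the highest mean response. It is not ARF and does not use the criterion H, neither of which is defined on this page. The function name `best_high_activity_split`, the `min_size` guard, and the midpoint thresholds are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def best_high_activity_split(
    x: List[float], y: List[float], min_size: int = 2
) -> Optional[Tuple[float, str, float, int]]:
    """Find the one-sided range of x whose cases have the highest mean y.

    Returns (threshold, side, mean_y, n_cases), where side is "<=" or ">",
    or None if no candidate range contains at least min_size cases.
    Illustrative only: a plain mean-maximizing search, not the paper's H.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    total = sum(v for _, v in pairs)
    best = None
    left_sum = 0.0
    for i in range(1, n):
        left_sum += pairs[i - 1][1]
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot place a threshold between tied x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        candidates = (("<=", left_sum, i), (">", total - left_sum, n - i))
        for side, subset_sum, size in candidates:
            if size < min_size:
                continue  # skip ranges too small to be meaningful
            mean = subset_sum / size
            if best is None or mean > best[2]:
                best = (threshold, side, mean, size)
    return best


# Toy Bernoulli response: Y = 1 only for the larger x values.
result = best_high_activity_split([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
print(result)  # (3.5, '>', 1.0, 3)
```

Applying such a step recursively to the selected subset, across all predictors, is the general idea behind recursive partitioning. The `min_size` guard is there because an unconstrained mean-maximizing search tends to pick very small subsets.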
