Using balanced iterative reducing and clustering hierarchies to compute approximate rank statistics on massive datasets

JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION (2014)

Journal

JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION

Volume 84, Issue 10, Pages 2214-2232

Publisher

TAYLOR & FRANCIS LTD

DOI: 10.1080/00949655.2013.787534

Keywords

correlation; rank statistics; massive dataset; Kendall's tau; Spearman's rho; BIRCH

Funding

Natural Sciences and Engineering Research Council of Canada
Fonds de recherche du Quebec - Nature et technologies

Abstract

The balanced iterative reducing and clustering hierarchies (BIRCH) algorithm handles massive datasets by reading the data file only once, clustering the data as it is read, and retaining only a few clustering features to summarize the data read so far. BIRCH thus makes it possible to analyse datasets that are too large to fit in the computer's main memory. We propose estimates of Spearman's rho and Kendall's tau that are calculated from a BIRCH output and assess their performance through Monte Carlo studies. The numerical results show that the BIRCH-based estimates can achieve the same efficiency as the usual estimates of rho and tau while using only a fraction of the memory otherwise required.
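
To make the single-pass idea in the abstract concrete, the following is a minimal sketch in Python. It is not the estimator proposed in the paper, which the abstract does not specify. It assumes bivariate data, keeps a flat list of clustering features (count, linear sum, sum of squares) instead of the CF-tree that BIRCH builds, and uses a simple centroid-distance threshold to decide whether a point is absorbed. The names `ClusteringFeature`, `summarize`, and `approx_kendall_tau` are hypothetical. The tau approximation treats each cluster as a tie group located at its centroid, which is one plausible way to compute a rank statistic from such a summary.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ClusteringFeature:
    """Summary of the points absorbed so far: count, linear sum, sum of squares."""
    n: int = 0
    ls: list = field(default_factory=lambda: [0.0, 0.0])  # per-coordinate sums
    ss: float = 0.0                                        # sum of squared norms

    def add(self, point):
        self.n += 1
        for i, v in enumerate(point):
            self.ls[i] += v
        self.ss += sum(v * v for v in point)

    def centroid(self):
        return [s / self.n for s in self.ls]


def summarize(stream, threshold):
    """Read the stream once; absorb each point into the nearest feature whose
    centroid lies within `threshold`, otherwise open a new feature.
    (Simplification: real BIRCH organizes the features in a CF-tree.)"""
    features = []
    for point in stream:
        best, best_d = None, threshold
        for cf in features:
            d = math.dist(cf.centroid(), point)
            if d <= best_d:
                best, best_d = cf, d
        if best is None:
            best = ClusteringFeature()
            features.append(best)
        best.add(point)
    return features


def approx_kendall_tau(features):
    """Illustrative tau approximation: compare cluster centroids, weight each
    pair of clusters by the product of their counts, and count pairs inside
    the same cluster as ties (contribution zero)."""
    total = sum(cf.n for cf in features)
    score = 0
    for i, a in enumerate(features):
        ax, ay = a.centroid()
        for b in features[i + 1:]:
            bx, by = b.centroid()
            prod = (ax - bx) * (ay - by)
            score += ((prod > 0) - (prod < 0)) * a.n * b.n
    return score / (total * (total - 1) / 2)


# Usage: a perfectly increasing toy sample, summarized with a coarse threshold.
points = [(i, 2 * i) for i in range(10)]
summary = summarize(points, threshold=3.0)
tau_hat = approx_kendall_tau(summary)
```

Only the features are held in memory, never the raw points. A smaller threshold yields more features and an approximation closer to the usual tau, at the cost of more memory.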
