Journal
JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION
Volume 84, Issue 10, Pages 2214-2232
Publisher
TAYLOR & FRANCIS LTD
DOI: 10.1080/00949655.2013.787534
Keywords
correlation; rank statistics; massive dataset; Kendall's tau; Spearman's rho; BIRCH
Funding
- Natural Sciences and Engineering Research Council of Canada
- Fonds de recherche du Quebec - Nature et technologies
Abstract
The balanced iterative reducing and clustering hierarchies (BIRCH) algorithm handles massive datasets by reading the data file only once, clustering the data as it is read, and retaining only a few clustering features to summarize the data read so far. Using BIRCH makes it possible to analyse datasets that are too large to fit in the computer's main memory. We propose estimates of Spearman's rho and Kendall's tau that are calculated from a BIRCH output and assess their performance through Monte Carlo studies. The numerical results show that the BIRCH-based estimates can achieve the same efficiency as the usual estimates of rho and tau while using only a fraction of the memory otherwise required.
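To make the idea concrete, the following is a minimal Python sketch of a single-pass summary followed by a rank-correlation estimate computed from that summary alone. It is an illustration under stated assumptions, not the estimators proposed in the paper, which the abstract does not spell out. The flat list of clustering features stands in for BIRCH's CF-tree, and the names `ClusteringFeature`, `summarize`, `weighted_kendall_tau` and the `threshold` parameter are hypothetical. The tau estimate here simply compares cluster centroids, weights each pair by the product of the cluster counts, and ignores pairs of points that fall in the same cluster.

```python
class ClusteringFeature:
    """BIRCH-style summary of a cluster of 2-D points:
    count (n), linear sum (ls) and sum of squares (ss)."""

    def __init__(self, x, y):
        self.n = 1
        self.ls = [x, y]
        self.ss = x * x + y * y  # kept for completeness; unused below

    def add(self, x, y):
        # Absorb one point by updating the three summary statistics.
        self.n += 1
        self.ls[0] += x
        self.ls[1] += y
        self.ss += x * x + y * y

    def centroid(self):
        return (self.ls[0] / self.n, self.ls[1] / self.n)


def summarize(points, threshold):
    """Read the points once. Each point joins the nearest cluster if it
    lies within `threshold` of that cluster's centroid; otherwise it
    starts a new cluster. (Assumption: a flat list replaces the CF-tree.)"""
    features = []
    for x, y in points:
        best, best_d2 = None, None
        for cf in features:
            cx, cy = cf.centroid()
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            if best_d2 is None or d2 < best_d2:
                best, best_d2 = cf, d2
        if best is not None and best_d2 <= threshold ** 2:
            best.add(x, y)
        else:
            features.append(ClusteringFeature(x, y))
    return features


def weighted_kendall_tau(features):
    """Naive tau estimate from centroids only (an assumption, not the
    paper's estimator): each pair of clusters contributes the sign of
    its concordance, weighted by n_i * n_j."""
    centroids = [cf.centroid() for cf in features]
    num = 0.0
    den = 0.0
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            w = features[i].n * features[j].n
            prod = ((centroids[i][0] - centroids[j][0])
                    * (centroids[i][1] - centroids[j][1]))
            num += w * ((prod > 0) - (prod < 0))
            den += w
    # With fewer than two clusters there is no pair to compare.
    return num / den if den else float("nan")
```

With `threshold` set to 0 every distinct point becomes its own cluster and the estimate reduces to the ordinary tau-a computed from the raw data. Larger thresholds keep fewer clusters in memory at the cost of discarding within-cluster ordering information.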