Article

A scalable bootstrap for massive data

Publisher

WILEY
DOI: 10.1111/rssb.12050

Keywords

Bootstrap; Computational efficiency; Estimator quality assessment; Massive data; Resampling

Funding

  1. US Army Research Laboratory
  2. US Army Research Office [W911NF-11-1-0391]
  3. National Science Foundation [1122732]
  4. Directorate for Computer and Information Science and Engineering
  5. Office of Advanced Cyberinfrastructure (OAC) [1122732] Funding Source: National Science Foundation

The bootstrap provides a simple and powerful means of assessing the quality of estimators. However, in settings involving large data sets, which are increasingly prevalent, the calculation of bootstrap-based quantities can be prohibitively demanding computationally. Although variants such as subsampling and the m out of n bootstrap can be used in principle to reduce the cost of bootstrap computations, these methods are generally not robust to specification of tuning parameters (such as the number of subsampled data points), and, in contrast with the bootstrap, they often require knowledge of the estimator's convergence rate. As an alternative, we introduce the 'bag of little bootstraps' (BLB), a new procedure that incorporates features of both the bootstrap and subsampling to yield a robust, computationally efficient means of assessing the quality of estimators. The BLB is well suited to modern parallel and distributed computing architectures and furthermore retains the generic applicability and statistical efficiency of the bootstrap. We demonstrate the BLB's favourable statistical performance via a theoretical analysis elucidating the procedure's properties, as well as a simulation study comparing the BLB with the bootstrap, the m out of n bootstrap and subsampling. In addition, we present results from a large-scale distributed implementation of the BLB demonstrating its computational superiority on massive data, a method for adaptively selecting the BLB's tuning parameters, an empirical study applying the BLB to several real data sets and an extension of the BLB to time series data.
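
The abstract does not spell out the steps of the BLB, so the following is only a minimal sketch of the procedure as it is commonly described: draw a few small subsets without replacement, resample each subset up to the full sample size n by means of multinomial counts, assess the estimator within each subset, and average the assessments. The function names (`weighted_mean`, `blb_standard_error`), the choice of the sample mean as the estimator, and the defaults `gamma=0.6`, `s=5`, `r=50` and `seed=0` are illustrative assumptions, not values taken from the paper.

```python
import math
import random
from collections import Counter


def weighted_mean(values, counts):
    """Mean of `values` where values[i] is repeated counts[i] times."""
    total = sum(counts)
    return sum(v * c for v, c in zip(values, counts)) / total


def blb_standard_error(data, gamma=0.6, s=5, r=50, seed=0):
    """Illustrative BLB estimate of the standard error of the sample mean.

    All parameter names and defaults are assumptions made for this sketch.
    """
    rng = random.Random(seed)
    n = len(data)
    b = max(2, int(n ** gamma))  # subset size b = n^gamma (assumed choice)
    per_subset = []
    for _ in range(s):
        subset = rng.sample(data, b)  # b points drawn without replacement
        estimates = []
        for _ in range(r):
            # A size-n resample of the subset, stored as multinomial counts
            # over its b points rather than as n explicit values.
            drawn = Counter(rng.choices(range(b), k=n))
            counts = [drawn.get(i, 0) for i in range(b)]
            estimates.append(weighted_mean(subset, counts))
        centre = sum(estimates) / r
        variance = sum((e - centre) ** 2 for e in estimates) / (r - 1)
        per_subset.append(math.sqrt(variance))
    return sum(per_subset) / s  # average the s per-subset assessments


if __name__ == "__main__":
    rng = random.Random(1)
    sample = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    # For unit-variance data the result should be close to 1 / sqrt(2000).
    print(blb_standard_error(sample))
```

Each resample here touches only b distinct points even though it has nominal size n, which is where the computational saving comes from in this sketch; the s subsets are independent of one another, so in principle they could be processed on separate workers.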
