Article

Finding approximate solutions to combinatorial problems with very large data sets using BIRCH

COMPUTATIONAL STATISTICS & DATA ANALYSIS (2010)

Journal

COMPUTATIONAL STATISTICS & DATA ANALYSIS

Volume 54, Issue 3, Pages 655-667

Publisher

ELSEVIER SCIENCE BV

DOI: 10.1016/j.csda.2008.08.001

Categories

Computer Science, Interdisciplinary Applications; Statistics & Probability

Abstract

Computing estimators with good robustness properties generally requires solving highly complex optimization problems. The current state-of-the-art algorithms for finding approximate solutions to these problems need to access the data set a large number of times and become infeasible when the data do not fit in memory. In this paper the BIRCH algorithm is adapted to calculate approximate solutions to problems in this class. For data sets that fit in memory, this approach finds approximate Least Trimmed Squares (LTS) and Minimum Covariance Determinant (MCD) estimators that compare very well with those returned by the fast-LTS and fast-MCD algorithms, and in some cases it finds a better solution (in terms of the value of the objective function) than the fast algorithms do. This methodology can also be applied to the Linear Grouping Algorithm and its robust variant for very large data sets. Finally, results from a simulation study indicate that this algorithm performs comparably to fast-LTS in simple situations (large data sets with a small number of covariates and a small proportion of outliers) and much better than fast-LTS in more challenging situations, without requiring extra computational time. These findings seem to confirm that this approach provides the first computationally feasible and reliable approximating algorithm in the literature to compute the LTS and MCD estimators for data sets that do not fit in memory. (C) 2008 Elsevier B.V. All rights reserved.
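The abstract's central idea is that a single pass over the data can replace repeated data access. BIRCH does this by keeping compact, additive summaries of subclusters instead of the raw points. The sketch below illustrates why that helps for a regression-type objective. It is a minimal sketch under stated assumptions, not the authors' implementation. It assumes simple regression with one covariate, and it assumes each subcluster stores the sufficient statistics n, Σx, Σy, Σx², Σxy, Σy² (the class and function names are hypothetical). It also swaps in a brute-force search over unions of whole subclusters. The least-squares residual sum of squares (RSS) on a union with at least h points serves only as a coarse stand-in for the LTS criterion, which is the sum of the h smallest squared residuals.

```python
from dataclasses import dataclass
from itertools import combinations
import math


@dataclass
class RegressionCF:
    """Hypothetical BIRCH-style subcluster summary for simple regression.

    Only sufficient statistics are kept, so raw points can be discarded
    after a single pass over the data.
    """
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0
    syy: float = 0.0

    def add_point(self, x: float, y: float) -> None:
        # Update the running sums with one observation.
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y
        self.syy += y * y

    def merge(self, other: "RegressionCF") -> "RegressionCF":
        # Summaries are additive: the summary of a union is the field-wise sum.
        return RegressionCF(
            self.n + other.n, self.sx + other.sx, self.sy + other.sy,
            self.sxx + other.sxx, self.sxy + other.sxy, self.syy + other.syy,
        )


def ols_rss(cf: RegressionCF) -> float:
    """RSS of the OLS fit y = a + b*x, computed from the summary alone."""
    if cf.n < 2:
        return math.inf
    sxx_c = cf.sxx - cf.sx ** 2 / cf.n  # centred sum of squares of x
    sxy_c = cf.sxy - cf.sx * cf.sy / cf.n  # centred cross-product
    syy_c = cf.syy - cf.sy ** 2 / cf.n  # centred sum of squares of y
    if sxx_c <= 0:
        return math.inf  # slope not identifiable (all x equal)
    return syy_c - sxy_c ** 2 / sxx_c


def best_union(cfs: list, h: int):
    """Try every union of whole subclusters holding at least h points and
    return (rss, indices) for the union with the smallest OLS RSS.

    Exponential in len(cfs); only sensible for a handful of subclusters.
    """
    best = (math.inf, ())
    for k in range(1, len(cfs) + 1):
        for combo in combinations(range(len(cfs)), k):
            merged = RegressionCF()
            for i in combo:
                merged = merged.merge(cfs[i])
            if merged.n < h:
                continue  # union too small to be a candidate h-subset
            value = ols_rss(merged)
            if value < best[0]:
                best = (value, combo)
    return best


def summarize(points) -> RegressionCF:
    # Build one subcluster summary in a single pass over its points.
    cf = RegressionCF()
    for x, y in points:
        cf.add_point(x, y)
    return cf


# Two "clean" subclusters on y = 2x + 1 and one outlying subcluster.
cfs = [
    summarize([(0, 1), (1, 3), (2, 5)]),
    summarize([(3, 7), (4, 9)]),
    summarize([(1, 20), (2, 22)]),
]
print(best_union(cfs, h=5))  # prints (0.0, (0, 1))
```

The point of the design is that each candidate objective is evaluated from a handful of summed numbers, never from the raw data. This is what makes a memory-bounded search possible. A realistic search would use random starts and iterative improvement rather than enumerating every union.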
