Article

Fast and Accurate Least-Mean-Squares Solvers for High Dimensional Data

Publisher

IEEE Computer Society
DOI: 10.1109/TPAMI.2021.3139612

Keywords

Regression; Least Mean Squares Solvers; Coresets; Sketches; Caratheodory's Theorem; Big Data

Abstract

This article introduces a new algorithm that quickly computes a small weighted subset of the input whose weighted sum matches that of the full data, a primitive that underlies many machine learning problems and matrix factorizations. By fusing two data-summarization techniques, sketches and coresets, the algorithm yields a substantial performance boost for existing LMS solvers.
Least-mean-squares (LMS) solvers such as Linear / Ridge Regression and SVD not only solve fundamental machine learning problems, but are also the building blocks of a variety of other methods, such as matrix factorizations. We suggest an algorithm that gets a finite set of n d-dimensional real vectors and returns a subset of d + 1 vectors with positive weights whose weighted sum is exactly the same as that of the input. The constructive proof of Caratheodory's Theorem computes such a subset in O(n²d²) time and is thus not used in practice. Our algorithm computes this subset in O(nd + d⁴ log n) time, using O(log n) calls to Caratheodory's construction on small but smart subsets. This is based on a novel paradigm of fusion between two data-summarization techniques, known as sketches and coresets. For large values of d, we suggest a faster construction that takes O(nd) time and returns a weighted subset of O(d) sparsified input points, where a sparsified point is one in which some entries have been set to zero. As an application, we show how to boost the performance of existing LMS solvers, such as those in the scikit-learn library, by up to x100. Generalization to streaming and distributed data is trivial. Extensive experimental results and open source code are provided.
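
The abstract refers to the constructive proof of Caratheodory's Theorem. As a concrete illustration, here is a minimal NumPy sketch of that classical construction, i.e., the O(n²d²) baseline the paper improves upon, not the paper's accelerated O(nd + d⁴ log n) algorithm; the function name and tolerances below are our own.

```python
import numpy as np

def caratheodory(P, w):
    """Classical Caratheodory construction: reduce n weighted points in R^d
    to at most d + 1 points with positive weights and the same weighted sum.
    This is the O(n^2 d^2) baseline described in the abstract, not the
    paper's fast algorithm. Returns surviving indices and their weights."""
    P, w = np.asarray(P, float), np.asarray(w, float)
    idx = np.arange(len(P))
    while len(P) > P.shape[1] + 1:
        # With n - 1 > d, the differences P[i] - P[0] are linearly
        # dependent; a null-space vector of their matrix yields v with
        # sum(v) = 0 and sum(v_i * P_i) = 0.
        A = (P[1:] - P[0]).T                     # shape (d, n - 1)
        v = np.empty(len(P))
        v[1:] = np.linalg.svd(A)[2][-1]          # null-space direction of A
        v[0] = -v[1:].sum()
        # Shift weights along v until the first one hits zero; this keeps
        # both the total weight and the weighted sum unchanged.
        pos = np.where(v > 0)[0]                 # sum(v) = 0, so nonempty
        j = pos[np.argmin(w[pos] / v[pos])]
        w -= (w[j] / v[j]) * v
        w[j] = 0.0                               # remove the minimizer exactly
        keep = w > 0
        P, w, idx = P[keep], w[keep], idx[keep]
    return idx, w

# Sanity check: at most d + 1 = 6 points, same weighted sum (up to rounding).
P, w = np.random.randn(1000, 5), np.full(1000, 1e-3)
idx, u = caratheodory(P, w)
assert len(idx) <= 6 and np.allclose(u @ P[idx], w @ P, atol=1e-6)
```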
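
The claimed boost for LMS solvers follows because solvers like Ridge depend on the data only through the Gram matrix AᵀA of A = [X | y], which is a weighted sum of the rows' outer products. Below is a hedged sketch of that reduction, reusing the `caratheodory` function above; `boosted_ridge` is our own illustrative helper built on the slow classical baseline, not the paper's fused sketch-coreset construction.

```python
import numpy as np
from sklearn.linear_model import Ridge

def boosted_ridge(X, y, alpha=1.0):
    """Illustrative booster: Ridge depends on the data only through
    A^T A for A = [X | y], the sum of the rows' outer products, so
    Caratheodory on the flattened outer products yields at most
    (d + 1)^2 + 1 weighted rows that give the exact same solution."""
    A = np.hstack([X, y[:, None]])
    n, m = A.shape
    outers = np.einsum('ni,nj->nij', A, A).reshape(n, m * m)
    idx, u = caratheodory(outers, np.ones(n))
    s = np.sqrt(u)                            # scale rows so C^T C = A^T A
    model = Ridge(alpha=alpha, fit_intercept=False)
    return model.fit(s[:, None] * X[idx], s * y[idx])
```

Since the reduced problem has the same normal equations (X'ᵀX' = XᵀX and X'ᵀy' = Xᵀy), the coefficients match the full-data fit exactly when fit_intercept=False and n exceeds (d + 1)² + 1; the paper's contribution is computing such a subset fast enough, via the sketch-coreset fusion, for this reduction to pay off at scale.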
