Article

Fast and fully-automated histograms for large-scale data sets

Journal

Publisher

ELSEVIER
DOI: 10.1016/j.csda.2022.107668

Keywords

Density estimation; Histograms; Model selection; Minimum description length

G-Enum histograms are a fast, fully automated method for constructing irregular histograms. Using the Minimum Description Length (MDL) principle, the method derives two model selection criteria and runs in linearithmic time. Its effectiveness is demonstrated through comparisons with other automated methods on synthetic and real-world data sets.
G-Enum histograms are a new, fast and fully automated method for irregular histogram construction. By framing histogram construction as a density estimation problem and its automation as a model selection task, these histograms leverage the Minimum Description Length (MDL) principle to derive two different model selection criteria. Several proven theoretical results about these criteria give insight into their asymptotic behaviour and are used to speed up their optimisation. These insights, combined with a greedy search heuristic, are used to construct histograms in linearithmic time rather than the polynomial time incurred by previous works. The capabilities of the proposed MDL density estimation method are illustrated by comparison with other fully automated methods in the literature, both on synthetic and large real-world data sets. © 2022 Elsevier B.V. All rights reserved.
