Article

Building efficient fuzzy regression trees for large scale and high dimensional problems

JOURNAL OF BIG DATA (2018)

Journal

JOURNAL OF BIG DATA

Volume 5, Issue 1

Publisher

SPRINGERNATURE

DOI: 10.1186/s40537-018-0159-y

Keywords

Fuzzy regression trees; Big Data; Fuzzy discretizer; Apache Spark

Category

Computer Science, Theory & Methods

Funding

project PRA 2017 IoT e Big Data: metodologie e tecnologie per la raccolta e l'elaborazione di grosse moli di dati - University of Pisa
Spanish Research Agency (AEI/MINECO)
FEDER (UE) [TIN2013-46638-C3-3-P, TIN2016-77902-C3-1-P, SBPLY/17/180501/000493]
Junta de Comunidades de Castilla-La Mancha
MICINN [FPU12/05102]


Abstract

Regression trees (RTs) are simple but powerful models that have been widely used in recent decades across different domains. Fuzzy RTs (FRTs) add fuzziness to RTs in order to deal with uncertain environments. Most FRT learning approaches proposed in the literature aim to improve accuracy, measured in terms of mean squared error, and often neglect computation time and/or memory requirements. In today's application domains, which require the management of huge amounts of data, this neglect can strongly limit their use. In this paper, we propose a distributed FRT (DFRT) learning scheme, based on the MapReduce paradigm, for generating binary RTs from big datasets. We have designed and implemented the scheme on the Apache Spark framework. We used eight real-world and four synthetic datasets to evaluate its performance in terms of mean squared error, computation time and scalability. As a baseline, we compared the results with the distributed RT (DRT) and the Distributed Random Forest (DRF) available in the Spark MLlib library. Results show that our DFRT scales similarly to DRT and better than DRF. In terms of predictive performance, DFRT generalizes much better than DRT and similarly to DRF.
