PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce

PROCEEDINGS OF THE VLDB ENDOWMENT (2009)

Journal

PROCEEDINGS OF THE VLDB ENDOWMENT

Volume 2, Issue 2, Pages 1426-1437

Publisher

ASSOC COMPUTING MACHINERY

DOI: 10.14778/1687553.1687569

Abstract

Classification and regression tree learning on massive datasets is a common data mining task at Google, yet many state-of-the-art tree learning algorithms require training data to reside in memory on a single machine. While more scalable implementations of tree learning have been proposed, they typically require specialized parallel computing architectures. In contrast, the majority of Google's computing infrastructure is based on commodity hardware. In this paper, we describe PLANET: a scalable distributed framework for learning tree models over large datasets. PLANET defines tree learning as a series of distributed computations, and implements each one using the MapReduce model of distributed computation. We show how this framework supports scalable construction of classification and regression trees, as well as ensembles of such models. We discuss the benefits and challenges of using a MapReduce compute cluster for tree learning, and demonstrate the scalability of this approach by applying it to a real-world learning task from the domain of computational advertising.
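To make the idea of expressing tree learning as MapReduce computations concrete, the following is a minimal single-process sketch of one such step: choosing a split threshold for a regression tree node. It is an illustration under stated assumptions, not the PLANET implementation. The function names (`map_partition`, `reduce_stats`, `best_split`), the single numeric feature, the fixed list of candidate thresholds, and the sum-of-squared-errors criterion are all hypothetical choices made for this example. The point it shows is that each mapper only needs to emit small sufficient statistics per candidate split, so no machine has to hold the full training set in memory.

```python
from collections import defaultdict


def map_partition(rows, thresholds):
    """Mapper (hypothetical): scan one data partition of (x, y) rows and emit,
    for each candidate threshold and side, the statistics [count, sum_y, sum_y2]."""
    out = defaultdict(lambda: [0, 0.0, 0.0])
    for x, y in rows:
        for t in thresholds:
            side = "left" if x < t else "right"
            stats = out[(t, side)]
            stats[0] += 1
            stats[1] += y
            stats[2] += y * y
    return list(out.items())


def reduce_stats(emitted):
    """Reducer (hypothetical): sum the per-partition statistics for each key."""
    total = defaultdict(lambda: [0, 0.0, 0.0])
    for key, (n, s, ss) in emitted:
        acc = total[key]
        acc[0] += n
        acc[1] += s
        acc[2] += ss
    return total


def sse(n, s, ss):
    """Sum of squared errors around the mean, computed from the statistics alone."""
    return ss - s * s / n if n else 0.0


def best_split(partitions, thresholds):
    """Controller (hypothetical): run the map and reduce phases in-process and
    return the threshold with the lowest combined left + right squared error."""
    emitted = [kv for part in partitions for kv in map_partition(part, thresholds)]
    stats = reduce_stats(emitted)

    def cost(t):
        return sse(*stats[(t, "left")]) + sse(*stats[(t, "right")])

    return min(thresholds, key=cost)
```

In a real cluster the mapper calls would run on separate machines over separate shards, and only the small per-threshold statistics would travel to the reducers. Here the partitions are plain Python lists so the sketch can be run locally.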
