☆ 3.8 Proceedings Paper

Proteus: agile ML elasticity through tiered reliability in dynamic resource markets

PROCEEDINGS OF THE TWELFTH EUROPEAN CONFERENCE ON COMPUTER SYSTEMS (EUROSYS 2017) (2017)

Journal

PROCEEDINGS OF THE TWELFTH EUROPEAN CONFERENCE ON COMPUTER SYSTEMS (EUROSYS 2017)

Volume -, Issue -, Pages 589-604

Publisher

ASSOC COMPUTING MACHINERY

DOI: 10.1145/3064176.3064182

Keywords

Funding

Intel as part of the Intel Science and Technology Center for Visual Cloud Systems (ISTC-VCS)
National Science Foundation [CNS-1042537, CCF-1533858, CNS-1042543]
DARPA Grant [FA87501220324]

Ask authors/readers for more resources

Protocol

Community support

Reagent

Community support

Abstract

Many shared computing clusters allow users to utilize excess idle resources at lower cost or priority, with the proviso that some or all may be taken away at any time. But, exploiting such dynamic resource availability and the often fluctuating markets for them requires agile elasticity and effective acquisition strategies. Proteus aggressively exploits such transient revocable resources to do machine learning (ML) cheaper and/or faster. Its parameter server framework, AgileML, efficiently adapts to bulk additions and revocations of transient machines, through a novel 3-stage active-backup approach, with minimal use of more costly non-transient resources. Its BidBrain component adaptively allocates resources from multiple EC2 spot markets to minimize average cost per work as transient resource availability and cost change over time. Our evaluations show that Proteus reduces cost by 85% relative to non-transient pricing, and by 43% relative to previous approaches, while simultaneously reducing runtimes by up to 37%.

Proteus: agile ML elasticity through tiered reliability in dynamic resource markets

Journal

PROCEEDINGS OF THE TWELFTH EUROPEAN CONFERENCE ON COMPUTER SYSTEMS (EUROSYS 2017)

Publisher

ASSOC COMPUTING MACHINERY

Keywords

Categories

Funding

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Proteus: agile ML elasticity through tiered reliability in dynamic resource markets

Journal

PROCEEDINGS OF THE TWELFTH EUROPEAN CONFERENCE ON COMPUTER SYSTEMS (EUROSYS 2017)

Publisher

ASSOC COMPUTING MACHINERY

Keywords

Categories

Funding

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Export Citation

Share Paper