Article; Proceedings Paper

Policy gradient in Lipschitz Markov Decision Processes

MACHINE LEARNING (2015)

Journal

MACHINE LEARNING

Volume 100, Issue 2-3, Pages 255-283

Publisher

SPRINGER

DOI: 10.1007/s10994-015-5484-1

Keywords

Reinforcement learning; Markov Decision Process; Lipschitz continuity; Policy gradient algorithm

Abstract

This paper exploits Lipschitz continuity properties of Markov Decision Processes to safely speed up policy-gradient algorithms. Starting from assumptions about the Lipschitz continuity of the state-transition model, the reward function, and the policies considered in the learning process, we show that both the expected return of a policy and its gradient are Lipschitz continuous with respect to the policy parameters. Leveraging these properties, we define policy-parameter updates that guarantee a performance improvement at each iteration. The proposed methods are empirically evaluated and compared with related approaches on different configurations of three popular control scenarios: the linear quadratic regulator, the mass-spring-damper system, and ship-steering control.
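
To illustrate why a Lipschitz-continuous gradient permits updates with guaranteed improvement, the following is a minimal sketch and not the paper's algorithm or its bounds. It relies on a standard fact: if the gradient of J is L-Lipschitz, a gradient-ascent step of size 1/L increases J by at least ||grad J||^2 / (2L). The quadratic `expected_return`, its parameters `a` and `theta_star`, and the function `safe_gradient_ascent` are hypothetical stand-ins chosen for illustration; in the paper's setting the Lipschitz constant would instead be derived from the assumptions on the transition model, reward, and policy.

```python
# Toy stand-in for the expected return J(theta); NOT the paper's objective.
# J(theta) = -0.5 * a * (theta - theta_star)^2 has an a-Lipschitz gradient.

def expected_return(theta, a=4.0, theta_star=1.5):
    return -0.5 * a * (theta - theta_star) ** 2


def gradient(theta, a=4.0, theta_star=1.5):
    return -a * (theta - theta_star)


def safe_gradient_ascent(theta0, lipschitz_const, n_iters=20):
    """Gradient ascent with the fixed step 1/L.

    Returns the final parameter, the return at every iterate, and the
    guaranteed lower bound ||g||^2 / (2L) on each step's improvement.
    """
    theta = theta0
    step = 1.0 / lipschitz_const
    returns = [expected_return(theta)]
    guaranteed = []
    for _ in range(n_iters):
        g = gradient(theta)
        guaranteed.append(g * g / (2.0 * lipschitz_const))
        theta = theta + step * g
        returns.append(expected_return(theta))
    return theta, returns, guaranteed


# L = 8.0 is a deliberately loose upper bound on the true constant (4.0):
# any valid upper bound keeps the guarantee, at the cost of smaller steps.
theta_final, returns, guaranteed = safe_gradient_ascent(
    theta0=-3.0, lipschitz_const=8.0
)
```

In this sketch the return never decreases from one iterate to the next, and each observed improvement is at least the guaranteed amount. Overestimating the Lipschitz constant preserves the guarantee but slows progress, which is the trade-off between safety and speed that the abstract refers to.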
