
Learning rate free reinforcement learning for real-time motion control using a value-gradient based policy

MECHATRONICS (2014)

Journal

MECHATRONICS

Volume 24, Issue 8, Pages 966-974

Publisher

PERGAMON-ELSEVIER SCIENCE LTD

DOI: 10.1016/j.mechatronics.2014.05.007

Keywords

Reinforcement learning; Process model; Robotics; Local linear regression; Least squares temporal difference


Abstract

Reinforcement learning (RL) is a framework that enables a controller to find an optimal control policy for a task in an unknown environment. Although RL has been successfully used to solve optimal control problems, learning is generally slow. The main causes are the inefficient use of the information collected during interaction with the system and the inability to use prior knowledge about the system or the control task. In addition, the learning speed depends heavily on the learning rate parameter, which is difficult to tune. In this paper, we present a sample-efficient, learning-rate-free version of the Value-Gradient Based Policy (VGBP) algorithm. The main difference between VGBP and other frequently used algorithms, such as Sarsa, is that in VGBP the learning agent has direct access to the reward function, rather than only to the immediate reward values. Furthermore, the agent learns a process model. Together, the known reward function and the learned model enable the algorithm to select control actions by optimizing over the right-hand side of the Bellman equation. We demonstrate fast learning convergence in simulations and experiments with the underactuated pendulum swing-up task. In addition, we present experimental results for a more complex 2-DOF robotic manipulator. © 2014 Elsevier Ltd. All rights reserved.
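
To illustrate what selecting an action by optimizing over the right-hand side of the Bellman equation can look like, here is a minimal Python sketch. It is not the VGBP algorithm from the paper: the linear `model`, the quadratic `reward` and `value` functions, the discount factor `GAMMA`, and the grid of candidate actions are all hypothetical stand-ins chosen for illustration. In the paper the process model and value function are learned from data.

```python
import numpy as np

GAMMA = 0.97  # discount factor (assumed value, not taken from the paper)

def reward(x, u):
    # Reward function known to the agent; a quadratic penalty is an illustrative choice.
    return -(x @ x) - 0.1 * u * u

def model(x, u):
    # Stand-in for a learned process model: a fixed linear map x' = A x + B u.
    A = np.array([[1.0, 0.1], [0.0, 1.0]])
    B = np.array([0.0, 0.1])
    return A @ x + B * u

def value(x):
    # Stand-in for a learned value function approximator.
    return -5.0 * (x @ x)

def bellman_rhs(x, u):
    # Right-hand side of the Bellman equation for one candidate action.
    return reward(x, u) + GAMMA * value(model(x, u))

def greedy_action(x, candidates):
    # Score every candidate action and return the one with the highest score.
    scores = [bellman_rhs(x, u) for u in candidates]
    return candidates[int(np.argmax(scores))]

candidates = np.linspace(-3.0, 3.0, 61)  # discretized, bounded action set (assumed)
x = np.array([0.5, 1.0])                 # example state: position and velocity
u = greedy_action(x, candidates)
```

Because the agent can evaluate both the reward function and the model for actions it has not yet tried, it can compare candidate actions without executing them on the system. This is the property the abstract contrasts with Sarsa, which only observes the immediate reward values.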
