Article

Planning-Augmented Hierarchical Reinforcement Learning

Journal

IEEE ROBOTICS AND AUTOMATION LETTERS
Volume 6, Issue 3, Pages 5097-5104

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/LRA.2021.3071062

Keywords

Machine learning for robot control; Motion and path planning; Reinforcement learning

Funding

  1. Wallenberg AI, Autonomous Systems and Software Program (WASP) - Knut and Alice Wallenberg Foundation


This study introduces a novel algorithm called PAHRL, which combines planning algorithms and reinforcement learning to address problems with implicitly defined goals by dividing tasks into shorter MDPs. During testing, a planner determines useful subgoals on the state graph constructed at the bottom level, showcasing the effectiveness of this approach in solving long-horizon decision-making problems.
Planning algorithms are powerful at solving long-horizon decision-making problems but require that the environment dynamics are known. Model-free reinforcement learning has recently been merged with graph-based planning to increase the robustness of trained policies in state-space navigation problems. Recent work suggests using planning to provide intermediate waypoints that guide the policy in long-horizon tasks. Yet, it is not always practical to describe a problem in the setting of state-to-state navigation. Often, the goal is defined by one or multiple disjoint sets of valid states, or implicitly through an abstract task description. Building upon previous efforts, we introduce a novel algorithm called Planning-Augmented Hierarchical Reinforcement Learning (PAHRL), which translates the concept of hybrid planning/RL to such problems with implicitly defined goals. Using a hierarchical framework, we divide the original task, formulated as a Markov Decision Process (MDP), into a hierarchy of shorter-horizon MDPs. Actor-critic agents are trained in parallel for each level of the hierarchy. During testing, a planner then determines useful subgoals on a state graph constructed at the bottom level of the hierarchy. The effectiveness of our approach is demonstrated on a set of continuous control problems in simulation, including robot arm reaching tasks and the manipulation of a deformable object.
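The planning step described in the abstract, searching a state graph for subgoals that lead to any member of an implicitly defined goal set, can be illustrated with a minimal sketch. The graph, edge costs, and function name below are illustrative assumptions, not the authors' implementation; in PAHRL the graph and costs would come from states and value estimates collected at the bottom level of the hierarchy.

```python
import heapq

def plan_subgoals(edges, start, goal_states):
    """Dijkstra search over a state graph.

    Returns the waypoint sequence from `start` to the cheapest reachable
    state in `goal_states` (the goal is a *set* of valid states, matching
    the implicitly-defined-goal setting), or None if no goal is reachable.
    `edges` maps each state to a list of (neighbor, cost) pairs.
    """
    frontier = [(0.0, start, [start])]
    visited = set()
    while frontier:
        cost, state, path = heapq.heappop(frontier)
        if state in visited:
            continue
        visited.add(state)
        if state in goal_states:   # any valid goal state terminates the search
            return path[1:]        # subgoals exclude the start state
        for nxt, c in edges.get(state, []):
            if nxt not in visited:
                heapq.heappush(frontier, (cost + c, nxt, path + [nxt]))
    return None

# Toy state graph with two disjoint goal states {"g1", "g2"}
edges = {
    "s": [("a", 1.0), ("b", 4.0)],
    "a": [("g1", 2.0)],
    "b": [("g2", 1.0)],
}
print(plan_subgoals(edges, "s", {"g1", "g2"}))  # ['a', 'g1']
```

The returned waypoints would then be handed one at a time to the low-level policy as short-horizon subgoals.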

Authors

