Article

Hierarchical automatic curriculum learning: Converting a sparse reward navigation task into dense reward

Journal

NEUROCOMPUTING
Volume 360, Pages 265-278

Publisher

ELSEVIER
DOI: 10.1016/j.neucom.2019.06.024

Keywords

Hierarchical reinforcement learning; Automatic curriculum learning; Sparse reward reinforcement learning; Sample-efficient reinforcement learning

Funding

  1. NSFC [61876095, 61751308]
  2. Beijing Natural Science Foundation [L172037]


Mastering sparse-reward or long-horizon tasks is critical but challenging in reinforcement learning. To tackle this problem, we propose a hierarchical automatic curriculum learning framework (HACL), which intrinsically motivates the agent to explore environments hierarchically and progressively. The agent is equipped with a target area during training. As the target area progressively grows, the agent learns to explore from near to far, in a curriculum fashion. The pseudo target-achieving reward converts the sparse reward into a dense reward, thus alleviating the long-horizon difficulty. The whole system makes hierarchical decisions, in which a high-level conductor travels through different targets, and a low-level executor operates in the original action space to complete the instructions given by the high-level conductor. Unlike many existing works that manually set curriculum training phases, in HACL the entire curriculum training process is automated and adapts to the agent's current exploration capability. Extensive experiments on three sparse-reward tasks, a long-horizon stochastic chain, a grid maze, and the challenging Atari game Montezuma's Revenge, show that HACL achieves comparable or even better performance with significantly fewer training frames. (C) 2019 Elsevier B.V. All rights reserved.
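The abstract's core idea, a target area that grows as the agent succeeds, turning a sparse terminal reward into dense pseudo target-achieving rewards, can be illustrated with a minimal sketch. This is not the paper's implementation: the chain environment, the deterministic stand-in for the low-level executor, the reward magnitudes, and all function names are assumptions for illustration only.

```python
# Hedged sketch of a growing-target-area curriculum on a 1-D chain.
# The true task reward is sparse: +1 only at the final state. During
# training, the agent is instead rewarded for reaching the current
# pseudo target, which expands toward the goal as stages succeed.

def pseudo_reward(state, target, goal):
    """Dense pseudo target-achieving reward; the sparse task reward
    is only given at the true goal state."""
    if state == goal:
        return 1.0                              # original sparse reward
    return 0.1 if state == target else 0.0      # dense pseudo reward (assumed magnitude)

def run_curriculum(chain_len=10):
    """Grow the target from near the start toward the goal, one stage
    per successful target (an automatic curriculum in miniature)."""
    goal = chain_len - 1
    target = 1
    history = []
    while target <= goal:
        # Stand-in "executor": deterministically walks right, so every
        # stage succeeds. A real low-level policy would be learned.
        state, total = 0, 0.0
        while state < target:
            state += 1
            total += pseudo_reward(state, target, goal)
        history.append((target, round(total, 2)))
        target += 1   # stage succeeded -> expand the target area
    return history

print(run_curriculum(chain_len=5))
# Early stages earn only dense pseudo rewards; the final stage
# reaches the goal and collects the sparse task reward.
```

In this toy version the curriculum advances one state per stage; in HACL the expansion is driven automatically by the agent's current exploration capability rather than a fixed schedule.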

