4.6 Article

Overfitting-avoiding goal-guided exploration for hard-exploration multi-goal reinforcement learning

Journal

NEUROCOMPUTING
Volume 525, Pages 76-87

Publisher

ELSEVIER
DOI: 10.1016/j.neucom.2023.01.016

Keywords

Goal-guided exploration; Hard-exploration problem; Multi-goal learning; Overfitting avoidance; Reinforcement learning

In this paper, an overfitting-avoiding goal-guided exploration method (OGE) is proposed. It generates auxiliary goals along the Wasserstein-distance-based optimal transport geodesic, within a generation region that accounts for agent generalizability. OGE outperforms state-of-the-art methods on hard-exploration multi-goal robotic manipulation tasks, learning more efficiently and successfully guiding the agent to hard goals.
In hard-exploration multi-goal reinforcement learning tasks, the agent faces the challenge of achieving a series of distant goals with sparse rewards. Directly exploring to pursue these hard goals can hardly succeed, because the agent is unable to acquire learning signals applicable to them. To progressively enhance agent ability and promote exploration, goal-guided exploration methods generate easier auxiliary goals that gradually approach the original hard goals for the agent to pursue. However, because previous methods neglect the growth of agent generalizability, their goal generation region is limited, which causes overfitting and traps exploration short of further goals. In this paper, after modeling multi-goal RL as a distribution-matching process, we propose an overfitting-avoiding goal-guided exploration method (OGE), in which the generation of auxiliary goals follows the Wasserstein-distance-based optimal transport geodesic, and the generation region lies within the Lipschitz-constant-delimited generalizability margin. OGE is compared with state-of-the-art methods on hard-exploration multi-goal robotic manipulation tasks. Apart from showing the highest learning efficiency, in tasks where all prior methods overfit and fail, our method still successfully guides the agent to the hard goals. © 2023 Elsevier B.V. All rights reserved.
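
To make the abstract's mechanism concrete, below is a minimal sketch of auxiliary-goal generation by displacement interpolation along the Wasserstein-2 optimal-transport geodesic, with a distance cap standing in for the Lipschitz-constant-delimited generalizability margin. It assumes equal-size empirical sets of achieved and hard goals in a Euclidean goal space; the names auxiliary_goals, step, and margin are illustrative, not the paper's API, and this is a reconstruction of the general technique rather than the authors' implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def auxiliary_goals(achieved, hard, step=0.2, margin=0.5):
    """Generate auxiliary goals on the W2 optimal-transport geodesic
    between the empirical distribution of already-achieved goals and
    the distribution of hard target goals.

    achieved: (n, d) goals the agent can already reach
    hard:     (n, d) original hard-exploration goals
    step:     fraction to advance along the geodesic (hypothetical knob)
    margin:   radius around achieved goals the policy is assumed to
              generalize to (stand-in for the Lipschitz-based margin)
    """
    # Squared-Euclidean cost matrix between the two empirical measures.
    cost = ((achieved[:, None, :] - hard[None, :, :]) ** 2).sum(-1)
    # Optimal one-to-one transport plan for equal-size empirical measures.
    rows, cols = linear_sum_assignment(cost)
    src, dst = achieved[rows], hard[cols]
    # McCann displacement interpolation: points on the W2 geodesic.
    direction = dst - src
    dist = np.linalg.norm(direction, axis=1, keepdims=True)
    # Clip the step so each auxiliary goal stays inside the margin,
    # instead of overshooting the region the current policy supports.
    t = np.minimum(step, margin / np.maximum(dist, 1e-8))
    return src + t * direction

# Toy usage: move auxiliary goals from an easy cluster toward a hard one.
rng = np.random.default_rng(0)
achieved = rng.normal(0.0, 0.1, size=(8, 3))  # goals already reachable
hard = rng.normal(1.0, 0.1, size=(8, 3))      # distant hard goals
print(auxiliary_goals(achieved, hard))
```

The design point mirrored here is the one the abstract emphasizes: auxiliary goals advance along the geodesic toward the hard goals only as far as the generalizability margin permits, so the agent is never asked to pursue goals its current policy cannot transfer to.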
