Article

Overfitting-avoiding goal-guided exploration for hard-exploration multi-goal reinforcement learning

Journal

NEUROCOMPUTING
Volume 525, Pages 76-87

Publisher

ELSEVIER
DOI: 10.1016/j.neucom.2023.01.016

Keywords

Goal-guided exploration; Hard-exploration problem; Multi-goal learning; Overfitting avoidance; Reinforcement learning

In this paper, an overfitting-avoiding goal-guided exploration method (OGE) is proposed. It generates auxiliary goals following the Wasserstein-distance-based optimal transport geodesic and has a generation region that accounts for agent generalizability. Our method outperforms state-of-the-art methods in hard-exploration multi-goal robotic manipulation tasks, achieving high learning efficiency and successfully guiding the agent to achieve hard goals.
In hard-exploration multi-goal reinforcement learning tasks, the agent must achieve a series of distant goals with sparse rewards. Directly pursuing these hard goals can hardly succeed, because the agent cannot acquire learning signals applicable to them. To progressively enhance agent ability and promote exploration, goal-guided exploration methods generate easier auxiliary goals that gradually approach the original hard goals for the agent to pursue. However, because prior methods neglect the growth of agent generalizability, their goal generation region is limited, which causes overfitting and traps the exploration for further goals. In this paper, after modeling multi-goal RL as a distribution-matching process, we propose an overfitting-avoiding goal-guided exploration method (OGE), where the generation of auxiliary goals follows the Wasserstein-distance-based optimal transport geodesic, and the generation region lies within the Lipschitz-constant-delimited generalizability margin. Our OGE is compared with state-of-the-art methods in hard-exploration multi-goal robotic manipulation tasks. Apart from showing the highest learning efficiency, in those tasks where all the prior methods meet overfitting and fail, our method can still successfully guide the agent to achieve the hard goals. © 2023 Elsevier B.V. All rights reserved.
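As background for the Wasserstein-geodesic goal generation described in the abstract, the displacement-interpolation idea can be sketched as follows. This is a minimal illustrative example, not the paper's OGE implementation; the function name `geodesic_goals` and the toy goal sets are assumptions for demonstration only.

```python
import numpy as np
from itertools import permutations

def geodesic_goals(achieved, target, t):
    """Auxiliary goals a fraction t along the W2 geodesic between two
    small empirical goal sets (illustrative sketch, not the paper's OGE).
    Uses brute-force optimal matching, so only suitable for tiny sets."""
    n = len(achieved)
    # Discrete optimal transport: find the matching that minimizes the
    # total squared Euclidean cost (for larger sets one would use, e.g.,
    # scipy.optimize.linear_sum_assignment).
    best = min(
        permutations(range(n)),
        key=lambda p: sum(((achieved[i] - target[p[i]]) ** 2).sum() for i in range(n)),
    )
    # Displacement interpolation: slide each matched pair a fraction t
    # from its achieved goal toward its target goal.
    return np.array([(1 - t) * achieved[i] + t * target[best[i]] for i in range(n)])

# Toy example: easy goals clustered at the origin, hard goals far away.
achieved = np.zeros((4, 2))
target = np.array([[4.0, 0.0], [0.0, 4.0], [4.0, 4.0], [2.0, 2.0]])
aux = geodesic_goals(achieved, target, t=0.25)  # goals a quarter of the way out
```

Raising `t` from 0 toward 1 moves the auxiliary goals along the geodesic from the achievable distribution toward the hard target distribution, which is the curriculum mechanism the abstract describes.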

