Article

Overfitting-avoiding goal-guided exploration for hard-exploration multi-goal reinforcement learning

Journal

NEUROCOMPUTING
Volume 525, Pages 76-87

Publisher

ELSEVIER
DOI: 10.1016/j.neucom.2023.01.016

Keywords

Goal-guided exploration; Hard-exploration problem; Multi-goal learning; Overfitting avoidance; Reinforcement learning

In this paper, an overfitting-avoiding goal-guided exploration method (OGE) is proposed. It generates auxiliary goals following the Wasserstein-distance-based optimal transport geodesic and has a generation region that accounts for agent generalizability. Our method outperforms state-of-the-art methods in hard-exploration multi-goal robotic manipulation tasks, achieving high learning efficiency and successfully guiding the agent to achieve hard goals.
In hard-exploration multi-goal reinforcement learning tasks, the agent must achieve a series of distant goals with sparse rewards. Directly pursuing these hard goals can hardly succeed, because the agent cannot acquire learning signals applicable to them. To progressively enhance agent ability and promote exploration, goal-guided exploration methods generate easier auxiliary goals that gradually approach the original hard goals for the agent to pursue. However, because prior methods neglect the growth of agent generalizability, their goal generation region is limited, which causes overfitting and traps the exploration for further goals. In this paper, after modeling multi-goal RL as a distribution-matching process, we propose an overfitting-avoiding goal-guided exploration method (OGE), where the generation of auxiliary goals follows the Wasserstein-distance-based optimal transport geodesic, and the generation region lies within the Lipschitz-constant-delimited generalizability margin. Our OGE is compared with state-of-the-art methods in hard-exploration multi-goal robotic manipulation tasks. Apart from showing the highest learning efficiency, in those tasks where all the prior methods meet overfitting and fail, our method can still successfully guide the agent to achieve the hard goals. © 2023 Elsevier B.V. All rights reserved.
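As background for the Wasserstein-geodesic goal generation described in the abstract, the displacement-interpolation idea can be sketched as follows. This is a minimal illustrative example, not the paper's OGE implementation; the function name `geodesic_goals` and the toy goal sets are assumptions for demonstration only.

```python
import numpy as np
from itertools import permutations

def geodesic_goals(achieved, target, t):
    """Auxiliary goals a fraction t along the W2 geodesic between two
    small empirical goal sets (illustrative sketch, not the paper's OGE).
    Uses brute-force optimal matching, so only suitable for tiny sets."""
    n = len(achieved)
    # Discrete optimal transport: find the matching that minimizes the
    # total squared Euclidean cost (for larger sets one would use, e.g.,
    # scipy.optimize.linear_sum_assignment).
    best = min(
        permutations(range(n)),
        key=lambda p: sum(((achieved[i] - target[p[i]]) ** 2).sum() for i in range(n)),
    )
    # Displacement interpolation: slide each matched pair a fraction t
    # from its achieved goal toward its target goal.
    return np.array([(1 - t) * achieved[i] + t * target[best[i]] for i in range(n)])

# Toy example: easy goals clustered at the origin, hard goals far away.
achieved = np.zeros((4, 2))
target = np.array([[4.0, 0.0], [0.0, 4.0], [4.0, 4.0], [2.0, 2.0]])
aux = geodesic_goals(achieved, target, t=0.25)  # goals a quarter of the way out
```

Raising `t` from 0 toward 1 moves the auxiliary goals along the geodesic from the achievable distribution toward the hard target distribution, which is the curriculum mechanism the abstract describes.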

