Journal
2016 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN)
Volume -, Issue -, Pages 3723-3730
Publisher
IEEE
Keywords
HDP; Q-learning; Sarsa; Dyna; mobile robotics; maze navigation/path planning
This paper presents direct heuristic dynamic programming (HDP) based on Dyna planning (Dyna_HDP) for online model learning in a Markov decision process. The technique uses HDP policy learning to construct the Dyna agent, which shortens the learning time. We evaluate Dyna_HDP on a differential-drive wheeled mobile robot navigation problem in a 2D maze. Simulations compare Dyna_HDP with traditional reinforcement learning algorithms, namely one-step Q-learning, Sarsa(λ), and Dyna_Q, under the same benchmark conditions. We demonstrate that Dyna_HDP finds a near-optimal path faster than the other algorithms, with high stability. We also confirm that the Dyna_HDP method can be applied to a multi-robot path planning problem: a virtual common environment model is learned from the robots' shared experiences, which significantly reduces the learning time.
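The Dyna idea referred to in the abstract (learn from each real transition, record it in a model, then replay simulated transitions from that model) can be illustrated with a minimal tabular Dyna-Q sketch. This is the Dyna_Q baseline in its textbook form, not the paper's Dyna_HDP, which replaces the Q-table with HDP policy learning. The maze layout, reward scheme, hyperparameters, and all function names below are assumptions made for illustration only.

```python
import random

# Hypothetical 4x5 maze (not from the paper): S start, G goal, # wall.
MAZE = ["S..#.",
        ".#...",
        ".#.#.",
        "...#G"]
START, GOAL = (0, 0), (3, 4)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Deterministic move; hitting a wall or the border leaves the state unchanged."""
    r, c = state[0] + ACTIONS[action][0], state[1] + ACTIONS[action][1]
    if not (0 <= r < len(MAZE) and 0 <= c < len(MAZE[0])) or MAZE[r][c] == "#":
        r, c = state
    done = (r, c) == GOAL
    return (r, c), (1.0 if done else 0.0), done


def q_update(q, s, a, reward, s2, done, alpha, gamma):
    """One-step Q-learning backup; missing table entries count as 0."""
    best_next = 0.0 if done else max(q.get((s2, b), 0.0) for b in range(len(ACTIONS)))
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (reward + gamma * best_next - old)


def dyna_q(episodes=50, planning_steps=20, alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q with assumed hyperparameters; returns the learned Q-table."""
    rng = random.Random(seed)
    q, model = {}, {}
    for _ in range(episodes):
        state, done = START, False
        while not done:
            # Epsilon-greedy action selection with random tie-breaking.
            values = [q.get((state, a), 0.0) for a in range(len(ACTIONS))]
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))
            else:
                action = rng.choice([a for a, v in enumerate(values) if v == max(values)])
            nxt, reward, done = step(state, action)
            # Direct learning from the real transition.
            q_update(q, state, action, reward, nxt, done, alpha, gamma)
            # Model learning: remember what this state-action pair produced.
            model[(state, action)] = (reward, nxt, done)
            # Planning: replay randomly chosen remembered transitions.
            seen = list(model)
            for _ in range(planning_steps):
                s, a = rng.choice(seen)
                r, s2, d = model[(s, a)]
                q_update(q, s, a, r, s2, d, alpha, gamma)
            state = nxt
    return q


def greedy_path(q, max_steps=100):
    """Follow the greedy policy from START; stops at the goal or after max_steps."""
    path, state, done = [START], START, False
    while not done and len(path) <= max_steps:
        action = max(range(len(ACTIONS)), key=lambda a: q.get((state, a), 0.0))
        state, _, done = step(state, action)
        path.append(state)
    return path
```

Calling `greedy_path(dyna_q())` returns the sequence of cells the learned greedy policy visits. Setting `planning_steps=0` reduces the sketch to plain one-step Q-learning, which makes it easy to see how much the replayed model transitions contribute.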