Article

Navigation of Mobile Robots Based on Deep Reinforcement Learning: Reward Function Optimization and Knowledge Transfer

Publisher

INST CONTROL ROBOTICS & SYSTEMS, KOREAN INST ELECTRICAL ENGINEERS
DOI: 10.1007/s12555-021-0642-7

Keywords

Deep reinforcement learning (DRL); knowledge transfer; mobile robot; navigation; reward function

This paper presents an end-to-end online learning navigation method based on deep reinforcement learning (DRL) for mobile robots in unknown environments. The proposed prioritized experience replay-double dueling deep Q-networks (PER-D3QN) algorithm combines Double DQN, Dueling DQN, and prioritized experience replay to achieve efficient navigation. Introducing an artificial potential field into the reward function addresses the sparse reward problem and guides the robot toward completing the navigation task. A knowledge transfer training method is also proposed to accelerate training in complex environments. Validation in a three-dimensional simulator demonstrates the feasibility and efficiency of the proposed approaches.
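To make the reward idea concrete, the following is a minimal sketch of one way a potential field can densify a sparse navigation reward. It is not the paper's actual reward function: the quadratic attractive term, the inverse-distance repulsive term, the gains `k_att` and `k_rep`, the influence radius `d0`, the terminal rewards of ±10, and all function names are assumptions chosen for illustration. The per-step reward is taken as the decrease in total potential between two consecutive positions.

```python
import math


def attractive_potential(pos, goal, k_att=1.0):
    """Quadratic attractive potential (assumed form): smaller near the goal."""
    return 0.5 * k_att * math.dist(pos, goal) ** 2


def repulsive_potential(pos, obstacles, k_rep=1.0, d0=1.0):
    """Repulsive potential (assumed form) from obstacles closer than d0."""
    total = 0.0
    for obs in obstacles:
        d = max(math.dist(pos, obs), 1e-6)  # guard against division by zero
        if d < d0:
            total += 0.5 * k_rep * (1.0 / d - 1.0 / d0) ** 2
    return total


def shaped_reward(prev_pos, pos, goal, obstacles,
                  goal_radius=0.2, collision_radius=0.1):
    """Terminal rewards plus a dense term equal to the drop in potential."""
    if math.dist(pos, goal) <= goal_radius:
        return 10.0   # hypothetical success reward
    if any(math.dist(pos, o) <= collision_radius for o in obstacles):
        return -10.0  # hypothetical collision penalty
    u_prev = (attractive_potential(prev_pos, goal)
              + repulsive_potential(prev_pos, obstacles))
    u_now = (attractive_potential(pos, goal)
             + repulsive_potential(pos, obstacles))
    # Positive when the step lowers the potential (e.g. moves toward the goal).
    return u_prev - u_now


if __name__ == "__main__":
    print(shaped_reward((0.0, 0.0), (1.0, 0.0), (5.0, 0.0), []))
```

With such a term, every step yields feedback rather than only the final success or collision, which is the sense in which a potential field can ease the sparse reward problem.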
This paper presents an end-to-end online learning navigation method based on deep reinforcement learning (DRL) for mobile robots, with the objective that the robot avoids obstacles and reaches the target point in an unknown environment. Specifically, double deep Q-networks (Double DQN), dueling deep Q-networks (Dueling DQN), and prioritized experience replay (PER) are combined to form the prioritized experience replay-double dueling deep Q-networks (PER-D3QN) algorithm, which realizes high-efficiency navigation of mobile robots. Moreover, to address the sparse reward problem of the traditional reward function, an artificial potential field is introduced into the reward function to guide the robot toward completing the navigation task through the change in potential energy. Furthermore, to accelerate training in complex environments, a knowledge transfer training method is proposed, which migrates knowledge learned in a simple environment to a complex one so that the robot learns quickly on the basis of prior knowledge. Finally, the performance is validated in a three-dimensional simulator; the results show that the mobile robot obtains higher rewards, achieves higher success rates, and needs less time for navigation, indicating that the proposed approaches are feasible and efficient.
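The three ingredients named in the abstract can each be written in a few lines. The sketch below uses the standard textbook formulations of the dueling aggregation, the Double DQN target, and proportional prioritized sampling; it is not taken from the paper, and the function names, the exponent `alpha`, and the constant `eps` are illustrative assumptions. Plain Python lists stand in for network outputs.

```python
def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward, done, gamma, q_online_next, q_target_next):
    """Double DQN target: the online network selects the next action,
    the target network evaluates it."""
    if done:
        return reward
    best = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best]


def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritization: P(i) is proportional to (|delta_i| + eps) ** alpha."""
    priorities = [(abs(e) + eps) ** alpha for e in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


if __name__ == "__main__":
    print(dueling_q(1.0, [1.0, 2.0, 3.0]))
    print(double_dqn_target(1.0, False, 0.5, [0.1, 0.9], [4.0, 2.0]))
    print(per_probabilities([0.1, 1.0, 2.0]))
```

Decoupling action selection from evaluation is what reduces the overestimation bias of plain DQN, and sampling transitions in proportion to their temporal-difference error replays the most informative experiences more often. A full implementation would also apply importance-sampling weights to correct the bias that prioritized sampling introduces.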
