Article

Double Broad Reinforcement Learning Based on Hindsight Experience Replay for Collision Avoidance of Unmanned Surface Vehicles

JOURNAL OF MARINE SCIENCE AND ENGINEERING (2022)

Journal

JOURNAL OF MARINE SCIENCE AND ENGINEERING

Volume 10, Issue 12, Pages -

Publisher

MDPI

DOI: 10.3390/jmse10122026

Keywords

collision avoidance; broad reinforcement learning; reinforcement learning

Categories

Engineering, Marine; Engineering, Ocean; Oceanography

Funding

Guangdong Basic and Applied Basic Research Foundation [2021A1515011999]
National Key Research and Development Program of China [2018YFC2002500]
Guangdong Provincial Key Laboratory of Traditional Chinese Medicine Informatization [2021B1212040007]

Summary

This article introduces a double broad reinforcement learning algorithm based on hindsight experience replay that improves the efficiency and accuracy of collision avoidance decision-making in unmanned surface vehicles. By decoupling target action selection from target Q value calculation and adopting hindsight experience replay, the algorithm achieved a collision avoidance success rate 31.9 percentage points higher than DQN and 24.4 percentage points higher than BRL in grid-environment training.

Abstract

Although broad reinforcement learning (BRL) provides a more intelligent autonomous decision-making method for the collision avoidance problem of unmanned surface vehicles (USVs), the algorithm still suffers from over-estimation and has difficulty converging quickly because rewards are sparse over a large sea area. To overcome these problems, we propose a double broad reinforcement learning method based on hindsight experience replay (DBRL-HER) for the collision avoidance system of USVs to improve the efficiency and accuracy of decision-making. The algorithm decouples the two steps of target action selection and target Q value calculation to form the double broad reinforcement learning method, and then adopts hindsight experience replay so that the agent can learn from failed experience, which greatly improves sample utilization efficiency. In training in a grid environment, the collision avoidance success rate of the proposed algorithm was 31.9 percentage points higher than that of the deep Q network (DQN) and 24.4 percentage points higher than that of BRL. A high-fidelity Unity 3D simulation platform was also designed to simulate the movement of USVs. An experiment on the platform fully verified the effectiveness of the proposed algorithm.
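The two ideas named in the abstract, decoupling target action selection from target Q value calculation and relabeling failed episodes with hindsight goals, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`double_q_target`, `single_q_target`, `her_relabel`), the plain NumPy arrays standing in for the broad-learning Q estimators, the "final state as goal" relabeling strategy, and the reward convention (-1 per step, 0 on reaching the goal) are all assumptions chosen for illustration.

```python
import numpy as np


def double_q_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double-Q style target (illustrative).

    The online estimate selects the next action; the separate target
    estimate supplies the value of that action.
    """
    best_action = int(np.argmax(q_online_next))      # action selection
    bootstrap = float(q_target_next[best_action])    # value calculation
    return reward + (0.0 if done else gamma * bootstrap)


def single_q_target(reward, done, q_target_next, gamma=0.99):
    """Standard target: one estimate both selects and evaluates (max)."""
    return reward + (0.0 if done else gamma * float(np.max(q_target_next)))


def her_relabel(episode):
    """Hindsight relabeling with the 'final' strategy (assumed here).

    episode: list of (state, action, next_state), grid cells as tuples.
    The cell actually reached at the end is treated as if it had been
    the goal, so a failed episode still yields a rewarded transition.
    """
    achieved_goal = episode[-1][2]
    relabeled = []
    for state, action, next_state in episode:
        reached = next_state == achieved_goal
        reward = 0.0 if reached else -1.0  # assumed sparse reward convention
        relabeled.append((state, action, reward, next_state, achieved_goal, reached))
    return relabeled


# Hypothetical Q estimates for the next state (three actions).
q_online_next = np.array([1.0, 2.0, 0.5])
q_target_next = np.array([1.5, 0.8, 3.0])

y_double = double_q_target(-1.0, False, q_online_next, q_target_next, gamma=0.9)
y_single = single_q_target(-1.0, False, q_target_next, gamma=0.9)

# A short episode on a grid that never reached its original goal.
episode = [((0, 0), "E", (0, 1)), ((0, 1), "E", (0, 2)), ((0, 2), "S", (1, 2))]
relabeled = her_relabel(episode)
```

With these made-up numbers the double-Q target is about -0.28 while the max-based target is about 1.7. Because the value taken at the selected action can never exceed the maximum of the same array, the decoupled target is never larger than the max-based one, which is the mechanism by which over-estimation is reduced. The relabeled episode ends with a zero-reward, goal-reached transition even though the original goal was missed.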
