4.6 Article

The Control Method of Twin Delayed Deep Deterministic Policy Gradient with Rebirth Mechanism to Multi-DOF Manipulator

Journal

ELECTRONICS
Volume 10, Issue 7, Pages -

Publisher

MDPI
DOI: 10.3390/electronics10070870

Keywords

deep reinforcement learning; manipulator; reward function; rebirth mechanism

Ask authors/readers for more resources

This paper explores the application of deep reinforcement learning in the learning of motion ability of manipulators and introduces methods to suppress the overestimation bias of values, improving the learning ability of manipulators. By optimizing the reward function and experience replay, the learning efficiency of manipulators is enhanced.
As a research hotspot in the field of artificial intelligence, the application of deep reinforcement learning to the learning of the motion ability of a manipulator can help to improve the learning of the motion ability of a manipulator without a kinematic model. To suppress the overestimation bias of values in Deep Deterministic Policy Gradient (DDPG) networks, the Twin Delayed Deep Deterministic Policy Gradient (TD3) was proposed. This paper further suppresses the overestimation bias of values for multi-degree of freedom (DOF) manipulator learning based on deep reinforcement learning. Twin Delayed Deep Deterministic Policy Gradient with Rebirth Mechanism (RTD3) was proposed. The experimental results show that RTD3 applied to multi degree freedom manipulators is in place, with an improved learning ability by 29.15% on the basis of TD3. In this paper, a step-by-step reward function is proposed specifically for the learning and innovation of the multi degree of freedom manipulator's motion ability. The view of continuous decision-making and process problem is used to guide the learning of the manipulator, and the learning efficiency is improved by optimizing the playback of experience. In order to measure the point-to-point position motion ability of a manipulator, a new evaluation index based on the characteristics of the continuous decision process problem, energy efficiency distance, is presented in this paper, which can evaluate the learning quality of the manipulator motion ability by a more comprehensive and fair evaluation algorithm.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.6
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available