The Control Method of Twin Delayed Deep Deterministic Policy Gradient with Rebirth Mechanism to Multi-DOF Manipulator

ELECTRONICS (2021)

Journal

ELECTRONICS

Volume 10, Issue 7

Publisher

MDPI

DOI: 10.3390/electronics10070870

Keywords

deep reinforcement learning; manipulator; reward function; rebirth mechanism

Automated Summary

This paper applies deep reinforcement learning to motion learning for manipulators and introduces a method for suppressing value overestimation bias, which improves the manipulator's learning ability. Optimizing the reward function and experience replay further improves learning efficiency.

Abstract

Deep reinforcement learning is a research hotspot in artificial intelligence, and applying it to manipulator motion learning allows a manipulator to acquire motion ability without a kinematic model. The Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm was proposed to suppress the value overestimation bias of Deep Deterministic Policy Gradient (DDPG) networks. This paper further suppresses that bias for multi-degree-of-freedom (multi-DOF) manipulator learning by proposing Twin Delayed Deep Deterministic Policy Gradient with Rebirth Mechanism (RTD3). Experimental results show that RTD3 is effective on multi-DOF manipulators, improving learning ability by 29.15% relative to TD3. The paper also proposes a step-by-step reward function designed specifically for learning the motion ability of a multi-DOF manipulator. Manipulator learning is guided by treating the task as a continuous decision process problem, and learning efficiency is improved by optimizing experience replay. To measure a manipulator's point-to-point positioning ability, the paper introduces a new evaluation index, energy efficiency distance, based on the characteristics of the continuous decision process problem; it is intended to give a more comprehensive and fair evaluation of how well the manipulator has learned its motion ability.
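For readers unfamiliar with how TD3 suppresses value overestimation, the following is a minimal sketch of two of its standard ingredients: the clipped double-Q target (take the smaller of two target-critic estimates) and target policy smoothing (add clipped noise to the target action). It illustrates baseline TD3 only. The abstract does not describe the rebirth mechanism, the step-by-step reward, or the energy efficiency distance in enough detail to sketch them, so they are not shown. The function names, default hyperparameters, and action bounds are assumptions chosen for illustration, and TD3's delayed actor update is omitted.

```python
import random


def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q target.

    Uses the smaller of the two target-critic estimates for the next
    state-action pair, which counteracts the upward bias a single
    critic tends to accumulate.
    """
    min_q = min(q1_next, q2_next)
    # No bootstrapping past a terminal transition (done == 1.0).
    return reward + gamma * (1.0 - done) * min_q


def smoothed_target_action(action, noise_std=0.2, noise_clip=0.5,
                           low=-1.0, high=1.0, rng=random):
    """Target policy smoothing.

    Adds clipped Gaussian noise to each component of the target action,
    then clips the result to the (assumed) action bounds [low, high].
    """
    smoothed = []
    for a in action:
        noise = rng.gauss(0.0, noise_std)
        noise = max(-noise_clip, min(noise_clip, noise))
        smoothed.append(max(low, min(high, a + noise)))
    return smoothed


if __name__ == "__main__":
    # Hypothetical values: the two target critics disagree, and the
    # target is built from the lower estimate (3.0), not the higher one.
    y = td3_target(reward=1.0, done=0.0, q1_next=5.0, q2_next=3.0, gamma=0.9)
    print("target value:", y)

    # Hypothetical 3-joint action in normalized units.
    print("smoothed action:", smoothed_target_action([0.95, -0.95, 0.0]))
```

Both critics are regressed toward the same target, so an overly optimistic estimate from either one is not propagated through bootstrapping.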
