Article

A State-Compensated Deep Deterministic Policy Gradient Algorithm for UAV Trajectory Tracking

Journal

MACHINES
Volume 10, Issue 7, Pages -

Publisher

MDPI
DOI: 10.3390/machines10070496

Keywords

trajectory tracking; deep reinforcement learning; deep deterministic policy gradient algorithm; state compensation network

Funding

  1. Guizhou Provincial Science and Technology Projects [Guizhou-Sci-Co-Supp[2020]2Y044]


This paper proposes a control algorithm for UAV trajectory tracking that achieves efficient training and stable convergence in unknown environments by establishing an MDP model and introducing a compensation network. Simulation results show that the algorithm significantly improves training efficiency and accuracy, achieving lower tracking error in tracking experiments.
Unmanned aerial vehicle (UAV) trajectory tracking control algorithms based on deep reinforcement learning generally train inefficiently in unknown environments and converge unstably. To address this, a Markov decision process (MDP) model for UAV trajectory tracking is established, and a state-compensated deep deterministic policy gradient (CDDPG) algorithm is proposed. An additional neural network (C-Net), whose input is the compensation state and whose output is the compensation action, is added to the network model of the deep deterministic policy gradient (DDPG) algorithm to assist exploration during training. The action output of the DDPG network is combined with the compensation output of the C-Net to form the action that interacts with the environment, enabling the UAV to track dynamic targets rapidly, accurately, and smoothly. In addition, random noise is added to the generated action to allow a limited range of exploration and make the action-value estimation more accurate. The OpenAI Gym toolkit is used to verify the proposed method, and the simulation results show that: (1) adding the compensation network significantly improves training efficiency, accuracy, and convergence stability; (2) under the same computer configuration, the computational cost of the proposed algorithm is essentially the same as that of the QAC algorithm (an actor-critic algorithm based on the behavioral value Q) and the DDPG algorithm; (3) during training, at the same tracking accuracy, the learning efficiency is about 70% higher than that of QAC and DDPG; (4) in the simulated tracking experiments, with the same training time, the tracking error of the proposed method after stabilization is about 50% lower than that of QAC and DDPG.
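The action-composition step the abstract describes (DDPG actor output plus C-Net compensation output plus exploration noise) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the stand-in networks, weight matrices, noise scale, and the additive combination are assumptions, since the abstract does not specify the exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def ddpg_action(state, w):
    # Hypothetical stand-in for the trained DDPG actor network.
    return np.tanh(w @ state)

def cnet_action(comp_state, w_c):
    # Hypothetical stand-in for the compensation network (C-Net),
    # mapping a compensation state to a compensation action.
    return np.tanh(w_c @ comp_state)

def cddpg_action(state, comp_state, w, w_c, noise_scale=0.1):
    # Combined output action: DDPG action + C-Net compensation,
    # plus random noise for a limited range of exploration.
    a = ddpg_action(state, w) + cnet_action(comp_state, w_c)
    a += noise_scale * rng.standard_normal(a.shape)
    return np.clip(a, -1.0, 1.0)  # keep the action in a valid range

# Example: 3-dimensional state, 2-dimensional action.
state = np.array([0.5, -0.2, 0.1])
comp_state = np.array([0.05, 0.02, -0.01])
w = 0.3 * np.ones((2, 3))
w_c = 0.1 * np.ones((2, 3))
action = cddpg_action(state, comp_state, w, w_c)
```

In this sketch the compensation term is simply added to the actor's output before the noise, so during early training the C-Net can correct the poorly trained actor's actions, which is one plausible reading of how the compensation network "assists exploration training".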
