4.7 Article

Three-Dimension Trajectory Design for Multi-UAV Wireless Network With Deep Reinforcement Learning

Journal

IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY
Volume 70, Issue 1, Pages 600-612

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TVT.2020.3047800

Keywords

Trajectory; Wireless communication; Three-dimensional displays; Propagation losses; Interference; Heuristic algorithms; Downlink; Capacity; constrained Markov decision process (CMDP); deep reinforcement learning (DRL); trajectory design; unmanned aerial vehicles (UAVs)

Funding

  1. Beijing Municipal Natural Science Foundation-Haidian [L182037]
  2. Beijing Municipal Science and Technology [Z181100003218015]
  3. Beijing Natural Science Foundation [L192032]

The study investigates trajectory design for multiple UAVs to increase communication system capacity, using a constrained Deep Q-Network algorithm to maximize real-time downlink capacity subject to a coverage constraint on the ground terminals.
The trajectory design of multiple unmanned aerial vehicles (UAVs) is investigated for improving the capacity of the communication system. The aim is to maximize real-time downlink capacity under the coverage constraint by exploiting the mobility of the UAVs. The three-dimensional (3D) dynamic movement of UAVs under the coverage constraint is formulated as a constrained Markov decision process (CMDP), and a constrained Deep Q-Network (cDQN) algorithm is proposed to solve it. In the proposed cDQN model, each UAV acts as an agent that explores and learns its 3D deployment policy. The model seeks the maximum capacity while attempting to guarantee that all ground terminals (GTs) are covered. To satisfy the coverage constraint, a primal-dual method is adopted that trains the primal variable and the dual variable (Lagrangian multiplier) in turn. Furthermore, to reduce the action space of the cDQN algorithm, an action filter uses prior information to eliminate invalid actions. Experimental results demonstrate that the cDQN algorithm converges after some training steps. Additionally, the UAVs are capable of adapting to the movement of GTs under the coverage constraint by following the 3D deployment policy derived from the proposed cDQN algorithm.
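
The two mechanisms named in the abstract, the primal-dual treatment of the coverage constraint and the action filter, can be illustrated with a minimal Python sketch. The abstract does not give the paper's reward definition, constraint function, step sizes, or filtering rules, so everything below is an assumption made for illustration: the function names, the `coverage_shortfall` signal (taken here as the uncovered fraction of GTs minus a tolerance), the step size, and the Boolean validity mask are all hypothetical.

```python
import numpy as np


def lagrangian_reward(capacity, coverage_shortfall, lam):
    """Scalarised reward for the primal step (assumed form).

    Capacity is rewarded and a positive coverage shortfall is penalised,
    weighted by the current multiplier lam.
    """
    return capacity - lam * coverage_shortfall


def dual_update(lam, avg_shortfall, step_size=0.05):
    """Dual step: projected gradient ascent on the Lagrangian multiplier.

    lam grows while the constraint is violated on average (avg_shortfall > 0),
    shrinks when there is slack, and is clipped so it never goes negative.
    """
    return max(0.0, lam + step_size * avg_shortfall)


def filtered_greedy_action(q_values, valid_mask):
    """Greedy action restricted to actions the filter marks as valid.

    Invalid actions get a Q-value of -inf so argmax cannot select them.
    If every action is masked, argmax falls back to index 0.
    """
    masked_q = np.where(valid_mask, q_values, -np.inf)
    return int(np.argmax(masked_q))


# Hypothetical usage: action 1 has the highest Q-value but is filtered out.
q = np.array([1.0, 3.0, 2.0])
mask = np.array([True, False, True])
chosen = filtered_greedy_action(q, mask)
```

In an alternating scheme of this kind, the Q-network would be trained for a while on `lagrangian_reward` with `lam` held fixed, after which `dual_update` adjusts `lam` from the measured constraint violation, and the two steps repeat.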
