Article

Bandwidth Allocation and Trajectory Control in UAV-Assisted IoV Edge Computing Using Multiagent Reinforcement Learning

Journal

IEEE TRANSACTIONS ON RELIABILITY
Volume 72, Issue 2, Pages 599-608

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TR.2022.3192020

Keywords

Attention mechanism; bandwidth assignment; location deployment; multiagent deep reinforcement learning (DRL); value decomposition network (VDN)


This article investigates a scenario in which multiple UAVs serve as edge computing devices for the Internet of Vehicles (IoV). By jointly optimizing bandwidth allocation and trajectory control, the communication capacity of the system is maximized so that the UAV edge computing network can process more data. The proposed actor-critic mixing network (AC-Mix) and multi-attentive agent deep deterministic policy gradient (MA2DDPG) algorithms outperform the MADDPG benchmark.
The rapid development of unmanned aerial vehicles (UAVs) has brought new opportunities for wireless communication and edge computing. In this article, we investigate the scenario where multiple UAVs serve as edge computing devices for the Internet of Vehicles (IoV). Setting aside the allocation of computing resources, we focus on bandwidth allocation and trajectory control to maximize the communication capacity of the system so that the UAV edge computing network can process more data. To this end, a UAV-assisted IoV edge computing system is modeled as a nonconvex optimization problem that aims to maximize the achievable channel capacity of the network. To solve this problem, two quasi-distributed multiagent algorithms, i.e., the actor-critic mixing network (AC-Mix) and multi-attentive agent deep deterministic policy gradient (MA2DDPG), are proposed based on deep deterministic policy gradient (DDPG). Specifically, AC-Mix utilizes a mixing network to obtain a global Q-value for better evaluation of joint actions, while MA2DDPG employs a multihead attention mechanism to achieve multiagent collaboration. Using multiagent deep deterministic policy gradient (MADDPG) as the benchmark, several experiments are carried out to verify the performance of the proposed algorithms. Simulation results show that the convergence speed of AC-Mix and MA2DDPG improves by 30.0% and 63.3%, respectively, compared with MADDPG.
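The abstract names two architectural ideas: a mixing network that aggregates per-agent Q-values into a global Q-value (related to the value decomposition network listed in the keywords), and a multihead attention mechanism through which each agent's critic attends to the other agents. The paper's exact architectures are not reproduced here; the PyTorch sketch below only illustrates the general form of both components, and all class names, layer sizes, and parameters are hypothetical assumptions.

```python
import torch
import torch.nn as nn

class MixingNetwork(nn.Module):
    """Combines per-agent Q-values into one global Q-value for the joint
    action (the general mixing-network idea; the exact AC-Mix architecture
    is not specified in the abstract)."""
    def __init__(self, n_agents: int, hidden_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_agents, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, agent_qs: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents) outputs of the individual critics
        return self.net(agent_qs)  # (batch, 1) global Q-value

class AttentionCritic(nn.Module):
    """Critic that attends over the other agents' encoded observation-action
    pairs with multihead attention, sketching the collaboration mechanism
    the abstract attributes to MA2DDPG."""
    def __init__(self, obs_act_dim: int, embed_dim: int = 64, n_heads: int = 4):
        super().__init__()
        self.encode = nn.Linear(obs_act_dim, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)
        self.q_head = nn.Linear(2 * embed_dim, 1)

    def forward(self, own: torch.Tensor, others: torch.Tensor) -> torch.Tensor:
        # own:    (batch, obs_act_dim)           this agent's obs + action
        # others: (batch, n_others, obs_act_dim) remaining agents' obs + actions
        q = self.encode(own).unsqueeze(1)        # query: (batch, 1, embed)
        kv = self.encode(others)                 # keys/values from other agents
        ctx, _ = self.attn(q, kv, kv)            # attended context vector
        x = torch.cat([q.squeeze(1), ctx.squeeze(1)], dim=-1)
        return self.q_head(x)                    # per-agent Q estimate
```

In a centralized-training, decentralized-execution setup, a module like MixingNetwork would sit on top of the per-agent critics only during training, while each agent's actor acts on local observations at execution time, which is consistent with the "quasi-distributed" description in the abstract.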
