Article

Resource Allocation and Trajectory Design in UAV-Aided Cellular Networks Based on Multiagent Reinforcement Learning

Journal

IEEE INTERNET OF THINGS JOURNAL
Volume 9, Issue 4, Pages 2933-2943

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/JIOT.2021.3094651

Keywords

Distributed reinforcement learning (RL); multiagent reinforcement learning; resource allocation; trajectory design; unmanned aerial vehicle (UAV)-aided wireless communications

Funding

  1. National Key Research and Development Program of China [2020YFB1807600]


This article focuses on a downlink cellular network where multiple unmanned aerial vehicles (UAVs) act as aerial base stations for ground users. The researchers propose a multiagent reinforcement learning approach to optimize resource allocation and trajectory design in a decentralized manner. Simulation results demonstrate the efficiency and effectiveness of the proposed methods in improving overall throughput and fairness.
In this article, we focus on a downlink cellular network in which multiple unmanned aerial vehicles (UAVs) serve as aerial base stations for ground users through frequency-division multiple access (FDMA). With user locations and channel parameters inaccessible, the UAVs coordinate to make decisions on resource allocation and trajectory design in a decentralized way. Aiming to optimize both overall and fairness throughput, we model resource allocation and trajectory design as a decentralized partially observable Markov decision process (Dec-POMDP) and propose multiagent reinforcement learning (RL) as a solution. Specifically, we use a parameterized deep Q-network (P-DQN) for the action space comprising both discrete and continuous actions, and the QMIX framework is leveraged to aggregate each UAV's local critic. For fairness throughput optimization, we introduce an entropy-like fairness indicator into the reward to make the total return decomposable. In addition, we propose a novel distributed learning framework for overall throughput optimization such that each UAV can contribute its local gradient and model training can be implemented in parallel without the need for observation-data sharing among the UAVs. Simulation results show that the proposed multiagent RL approach, as well as the distributed learning framework, is efficient in model training and achieves performance close to that of deterministic optimization, which relies on conventional optimization techniques with user locations and channel parameters explicitly known beforehand. For fairness throughput optimization, we also show that the ground users achieve individual throughputs close to one another, which verifies the effectiveness of the proposed fairness indicator as the reward definition in the RL framework.
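The abstract does not give the exact form of the entropy-like fairness indicator. As an illustrative sketch only (not the paper's actual formula), one common construction normalizes per-user throughputs into a distribution, takes its normalized entropy as a fairness factor in [0, 1], and scales the total throughput by it; the function name `fairness_reward` and the `eps` smoothing term are assumptions introduced here:

```python
import numpy as np

def fairness_reward(throughputs, eps=1e-12):
    """Sketch of an entropy-like fairness-weighted reward (assumed form).

    Per-user throughputs are normalized into a distribution p; the
    normalized entropy of p equals 1 when all users receive equal
    throughput and approaches 0 when one user dominates. Scaling the
    total throughput by this factor trades off sum rate against fairness.
    """
    r = np.asarray(throughputs, dtype=float)
    total = r.sum()
    p = r / (total + eps)
    # Normalized entropy in [0, 1]; eps avoids log(0) for idle users.
    entropy = -np.sum(p * np.log(p + eps)) / np.log(len(r))
    return entropy * total
```

Under this assumed form, an equal allocation yields the full total throughput as reward, while a maximally unequal one is heavily discounted, which is consistent with the abstract's goal of driving individual throughputs close to one another.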

