Article

5G Multi-Slices Bi-Level Resource Allocation by Reinforcement Learning

Journal

MATHEMATICS
Volume 11, Issue 3

Publisher

MDPI
DOI: 10.3390/math11030760

Keywords

bi-level optimization; multi-slice; resource allocation; reinforcement learning

This paper proposes a bi-level resource allocation model for the vertical industry resource allocation problem in the 5G network. The model aims to optimize the profit of the 5G operator while allocating resources fairly among users. The multi-agent twin delayed deep deterministic policy gradient (MATD3) algorithm handles the upper-level slice resource allocation, and the discrete and continuous twin delayed deep deterministic policy gradient (DCTD3) algorithm handles the lower-level user resource allocation.
With the separation of the centralized unit (CU) and the distributed unit (DU) in the fifth-generation mobile network (5G), multi-slice and multi-scenario operation can be better applied in wireless communication. As 5G networks expand into vertical industries, their resource allocation takes on a clear hierarchical structure. In this paper, we propose a bi-level resource allocation model. The upper-level objective is the profit of the 5G operator, obtained when the base station allocates resources to slices. The lower-level objective is for each slice to allocate its resources fairly among its users. Resource allocation is a complex optimization problem with mixed discrete and continuous variables, so whether an algorithm can produce an allocation scheme quickly and accurately is the key to its practical application. Based on these characteristics, we select the multi-agent twin delayed deep deterministic policy gradient (MATD3) algorithm to solve the upper-level slice resource allocation and the discrete and continuous twin delayed deep deterministic policy gradient (DCTD3) algorithm to solve the lower-level user resource allocation. Solving practical problems with reinforcement learning depends on accurately characterizing the environment, state, action, and reward. We therefore provide effective definitions of the environment, state, action, and reward of MATD3 and DCTD3 for the bi-level resource allocation problem. We conduct simulation experiments and compare the proposed approach with the multi-agent deep deterministic policy gradient (MADDPG) algorithm and the nested bi-level evolutionary algorithm (NBLEA). The experimental results show that the proposed algorithm quickly provides a better resource allocation scheme than the compared algorithms.
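
For readers unfamiliar with the twin delayed deep deterministic policy gradient (TD3) family on which MATD3 and DCTD3 build, the sketch below illustrates two standard TD3 ingredients: the clipped double-Q target, which bootstraps from the smaller of two target-critic estimates, and target policy smoothing, which adds clipped Gaussian noise to the target action. It is a generic illustration, not the authors' implementation. The function names, the default values of gamma, sigma, and noise_clip, and the scalar action bounds are assumptions made for the example.

```python
import random


def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q target used by TD3-style critics.

    q1_next and q2_next are the two target critics' estimates for the
    next state-action pair; taking their minimum curbs overestimation.
    All argument names and the default gamma are illustrative assumptions.
    """
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * min(q1_next, q2_next)


def smoothed_target_action(action, low, high, sigma=0.2, noise_clip=0.5, rng=random):
    """Target policy smoothing: add clipped Gaussian noise to the target
    action, then clamp the result to the valid action range [low, high].
    """
    noise = rng.gauss(0.0, sigma)
    noise = max(-noise_clip, min(noise_clip, noise))
    return max(low, min(high, action + noise))


if __name__ == "__main__":
    # Hypothetical numbers: the action could stand for a normalized
    # share of resources granted to one slice.
    y = td3_target(reward=1.0, done=False, q1_next=2.0, q2_next=3.0, gamma=0.5)
    a = smoothed_target_action(0.9, low=0.0, high=1.0)
    print(y, a)
```

In a full TD3 agent both critics are regressed toward this shared target, and the actor is updated less frequently than the critics, which is the "delayed" part of the name. How the paper maps slices and users to agents, states, and rewards is defined in the article itself and is not reproduced here.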
