Article

Multi-agent reinforcement learning for network routing in integrated access backhaul networks

Journal

AD HOC NETWORKS
Volume 153

Publisher

ELSEVIER
DOI: 10.1016/j.adhoc.2023.103347

Keywords

Multi-agent reinforcement learning; Integrated access backhaul; Network routing


This study examines the problem of downlink wireless routing in integrated access backhaul (IAB) networks and proposes a multi-agent reinforcement learning algorithm for joint routing optimization. Experimental results demonstrate the effectiveness of the algorithm in achieving near-centralized performance.
In this study, we examine the problem of downlink wireless routing in integrated access backhaul (IAB) networks involving fiber-connected base stations, wireless base stations, and multiple users. Physical constraints prevent the use of a central controller, leaving base stations with only limited access to real-time network conditions. These networks operate in a time-slotted regime, in which base stations monitor network conditions and forward packets accordingly. Our objective is to maximize the packet arrival ratio while simultaneously minimizing packet latency. To this end, we formulate the problem as a multi-agent partially observable Markov decision process (POMDP). We then develop an algorithm that combines Multi-Agent Reinforcement Learning (MARL) with Advantage Actor Critic (A2C) to derive a joint routing policy in a distributed manner. Because packet destinations are crucial to successful routing decisions, we use information about similar destinations as a basis for destination-specific routing decisions. To capture the similarity between destinations, we rely on their relational base-station associations, i.e., which base station each destination is currently connected to. Accordingly, the algorithm is referred to as Relational Advantage Actor Critic (Relational A2C). To the best of our knowledge, this is the first work to optimize routing strategy for IAB networks. We further present three training paradigms for this algorithm, providing flexibility in terms of performance and throughput. Through numerical experiments on different network scenarios, we demonstrate that Relational A2C achieves near-centralized performance even though it operates in a decentralized manner in the network of interest. Based on these experiments, we compare Relational A2C with other reinforcement learning algorithms, such as Q-Routing and Hybrid Routing. This comparison illustrates that solving the joint optimization problem increases network efficiency and reduces selfish agent behavior.
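The core idea in the abstract — each base station learns a next-hop policy conditioned on the destination's current base-station association, trained with an advantage actor-critic update — can be sketched in a minimal tabular form. Everything below (the number of hops and associations, learning rates, and the one-step delivery reward) is an illustrative assumption, not the paper's implementation:

```python
# Illustrative sketch (not the authors' code): a tabular advantage
# actor-critic update for one base station choosing a next hop. The
# state is the destination's base-station association -- the
# "relational" feature the abstract conditions routing decisions on.
# All sizes and learning rates are made-up toy values.
import numpy as np

rng = np.random.default_rng(0)

N_HOPS = 3    # candidate next hops at this base station (assumed)
N_ASSOC = 4   # possible base-station associations of a destination (assumed)

theta = np.zeros((N_ASSOC, N_HOPS))   # actor: per-association logits
values = np.zeros(N_ASSOC)            # critic: per-association state value
ALPHA_PI, ALPHA_V, GAMMA = 0.1, 0.1, 0.99

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def select_next_hop(assoc):
    """Sample a next hop from the softmax policy for this association."""
    return rng.choice(N_HOPS, p=softmax(theta[assoc]))

def a2c_update(assoc, action, reward, next_assoc, done):
    """One A2C step: TD critic update + advantage-weighted actor update."""
    target = reward + (0.0 if done else GAMMA * values[next_assoc])
    advantage = target - values[assoc]
    values[assoc] += ALPHA_V * advantage          # critic (TD) update
    grad = -softmax(theta[assoc])                 # d log pi / d logits
    grad[action] += 1.0
    theta[assoc] += ALPHA_PI * advantage * grad   # actor update

# Toy training loop: pretend hop 0 always delivers the packet (reward 1)
# when the destination is associated with base station 0.
for _ in range(2000):
    a = select_next_hop(0)
    a2c_update(0, a, reward=1.0 if a == 0 else 0.0, next_assoc=0, done=True)
```

After the toy loop, the policy for association 0 concentrates on hop 0. Keying both tables on the destination's association (rather than on the individual destination) is what makes decisions transfer across destinations that share a base station, which is the motivation for the relational feature.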

