Article

A Reinforcement Learning Framework for Vehicular Network Routing Under Peak and Average Constraints

Journal

IEEE Transactions on Vehicular Technology
Volume 72, Issue 5, Pages 6753-6764

Publisher

Institute of Electrical and Electronics Engineers (IEEE)
DOI: 10.1109/TVT.2023.3235946

Keywords

Routing; Clustering algorithms; Optimization; Vehicle-to-infrastructure; Vehicular ad hoc networks; Quality of service; Base stations; Constrained Markov decision process; peak and average latency constraints; vehicular network routing

Summary

This paper proposes a holistic framework for reinforcement learning-based vehicular network routing that satisfies both peak and average constraints. The routing problem is modeled as a Constrained Markov Decision Process (CMDP) and solved with an extended Q-learning algorithm based on Constraint Satisfaction Problems. The framework is further decentralized through a cluster-based learning structure. Simulation results show that the proposed algorithm significantly improves the average transmission rate while violating neither the peak nor the average constraints.
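As a rough sketch of the problem class (using generic notation, not necessarily the paper's), a CMDP combining both constraint types on a per-step cost can be written as:

```latex
% Generic CMDP with an average and a peak constraint on a cost c(s,a).
% Illustrative notation only; the paper's exact formulation may differ.
\begin{aligned}
\max_{\pi}\quad & \liminf_{T\to\infty}\frac{1}{T}\,
    \mathbb{E}_{\pi}\!\Bigl[\textstyle\sum_{t=1}^{T} r(s_t,a_t)\Bigr] \\
\text{s.t.}\quad & \limsup_{T\to\infty}\frac{1}{T}\,
    \mathbb{E}_{\pi}\!\Bigl[\textstyle\sum_{t=1}^{T} c(s_t,a_t)\Bigr] \le C_{\mathrm{avg}}
    && \text{(average constraint)} \\
 & c(s_t,a_t) \le C_{\mathrm{peak}} \;\; \forall t
    && \text{(peak constraint)}
\end{aligned}
```

Here $r$ is the per-step reward (e.g., transmission rate), $c$ the per-step cost (e.g., latency), and $C_{\mathrm{avg}}$, $C_{\mathrm{peak}}$ the two bounds. The peak constraint must hold at every single step, while the average constraint only binds the long-run mean, which is why the two require different enforcement mechanisms.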
Abstract

Providing provable performance guarantees in vehicular network routing is crucial to ensure safe and timely delivery of information in an environment characterized by high mobility, dynamic network conditions, and frequent topology changes. While Reinforcement Learning (RL) has shown great promise in network routing, existing RL-based solutions typically support decision-making under either peak constraints or average constraints, but not both. For network routing in intelligent transportation, such as advanced vehicle control and safety, both peak constraints (e.g., maximum latency or minimum bandwidth guarantees) and average constraints (e.g., average transmit power or data rate constraints) must be satisfied. In this paper, we propose a holistic framework for RL-based vehicular network routing, which optimizes routing decisions under both average and peak constraints. The routing problem is modeled as a Constrained Markov Decision Process (CMDP) and recast as an optimization based on Constraint Satisfaction Problems (CSPs). We prove that the optimal policy of a given CSP can be learned by an extended Q-learning algorithm while satisfying both peak and average latency constraints. To improve the scalability of our framework, we further turn it into a decentralized implementation through a cluster-based learning structure. Applying the proposed RL algorithm to vehicular network routing problems under both peak and average latency constraints, simulation results show that our algorithm achieves much higher rewards than heuristic baselines, with over 40% improvement in average transmission rate, while incurring zero violations of both peak and average constraints.
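To make the mechanics concrete, below is a minimal, hypothetical sketch of how a tabular Q-learning agent can respect both constraint types at once: the peak constraint is enforced hard, by masking out actions whose latency exceeds the bound, while the average constraint is handled softly, via a Lagrangian penalty updated by dual ascent. The penalty-based handling of the average constraint is a common generic technique, not the paper's CSP-based reformulation, and every quantity here (the toy MDP, PEAK_LIMIT, AVG_LIMIT, learning rates) is an illustrative assumption.

```python
# Illustrative sketch only: tabular Q-learning with (i) action masking for a
# per-step peak latency bound and (ii) a Lagrangian penalty driving the
# long-run average latency below a second bound. All tables and constants
# below are toy stand-ins, not the paper's algorithm or parameters.
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 5, 3
PEAK_LIMIT = 8.0          # per-step (peak) latency bound, hypothetical units
AVG_LIMIT = 4.0           # long-run average latency bound
ALPHA, GAMMA, LAM_LR = 0.1, 0.95, 0.01

# Toy per-(state, action) latency cost and reward tables (stand-ins for
# link latency and transmission rate in a real routing environment).
latency = rng.uniform(1.0, 10.0, size=(N_STATES, N_ACTIONS))
reward = rng.uniform(0.0, 1.0, size=(N_STATES, N_ACTIONS))

Q = np.zeros((N_STATES, N_ACTIONS))
lam = 0.0                 # Lagrange multiplier for the average constraint
avg_latency = 0.0
state = 0

for t in range(1, 50_001):
    # Peak constraint: only consider actions whose latency stays under the bound.
    feasible = np.flatnonzero(latency[state] <= PEAK_LIMIT)
    if feasible.size == 0:
        feasible = np.arange(N_ACTIONS)   # fallback; a real router would reroute
    if rng.random() < 0.1:                # epsilon-greedy exploration
        action = rng.choice(feasible)
    else:
        action = feasible[np.argmax(Q[state, feasible])]

    r, c = reward[state, action], latency[state, action]
    next_state = rng.integers(N_STATES)   # toy random transition

    # Average constraint: shape the reward with a Lagrangian penalty term.
    shaped = r - lam * c
    Q[state, action] += ALPHA * (shaped + GAMMA * Q[next_state].max()
                                 - Q[state, action])

    # Dual ascent on the multiplier pushes average latency toward AVG_LIMIT.
    avg_latency += (c - avg_latency) / t
    lam = max(0.0, lam + LAM_LR * (avg_latency - AVG_LIMIT))
    state = next_state

print(f"learned average latency ~ {avg_latency:.2f} (limit {AVG_LIMIT})")
```

The design point mirrored here is the asymmetry between the two constraint types: a peak bound must never be crossed, so it shapes the feasible action set directly, whereas an average bound can be traded off step by step and only needs to hold in the long run.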
