Article

A Reinforcement Learning Framework for Vehicular Network Routing Under Peak and Average Constraints

Journal

IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY
Volume 72, Issue 5, Pages 6753-6764

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TVT.2023.3235946

Keywords

Routing; Clustering algorithms; Optimization; Vehicle-to-infrastructure; Vehicular ad hoc networks; Quality of service; Base stations; Constrained Markov decision process; peak and average latency constraints; vehicular network routing


Summary

This paper proposes a holistic framework for reinforcement learning-based vehicular network routing that satisfies both peak and average constraints. The routing problem is modeled as a Constrained Markov Decision Process and solved by an extended Q-learning algorithm based on Constraint Satisfaction Problems. The framework is further decentralized through a cluster-based learning structure. Simulation results show that the proposed algorithm achieves a significant improvement in average transmission rate without violating any constraints.
Abstract

Providing provable performance guarantees in vehicular network routing problems is crucial to ensure the safe and timely delivery of information in an environment characterized by high mobility, dynamic network conditions, and frequent topology changes. While Reinforcement Learning (RL) has shown great promise in network routing, existing RL-based solutions typically support decision-making with either peak constraints or average constraints, but not both. For network routing in intelligent transportation, such as advanced vehicle control and safety, both peak constraints (e.g., maximum latency or minimum bandwidth guarantees) and average constraints (e.g., average transmit power or data rate constraints) must be satisfied. In this paper, we propose a holistic framework for RL-based vehicular network routing, which optimizes routing decisions under both average and peak constraints. The routing problem is modeled as a Constrained Markov Decision Process and recast into an optimization based on Constraint Satisfaction Problems (CSPs). We prove that the optimal policy of a given CSP can be learned by an extended Q-learning algorithm while satisfying both peak and average latency constraints. To improve the scalability of our framework, we further turn it into a decentralized implementation through a cluster-based learning structure. Applying the proposed RL algorithm to vehicular network routing problems under both peak and average latency constraints, simulation results show that our algorithm achieves much higher rewards than heuristic baselines with over 40% improvement in average transmission rate, while resulting in zero violations of both peak and average constraints.
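The paper's CSP-based extended Q-learning algorithm is not reproduced in this abstract, but the general idea of learning routing policies under both constraint types can be sketched with a common generic technique: enforce peak (per-hop) constraints by masking infeasible actions, and handle the average constraint with a Lagrangian penalty updated by dual ascent. The sketch below is an illustrative stand-in, not the authors' method; the toy graph, latencies, rates, thresholds, and hyperparameters are all invented for illustration.

```python
import random

# Illustrative sketch (NOT the paper's CSP-based algorithm): generic
# constrained Q-learning on a toy 4-node routing graph. Peak latency
# constraints are enforced by masking infeasible next hops; the average
# latency constraint is handled via a Lagrangian penalty with dual ascent.
# All numbers below are made up for illustration.

random.seed(0)

NEXT_HOPS = {0: [1, 2], 1: [3], 2: [3]}          # node 3 is the destination
LATENCY = {(0, 1): 2.0, (0, 2): 5.0, (1, 3): 2.0, (2, 3): 1.0}
RATE = {(0, 1): 1.0, (0, 2): 3.0, (1, 3): 1.0, (2, 3): 3.0}

PEAK_LATENCY = 5.5        # peak constraint: mask any hop slower than this
AVG_LATENCY_BUDGET = 3.0  # average per-hop latency budget
ALPHA, GAMMA = 0.1, 0.9   # learning rate, discount factor
ETA = 0.01                # dual-ascent step size for the multiplier

Q = {(s, a): 0.0 for s in NEXT_HOPS for a in NEXT_HOPS[s]}
lam = 0.0                 # Lagrange multiplier for the average constraint

def feasible_actions(s):
    """Peak constraint: keep only hops whose latency is under the cap."""
    return [a for a in NEXT_HOPS[s] if LATENCY[(s, a)] <= PEAK_LATENCY]

for episode in range(2000):
    s = 0
    while s != 3:
        acts = feasible_actions(s)
        if random.random() < 0.1:                # epsilon-greedy exploration
            a = random.choice(acts)
        else:
            a = max(acts, key=lambda x: Q[(s, x)])
        # Lagrangian reward: transmission rate minus penalized latency excess
        r = RATE[(s, a)] - lam * (LATENCY[(s, a)] - AVG_LATENCY_BUDGET)
        s2 = a
        target = r if s2 == 3 else r + GAMMA * max(
            Q[(s2, x)] for x in feasible_actions(s2))
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        # Dual ascent on the average constraint, projected onto lam >= 0
        lam = max(0.0, lam + ETA * (LATENCY[(s, a)] - AVG_LATENCY_BUDGET))
        s = s2

# Greedy route from node 0 under the learned Q values
s, route = 0, [0]
while s != 3:
    s = max(feasible_actions(s), key=lambda x: Q[(s, x)])
    route.append(s)
print("greedy route:", route)
```

In this toy setup the action mask guarantees the peak constraint can never be violated during learning, while the multiplier `lam` grows whenever a chosen hop exceeds the average budget, steering the policy back toward feasibility; the paper's contribution lies in doing this with provable guarantees via a CSP reformulation rather than a penalty heuristic.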
