Article

Model-free perimeter metering control for two-region urban networks using deep reinforcement learning

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.trc.2020.102949

Keywords

Macroscopic Fundamental Diagram (MFD); Model-free deep reinforcement learning (Deep-RL); Perimeter control

Funding

  1. NSF [CMMI-1749200]
  2. Penn State Institute of CyberScience

Summary

This paper proposes a model-free deep reinforcement learning perimeter control scheme for two-region urban networks, in which agents select actions without requiring any information about the environment dynamics. Results from extensive numerical experiments show that the proposed scheme consistently learns perimeter control strategies under various environment configurations, performs comparably to state-of-the-art model predictive control (MPC), and transfers well to a wide range of traffic conditions and dynamics.

Abstract

Various perimeter metering control strategies have been proposed for urban traffic networks that rely on the existence of well-defined relationships between network productivity and accumulation, known more commonly as network Macroscopic Fundamental Diagrams (MFDs). Most existing perimeter metering control strategies require accurate modeling of traffic dynamics, with full knowledge of the network MFDs and of the dynamic equations that describe how vehicles move across regions of the network. However, such information is generally difficult to obtain and subject to error. Some model-free perimeter metering control schemes have recently been proposed in the literature; however, these existing approaches require estimates of network properties (e.g., the critical accumulation associated with maximum network productivity) in the controller designs. In this paper, a model-free deep reinforcement learning perimeter control (MFDRLPC) scheme is proposed for two-region urban networks that features agents with either continuous or discrete action spaces. The proposed agents learn to select control actions through a reinforcement learning process without assuming any information about environment dynamics. Results from extensive numerical experiments demonstrate that the proposed agents: (a) consistently learn perimeter control strategies under various environment configurations; (b) perform comparably to the state-of-the-art model predictive control (MPC); and (c) are highly transferable to a wide range of traffic conditions and dynamics in the environment.
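
To make the control setting concrete, the sketch below implements the standard two-region MFD accumulation dynamics that a perimeter metering agent would interact with. The cubic MFD coefficients, demand rates, time step, and trip-completion reward are illustrative assumptions drawn from the general two-region MFD literature, not the paper's exact experimental configuration; TwoRegionMFDEnv and all parameter names are hypothetical.

    import numpy as np

    class TwoRegionMFDEnv:
        """Minimal two-region MFD environment with perimeter metering actions.

        State: accumulations n[i, j] (vehicles in region i headed to region j).
        Action: u = (u_12, u_21), the fractions of cross-perimeter transfer
        flow allowed through, each clipped to [u_min, u_max].
        Reward: trips completed during the step.
        All parameter values below are illustrative assumptions.
        """

        def __init__(self, dt=60.0, u_min=0.1, u_max=0.9, n_jam=10000.0):
            self.dt, self.u_min, self.u_max, self.n_jam = dt, u_min, u_max, n_jam
            self.q = np.array([[1.5, 1.0], [1.0, 1.5]])  # demand q_ij (veh/s), assumed
            self.reset()

        def _mfd(self, n):
            # Cubic production MFD G(n) in veh/s; coefficients follow the
            # functional form common in the two-region MFD literature (assumed).
            n = np.clip(n, 0.0, self.n_jam)
            return max((1.4877e-7 * n**3 - 2.9815e-3 * n**2 + 15.0912 * n) / 3600.0, 0.0)

        def reset(self):
            # n[i, j]: vehicles currently in region i with destination region j.
            self.n = np.array([[2000.0, 1000.0], [1000.0, 2000.0]])
            return self.n.copy()

        def step(self, u):
            u = np.clip(np.asarray(u, dtype=float), self.u_min, self.u_max)
            n1, n2 = self.n[0].sum(), self.n[1].sum()
            g = np.array([self._mfd(n1), self._mfd(n2)])
            tot = np.array([max(n1, 1e-9), max(n2, 1e-9)])
            # m[i, j]: trip completion flow (i == j) or transfer flow (i != j).
            m = (self.n / tot[:, None]) * g[:, None]
            dn = np.empty_like(self.n)
            dn[0, 0] = self.q[0, 0] - m[0, 0] + u[1] * m[1, 0]  # u_21 admits 2->1 flow
            dn[0, 1] = self.q[0, 1] - u[0] * m[0, 1]            # u_12 meters 1->2 flow
            dn[1, 0] = self.q[1, 0] - u[1] * m[1, 0]
            dn[1, 1] = self.q[1, 1] - m[1, 1] + u[0] * m[0, 1]
            self.n = np.clip(self.n + self.dt * dn, 0.0, self.n_jam)
            reward = self.dt * (m[0, 0] + m[1, 1])  # trips completed this step
            return self.n.copy(), reward

The random-policy rollout below stands in for the learning loop. In the paper's setting, a Deep-RL agent with a continuous action space (e.g., an actor-critic method) or a discrete one (e.g., a DQN-style agent over quantized metering rates) would supply u, learning from observed state-action-reward transitions alone, with no access to the MFD functions or the dynamics inside step().

    env = TwoRegionMFDEnv()
    state = env.reset()
    for _ in range(60):                           # simulate one hour at dt = 60 s
        u = np.random.uniform(0.1, 0.9, size=2)   # placeholder for a learned policy
        state, reward = env.step(u)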
