Article

Real-Time Holding Control for Transfer Synchronization via Robust Multiagent Reinforcement Learning

Journal

IEEE Transactions on Intelligent Transportation Systems

Publisher

Institute of Electrical and Electronics Engineers (IEEE)
DOI: 10.1109/TITS.2022.3204805

Keywords

Transit holding control; network-wide transfer synchronization; robust multi-agent reinforcement learning; actor-critic learning architecture; deep deterministic policy gradient

Funding

  1. Natural Science Foundation of Jiangsu Province in China [BK20210250]
  2. National Natural Science Foundation of China [72201056, 72271117, 71901059, 52172316]
  3. National Science Foundation of the United States [CMMI-1637548, CMMI-1831140]
  4. Minnesota Department of Transportation [1003325 WO 111, 1003325 WO 44]

Abstract

This study presents a robust deep reinforcement learning approach for real-time, network-wide holding control and evaluates it in a simulator. The results show substantial reductions in online computation time and passenger waiting time, along with improved robustness.
This study presents a robust deep reinforcement learning (RL) approach for real-time, network-wide holding control with transfer synchronization, considering stochastic passenger demand and vehicle running times during daily operations. The problem is formulated within a multi-agent RL framework in which each active trip is treated as an agent that interacts both with the environment and with the other agents in the transit network. A dedicated learning procedure is developed to learn robust policies by introducing max-min optimization into the learning objective. The agents are trained with the deep deterministic policy gradient (DDPG) algorithm using an extended actor-critic framework with a joint action approximator. The effectiveness of the proposed approach is evaluated in a simulator calibrated with data collected from a transit network in the Twin Cities, Minnesota, USA. The learned policy is compared with no control, rule-based control, and rolling horizon optimization control (RHOC). Computational results suggest that the RL approach reduces online computation time by about 50% compared with RHOC. In terms of policy performance, under the deterministic scenario the average waiting time of the RL approach is only 1.3% above the theoretical lower bound; under stochastic scenarios the RL approach reduces average waiting time by as much as 18% relative to RHOC, and its advantage over RHOC grows as the level of system uncertainty increases. Evaluation under a disrupted environment also suggests that the proposed RL method is more robust against short-term uncertainties. The promising results in both online computational efficiency and solution quality suggest that the proposed RL method is a valid candidate for real-time transit control when the dynamics cannot be modeled perfectly under system uncertainty, as is the case for the network-wide transfer synchronization problem.
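The abstract describes trip-level agents trained with DDPG under a max-min robust objective. The paper's actual algorithm (an extended actor-critic with a joint action approximator) is not reproduced here; the toy sketch below only illustrates the max-min idea, i.e., choosing holding times that perform well under the worst of several demand scenarios. The cost function, state features, and the crude random policy search used in place of DDPG are all illustrative assumptions, not the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS = 3        # active trips acting as agents (hypothetical)
STATE_DIM = 4       # e.g. headway deviation, load, position, transfer slack
MAX_HOLD = 60.0     # maximum holding time in seconds (assumed)

# Linear deterministic actors: hold_i = clip(w_i . s_i, 0, MAX_HOLD)
actors = [rng.normal(scale=0.1, size=STATE_DIM) for _ in range(N_AGENTS)]

def act(states):
    """Joint action: each agent maps its local state to a holding time."""
    return np.array([np.clip(w @ s, 0.0, MAX_HOLD)
                     for w, s in zip(actors, states)])

def waiting_cost(joint_action, demand):
    """Toy surrogate for network waiting time (hypothetical form):
    holding reduces missed transfers but delays onboard passengers."""
    transfer_miss = np.maximum(0.0, 30.0 - joint_action).sum()
    onboard_delay = demand * joint_action.sum()
    return transfer_miss + onboard_delay

def robust_return(joint_action, demand_scenarios):
    """Max-min objective: score a joint action by its worst-case scenario."""
    return -max(waiting_cost(joint_action, d) for d in demand_scenarios)

# Crude policy search standing in for DDPG: perturb one actor at a time
# and keep only perturbations that improve the worst-case return.
states = [rng.normal(size=STATE_DIM) for _ in range(N_AGENTS)]
scenarios = [0.5, 1.0, 2.0]          # assumed demand multipliers
best = robust_return(act(states), scenarios)
initial_score = best                  # worst-case return of the untrained policy
for step in range(200):
    i = rng.integers(N_AGENTS)
    old = actors[i]
    actors[i] = old + rng.normal(scale=0.05, size=STATE_DIM)
    score = robust_return(act(states), scenarios)
    if score > best:
        best = score                  # keep the improving perturbation
    else:
        actors[i] = old               # revert
```

Because updates are accepted only when the worst-case return improves, the policy never trades robustness for average-case gains, which mirrors the role of the max-min term in the paper's learning objective.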
