Article

Real-Time Holding Control for Transfer Synchronization via Robust Multiagent Reinforcement Learning

Journal

IEEE Transactions on Intelligent Transportation Systems
Volume 23, Issue 12, Pages 23993-24007

Publisher

IEEE - Institute of Electrical and Electronics Engineers, Inc.
DOI: 10.1109/TITS.2022.3204805

Keywords

Transit holding control; network-wide transfer synchronization; robust multi-agent reinforcement learning; actor-critic learning architecture; deep deterministic policy gradient

Funding

  1. Natural Science Foundation of Jiangsu Province, China [BK20210250]
  2. National Natural Science Foundation of China [72201056, 72271117, 71901059, 52172316]
  3. National Science Foundation of the United States [CMMI-1637548, CMMI-1831140]
  4. Minnesota Department of Transportation [1003325 WO 111, 1003325 WO 44]

Abstract

This study presents a robust deep reinforcement learning (RL) approach for real-time, network-wide holding control with transfer synchronization, considering stochastic passenger demand and vehicle running times during daily operations. The problem is formulated within a multi-agent RL framework in which each active trip is treated as an agent that interacts not only with the environment but also with the other agents in the transit network. A dedicated learning procedure is developed to learn robust policies by introducing max-min optimization into the learning objective. The agents are trained with the deep deterministic policy gradient (DDPG) algorithm using an extended actor-critic framework with a joint-action approximator. The effectiveness of the proposed approach is evaluated in a simulator calibrated with data collected from a transit network in the Twin Cities, Minnesota, USA. The learned policy is compared with no control, rule-based control, and rolling-horizon optimization control (RHOC). Computational results suggest that the RL approach reduces online computation time by about 50% compared with RHOC. In terms of policy performance, under the deterministic scenario the average waiting time of the RL approach is only 1.3% higher than the theoretical lower bound; under stochastic scenarios the RL approach reduces average waiting time by as much as 18% relative to RHOC, and its advantage over RHOC grows as the level of system uncertainty increases. Evaluation under a disrupted environment further suggests that the proposed RL method is more robust against short-term uncertainties. The promising results in terms of both online computational efficiency and solution quality indicate that the proposed RL method is a valid candidate for real-time transit control when system dynamics cannot be modeled perfectly under uncertainty, as is the case for the network-wide transfer synchronization problem.
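
To make the learning architecture described above concrete, the sketch below illustrates a centralized-critic multi-agent DDPG update of the general kind the abstract outlines: one actor per active trip producing a bounded holding action, and a single critic that scores the joint observation and joint action (a joint-action approximator). This is a simplified, assumed illustration rather than the authors' implementation; the PyTorch framework, all dimensions and network sizes, the synthetic one-step batch, and the omission of replay buffers, target networks, and the robust max-min objective are simplifying assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed toy dimensions: 3 active trips (agents), an 8-dimensional local
# observation per trip, and a single holding action per trip.
N_AGENTS, OBS_DIM, ACT_DIM = 3, 8, 1

class Actor(nn.Module):
    # Maps one agent's local observation to a bounded action in [0, 1]
    # (e.g., a normalized holding time -- an assumption for illustration).
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, ACT_DIM), nn.Sigmoid())

    def forward(self, obs):
        return self.net(obs)

class CentralCritic(nn.Module):
    # Q(joint observation, joint action): the joint-action approximator.
    def __init__(self):
        super().__init__()
        in_dim = N_AGENTS * (OBS_DIM + ACT_DIM)
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 1))

    def forward(self, joint_obs, joint_act):
        return self.net(torch.cat([joint_obs, joint_act], dim=-1))

actors = [Actor() for _ in range(N_AGENTS)]
critic = CentralCritic()
actor_opts = [torch.optim.Adam(a.parameters(), lr=1e-3) for a in actors]
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

# Synthetic one-step batch standing in for experience sampled from a simulator.
B = 32
obs = torch.randn(B, N_AGENTS, OBS_DIM)     # local observations of all agents
acts = torch.rand(B, N_AGENTS, ACT_DIM)     # executed joint actions
rew = torch.randn(B, 1)                     # shared reward (e.g., negative waiting time)

# Critic update: regress the joint-action value toward the observed return
# (a one-step target; bootstrapped targets and target networks are omitted).
q = critic(obs.flatten(1), acts.flatten(1))
critic_loss = F.mse_loss(q, rew)
critic_opt.zero_grad()
critic_loss.backward()
critic_opt.step()

# Actor updates (deterministic policy gradient): each agent maximizes the
# critic's value of the joint action in which only its own component is
# replaced by its current policy output.
for i, (actor, opt) in enumerate(zip(actors, actor_opts)):
    joint = acts.clone()
    joint[:, i] = actor(obs[:, i])
    actor_loss = -critic(obs.flatten(1), joint.flatten(1)).mean()
    opt.zero_grad()
    actor_loss.backward()   # gradients also reach the critic, but it is not stepped again here
    opt.step()

The structural point the sketch carries is that each actor is improved through the centralized critic's evaluation of a joint action in which only that agent's component is regenerated by its policy, which is one way a joint-action critic can coordinate decentralized holding decisions during training.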

