Article

Double Deep Q-Network with Dynamic Bootstrapping for Real-Time Isolated Signal Control: A Traffic Engineering Perspective

Journal

APPLIED SCIENCES-BASEL
Volume 12, Issue 17

Publisher

MDPI
DOI: 10.3390/app12178641

Keywords

double deep Q-network; traffic signal control; traffic simulation; reinforcement learning

Funding

  1. National Natural Science Foundation of China [61374193]
  2. Humanities and Social Science Foundation of Ministry of Education of China [19YJCZH201]

This study focuses on the real-time isolated signal control (RISC) problem at a single intersection and improves a prevailing reinforcement learning (RL) method to solve it. Drawing on traffic engineering considerations and a qualitative applicability analysis, the researchers adopt double deep Q-network (DDQN) as the base method and extend it with a temporal-difference algorithm, TD(Dyn), that performs dynamic bootstrapping. Experimental results show that the proposed method, termed D3ynQN, reduces average vehicle delay significantly compared with a fully-actuated control technique.
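The idea of bootstrapping over a variable interval between decisions can be illustrated with a minimal Python sketch. It assumes, as a simplification not taken from the paper, that TD(Dyn) behaves like a semi-Markov target: the per-step rewards collected during the interval of τ steps between two action selections are discounted and summed, and the bootstrapped value is discounted by γ^τ. The double-DQN part follows the standard rule of selecting the next action with the online network and evaluating it with the target network. The function name and its arguments are hypothetical.

```python
import numpy as np

def ddqn_dynamic_target(rewards, q_online_next, q_target_next, gamma=0.99, done=False):
    """Hypothetical double-DQN target with a variable interval between decisions.

    rewards: per-step rewards observed between two consecutive action
        selections; tau = len(rewards) varies from decision to decision.
    q_online_next: online-network action values at the next decision state.
    q_target_next: target-network action values at the next decision state.
    """
    rewards = np.asarray(rewards, dtype=float)
    tau = len(rewards)
    # Discounted return accumulated over the (variable) interval of tau steps
    g = float(np.sum(gamma ** np.arange(tau) * rewards))
    if done:
        return g
    # Double DQN: the online net picks the action, the target net scores it
    a_star = int(np.argmax(q_online_next))
    # Bootstrap term is discounted by gamma**tau, i.e. by the actual interval length
    return g + gamma ** tau * float(q_target_next[a_star])
```

With a one-step interval (τ = 1) the expression reduces to the ordinary DDQN target r + γ·Q_target(s', argmax_a Q_online(s', a)); longer intervals, such as longer green extensions, shrink the weight on the bootstrapped estimate.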
Real-time isolated signal control (RISC) at an intersection is of interest in the field of traffic engineering. Enhancing RISC with reinforcement learning (RL) is both feasible and necessary. Previous studies have paid less attention to traffic engineering considerations and have under-utilized traffic expertise when constructing RL tasks. This study profiles the single-ring RISC problem from the perspective of traffic engineers and improves a prevailing RL method for solving it. Based on a qualitative applicability analysis, we choose double deep Q-network (DDQN) as the base method. A single agent is deployed at the intersection. The reward is defined in terms of vehicle departures so as to properly encourage and penalize the agent's behavior. The action is to determine the remaining green time for the current vehicle phase. The state is represented in a grid-based form. To update action values in time-varying environments, we present a temporal-difference algorithm, TD(Dyn), that performs dynamic bootstrapping according to the variable interval between selected actions. To accelerate training, we propose a data augmentation method based on intersection symmetry. Our improved DDQN, termed D3ynQN, operates subject to the signal timing constraints of engineering practice. Experiments at a close-to-reality intersection indicate that, by means of D3ynQN and a non-delay-based reward, the agent acquires useful knowledge and significantly outperforms a fully-actuated control technique in reducing average vehicle delay.
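Symmetry-based data augmentation can be sketched as follows. This is an illustrative assumption, not the authors' procedure: the grid state is taken to be an array whose first axis lists the four approaches in the order N, E, S, W, opposing approaches are assumed geometrically identical, and every phase is assumed to serve opposing approaches as a pair. Under those assumptions a 180-degree rotation swaps N with S and E with W while leaving the phase, the chosen green time (action) and the departure-based reward unchanged, so each stored transition yields a second valid one. The names `ROTATE_180`, `rotate_grid_180` and `augment_batch` are hypothetical.

```python
import numpy as np

# Hypothetical state layout: axis 0 lists approaches in the order N, E, S, W;
# the remaining axes are (lane, cell) occupancy grids of equal size per approach.
ROTATE_180 = [2, 3, 0, 1]  # new N <- old S, new E <- old W, new S <- old N, new W <- old E

def rotate_grid_180(grid):
    """Return the grid state seen after rotating the intersection by 180 degrees."""
    return np.asarray(grid)[ROTATE_180]  # integer-array indexing returns a copy

def augment_batch(transitions):
    """Double a batch of (state, action, reward, next_state) transitions.

    Assumes opposing approaches are identical and each phase serves them as a
    pair, so action and reward are invariant under the rotation.
    """
    augmented = list(transitions)
    for state, action, reward, next_state in transitions:
        augmented.append(
            (rotate_grid_180(state), action, reward, rotate_grid_180(next_state))
        )
    return augmented
```

A rotation is used rather than a mirror reflection because reflecting the intersection would turn left-turn lanes into right-turn lanes, which generally changes both the phase structure and the traffic dynamics.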
