Article

Comparison of Deep Reinforcement Learning and Model Predictive Control for Adaptive Cruise Control

Journal

IEEE Transactions on Intelligent Vehicles
Volume 6, Issue 2, Pages 221-231

Publisher

IEEE - Institute of Electrical and Electronics Engineers Inc.
DOI: 10.1109/TIV.2020.3012947

Keywords

Learning (artificial intelligence); Mathematical model; Cost function; Testing; Optimal control; Delays; Deep reinforcement learning; Model Predictive Control (MPC); Adaptive Cruise Control (ACC)

Funding

  1. Natural Sciences and Engineering Research Council of Canada
  2. Toyota Technical Center
  3. Ontario Centers of Excellence


This study compared the performance of Deep Reinforcement Learning (DRL) and Model Predictive Control (MPC) in Adaptive Cruise Control design, finding that the two are comparable when the testing data falls within the training range, but that DRL performance degrades when the testing data falls outside it.
This study compares Deep Reinforcement Learning (DRL) and Model Predictive Control (MPC) for Adaptive Cruise Control (ACC) design in car-following scenarios. A first-order system is used as the Control-Oriented Model (COM) to approximate the acceleration command dynamics of the vehicle. Based on the equations of the control system and a multi-objective cost function, we train a DRL policy using the Deep Deterministic Policy Gradient (DDPG) algorithm and solve the MPC problem with Interior-Point Optimization (IPO). Simulation results for the episode costs show that, when there are no modeling errors and the testing inputs lie within the training data range, the DRL solution is equivalent to MPC with a sufficiently long prediction horizon. In particular, the DRL episode cost is only 5.8% higher than the benchmark optimal control solution obtained by optimizing over the entire episode with IPO. DRL control performance degrades when the testing inputs fall outside the training data range, indicating inadequate machine learning generalization. When modeling errors arise from control delay, disturbances, and/or testing with a High-Fidelity Model (HFM) of the vehicle, the DRL-trained policy outperforms MPC when the modeling errors are large and performs comparably to MPC when the modeling errors are small.
