Article

Comparison of Deep Reinforcement Learning and Model Predictive Control for Adaptive Cruise Control

Journal

IEEE TRANSACTIONS ON INTELLIGENT VEHICLES
Volume 6, Issue 2, Pages 221-231

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TIV.2020.3012947

Keywords

Learning (artificial intelligence); Mathematical model; Cost function; Testing; Optimal control; Delays; Deep reinforcement learning; Model Predictive Control (MPC); Adaptive Cruise Control (ACC)

Funding

  1. Natural Sciences and Engineering Research Council of Canada
  2. Toyota Technical Center
  3. Ontario Centers of Excellence


This study compares the performance of Deep Reinforcement Learning (DRL) and Model Predictive Control (MPC) for Adaptive Cruise Control design, finding the two comparable when the testing data falls within the training range, with DRL performance degrading when the testing data falls outside that range.

This study compares Deep Reinforcement Learning (DRL) and Model Predictive Control (MPC) for Adaptive Cruise Control (ACC) design in car-following scenarios. A first-order system is used as the Control-Oriented Model (COM) to approximate the acceleration command dynamics of the vehicle. Based on the equations of the control system and a multi-objective cost function, we train a DRL policy using Deep Deterministic Policy Gradient (DDPG) and solve the MPC problem via Interior-Point Optimization (IPO). Simulation results for the episode costs show that, when there are no modeling errors and the testing inputs lie within the training data range, the DRL solution is equivalent to MPC with a sufficiently long prediction horizon; in particular, the DRL episode cost is only 5.8% higher than the benchmark optimal control solution obtained by optimizing over the entire episode via IPO. DRL control performance degrades when the testing inputs fall outside the training data range, indicating inadequate generalization of the learned policy. When there are modeling errors due to control delay, disturbances, and/or testing with a High-Fidelity Model (HFM) of the vehicle, the DRL-trained policy outperforms MPC when the modeling errors are large and performs similarly to MPC when the modeling errors are small.
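
The record describes the control setup only qualitatively. As a minimal, non-authoritative sketch of what a first-order control-oriented model with a multi-objective stage cost for car following might look like, the Python snippet below assumes an actuator time constant, a constant-time-headway spacing policy, and placeholder cost weights and controller; none of these names or values are taken from the paper.

```python
import numpy as np

# Illustrative first-order Control-Oriented Model (COM) for car following.
# State: gap d to the lead vehicle [m], ego speed v [m/s], ego acceleration a [m/s^2].
# The acceleration command u is tracked through a first-order lag with time
# constant TAU, i.e. da/dt = (u - a) / TAU. All constants are assumptions.
TAU = 0.5    # assumed actuator time constant [s]
DT = 0.1     # integration step [s]
T_HW = 1.5   # assumed desired time headway [s]
D0 = 5.0     # assumed standstill gap [m]

def step(state, u, v_lead):
    """Advance the COM one step with forward Euler."""
    d, v, a = state
    d_next = d + (v_lead - v) * DT      # gap changes with relative speed
    v_next = v + a * DT                 # ego speed integrates acceleration
    a_next = a + (u - a) / TAU * DT     # first-order lag on the command
    return np.array([d_next, v_next, a_next])

def stage_cost(state, u):
    """Illustrative multi-objective stage cost: spacing error, acceleration,
    and control effort, with placeholder weights."""
    d, v, a = state
    d_des = D0 + T_HW * v               # constant-time-headway spacing target
    return 1.0 * (d - d_des) ** 2 + 0.1 * a ** 2 + 0.1 * u ** 2

# Roll out one episode under a simple proportional spacing controller
# (a placeholder standing in for the trained DDPG policy or the MPC solution).
state = np.array([20.0, 25.0, 0.0])     # [gap, speed, acceleration]
episode_cost = 0.0
for _ in range(300):                    # 30 s horizon at DT = 0.1 s
    u = 0.2 * (state[0] - (D0 + T_HW * state[1]))
    episode_cost += stage_cost(state, u)
    state = step(state, u, v_lead=25.0)
print(f"episode cost: {episode_cost:.1f}")
```

In the paper's comparison, a trained DDPG agent or an MPC solver would take the place of the placeholder controller in such a rollout, and the accumulated episode costs are compared against a benchmark obtained by optimizing the entire episode via IPO.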
