Article

A Deep Reinforcement Learning-Based Energy Management Framework With Lagrangian Relaxation for Plug-In Hybrid Electric Vehicle

Journal

IEEE TRANSACTIONS ON TRANSPORTATION ELECTRIFICATION
Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TTE.2020.3043239

Keywords

Energy management; Training; Optimization; Engines; Transportation; Safety; Reinforcement learning; Lagrangian relaxation; plug-in hybrid electric vehicle (PHEV); reinforcement learning (RL); training safety

Funding

  1. National Key R&D Program in China [2019YFB1600100]
  2. National Natural Science Foundation of China [52072074, 51705020, 61620106002]
  3. Fundamental Research Funds for the Central Universities [2242020R10045]
  4. Zhishan Scholars Programs of Southeast University
  5. Postgraduate Education Reform Project of Jiangsu Province [KYCX20_0133]
  6. Science and Technology Major Project, Transportation of Jiangsu Province

Abstract

This study proposes an RL framework, coach-actor-double-critic (CADC), for optimizing the energy management of plug-in hybrid electric vehicles. By explicitly handling constraints during online energy management, the framework improves the energy-saving rate while ensuring training safety.
Reinforcement learning (RL)-based energy management is one of the current hot spots in hybrid electric vehicle research. Recent advances in RL-based energy management focus on energy-saving performance but pay less attention to the constrained setting required for training safety. This article proposes an RL framework named coach-actor-double-critic (CADC), which formulates energy management as a constrained Markov decision process (CMDP). A bilevel onboard controller, comprising a neural network (NN)-based strategy actor and a rule-based strategy coach, performs online energy management. Whenever the actor's output exceeds the constrained range of feasible solutions, the coach takes charge of energy management to ensure safety. Through Lagrangian relaxation, the CMDP optimization is transformed into an unconstrained dual problem that minimizes energy consumption while also minimizing coach participation. The actor's parameters are updated by policy gradient through RL training with the Lagrangian value function. Two critics with identical structure estimate the value function synchronously to avoid overestimation bias. Several experiments with bus-trajectory data demonstrate the optimality, self-learning ability, and adaptability of CADC. The results indicate that CADC outperforms existing RL-based strategies and achieves more than 95% of the energy-saving rate of the offline global optimum.
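The abstract's two key mechanisms, the coach-actor safety switch and the Lagrangian relaxation of the CMDP, can be sketched concretely. In one standard form of such a relaxation (the notation below is illustrative, not taken from the paper), the constrained objective becomes a min-max dual problem over a multiplier:

```latex
% Illustrative CMDP relaxation: J_E(\pi) is expected energy consumption,
% J_C(\pi) is expected coach participation, d is the tolerated level.
\min_{\pi} J_E(\pi) \quad \text{s.t.} \quad J_C(\pi) \le d
\quad\Longrightarrow\quad
\max_{\lambda \ge 0} \min_{\pi} \; J_E(\pi) + \lambda \bigl( J_C(\pi) - d \bigr)
```

The sketch below shows how such a bilevel controller and dual objective might fit together. It is a minimal illustration under assumed simplifications (a one-dimensional engine-power-share action, a toy SOC model, dual ascent on the multiplier); the function names, SOC bounds, and coach rule are hypothetical, not the authors' implementation.

```python
import numpy as np

SOC_MIN, SOC_MAX = 0.3, 0.9          # assumed safe battery SOC window

def soc_next(soc, a):
    """Toy battery dynamics: a larger engine share (a -> 1) depletes less SOC."""
    return soc - 0.02 * (1.0 - a)

def feasible(soc, a):
    """Constraint check: does the proposed action keep SOC in the safe range?"""
    return SOC_MIN <= soc_next(soc, a) <= SOC_MAX

def coach_policy(soc):
    """Rule-based coach: a guaranteed-safe, charge-sustaining fallback."""
    return 1.0 if soc <= 0.5 * (SOC_MIN + SOC_MAX) else 0.0

def bilevel_step(actor, state, soc):
    """One control step: the actor proposes; the coach takes charge if unsafe.
    Returns the executed action and a coach-participation flag."""
    a = float(np.clip(actor(state), 0.0, 1.0))
    if feasible(soc, a):
        return a, 0.0                 # actor acts alone
    return coach_policy(soc), 1.0     # coach overrides to ensure safety

def lagrangian_reward(energy_cost, coach_flag, lam):
    """Unconstrained dual reward: energy saving minus a penalty that
    discourages relying on the coach."""
    return -energy_cost - lam * coach_flag

def dual_ascent(lam, mean_coach_rate, d=0.0, lr=1e-3):
    """Multiplier update: lambda grows while coach participation exceeds
    the tolerated level d, tightening the safety penalty."""
    return max(0.0, lam + lr * (mean_coach_rate - d))

# Example usage with a trivial stand-in for the trained NN actor:
actor = lambda s: 0.7
action, coach_flag = bilevel_step(actor, state=None, soc=0.31)
```

During RL training, the double critics would estimate the value of this Lagrangian reward, and the policy-gradient update of the actor would gradually reduce coach takeovers while improving fuel economy, consistent with the paper's stated goal of minimizing energy consumption and coach participation simultaneously.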
