Article

Accelerating reinforcement learning with case-based model-assisted experience augmentation for process control

Journal

NEURAL NETWORKS
Volume 158, Pages 197-215

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.neunet.2022.10.016

Keywords

Reinforcement learning; Process control; Experience augmentation; Local dynamic model; Case-based reasoning

Abstract

This paper proposes a novel RL control algorithm, CBR-MA-DDPG, which combines case-based reasoning, model-assisted experience augmentation, and deep deterministic policy gradient to improve adaptability and control performance in industrial process control applications.
In the context of intelligent manufacturing in the process industry, traditional model-based optimization control methods cannot adapt to drastic changes in working conditions or operating modes. Reinforcement learning (RL) achieves the control objective directly by interacting with the environment, and has significant advantages in the presence of uncertainty since it does not require an explicit model of the operating plant. However, most RL algorithms fail to retain transfer learning capabilities under mode variation, which is a practical obstacle to industrial process control applications. To address these issues, we design a framework that uses local data augmentation to improve training efficiency and transfer learning (adaptability) performance. This paper therefore proposes a novel RL control algorithm, CBR-MA-DDPG, organically integrating case-based reasoning (CBR), model-assisted (MA) experience augmentation, and deep deterministic policy gradient (DDPG). When the operating mode changes, CBR-MA-DDPG can quickly adapt to the varying environment and achieve the desired control performance within several training episodes. Experimental analyses on a continuous stirred tank reactor (CSTR) and an organic Rankine cycle (ORC) demonstrate the superiority of the proposed method in terms of both adaptability and control performance/robustness. The results show that the control performance of the CBR-MA-DDPG agent outperforms conventional PI and MPC control schemes, and that it has higher training efficiency than the state-of-the-art DDPG, TD3, and PPO algorithms in transfer learning scenarios with mode shifts. © 2022 Elsevier Ltd. All rights reserved.
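The abstract names two mechanisms that distinguish CBR-MA-DDPG from plain DDPG: a local dynamic model that generates synthetic transitions to augment the replay buffer (the MA part), and a case base that stores and retrieves policies by operating-mode descriptor to warm-start the agent after a mode shift (the CBR part). The sketch below illustrates only these two data-side mechanisms, under assumptions not stated in the abstract: a locally linear model s' ≈ As + Ba + c fitted by least squares, and nearest-neighbour case retrieval. All names (ReplayBuffer, LocalLinearModel, augment, CaseBase) are illustrative, and the DDPG actor-critic update itself is elided.

```python
import numpy as np

class ReplayBuffer:
    """FIFO store of (s, a, r, s') transitions, real and synthetic."""
    def __init__(self, capacity=100_000):
        self.data, self.capacity = [], capacity
    def add(self, s, a, r, s2):
        if len(self.data) >= self.capacity:
            self.data.pop(0)
        self.data.append((s, a, r, s2))
    def sample(self, n):
        # Assumes the buffer is non-empty.
        idx = np.random.randint(len(self.data), size=n)
        return [self.data[i] for i in idx]

class LocalLinearModel:
    """Local dynamic model s' ≈ A s + B a + c, fitted by least squares on
    recent real transitions (one plausible choice; the paper's exact model
    form is not given in the abstract)."""
    def fit(self, transitions):
        S  = np.array([t[0] for t in transitions])
        A_ = np.array([t[1] for t in transitions])
        S2 = np.array([t[3] for t in transitions])
        X = np.hstack([S, A_, np.ones((len(S), 1))])
        self.W, *_ = np.linalg.lstsq(X, S2, rcond=None)
    def predict(self, s, a):
        return np.hstack([s, a, 1.0]) @ self.W

def augment(buffer, model, reward_fn, n_synth=32, noise=0.01):
    """Model-assisted experience augmentation: perturb actions taken from
    sampled real states, roll them through the local model, and add the
    resulting synthetic transitions near the visited region."""
    for s, a, _, _ in buffer.sample(n_synth):
        a_p = a + noise * np.random.randn(*np.shape(a))
        s2_hat = model.predict(s, a_p)
        buffer.add(s, a_p, reward_fn(s, a_p, s2_hat), s2_hat)

class CaseBase:
    """Case-based reasoning store: operating-mode descriptor -> saved
    policy parameters; retrieval here is nearest-neighbour in descriptor
    space (the paper's retrieval metric is not given in the abstract)."""
    def __init__(self):
        self.cases = []  # list of (descriptor, policy_params)
    def retain(self, descriptor, params):
        self.cases.append((np.asarray(descriptor, dtype=float), params))
    def retrieve(self, descriptor):
        # Assumes at least one case has been retained.
        d = np.asarray(descriptor, dtype=float)
        return min(self.cases, key=lambda c: np.linalg.norm(c[0] - d))[1]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    buf, model = ReplayBuffer(), LocalLinearModel()
    # Toy plant: s' = 0.9 s + 0.1 a (true dynamics unknown to the agent).
    s = rng.standard_normal(2)
    for _ in range(200):
        a = rng.standard_normal(1)
        s2 = 0.9 * s + 0.1 * a
        buf.add(s, a, -float(s2 @ s2), s2)
        s = s2
    model.fit(buf.data)
    augment(buf, model, reward_fn=lambda s, a, s2: -float(s2 @ s2))
    print(len(buf.data))                 # 200 real + 32 synthetic transitions

    cb = CaseBase()
    cb.retain([1.0, 50.0], "policy-for-mode-A")
    cb.retain([3.0, 80.0], "policy-for-mode-B")
    print(cb.retrieve([1.2, 55.0]))      # -> policy-for-mode-A
```

In a transfer scenario the intended flow would be: detect the mode change, retrieve the nearest stored policy to warm-start the actor and critic, refit the local model on fresh transitions from the new mode, and interleave augment() calls with ordinary DDPG updates so the replay buffer fills with plausible local data faster than environment interaction alone would allow.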
