4.4 Article

Safe Building HVAC Control via Batch Reinforcement Learning

Journal

IEEE Transactions on Sustainable Computing

Publisher

IEEE - Institute of Electrical and Electronics Engineers, Inc.
DOI: 10.1109/TSUSC.2022.3164084

Keywords

Batch reinforcement learning; safe building HVAC control; model-based offline performance evaluation

Funding

  1. U.S. Army Research Office (ARO) / U.S. Department of Defense (DOD) [W911NF1910362]
  2. U.S. National Science Foundation (NSF), Division of Computer and Network Systems (CNS) [2009057]
  3. U.S. National Science Foundation (NSF), Office of Advanced Cyberinfrastructure (OAC) [1911229]

Abstract

This paper studies safe building HVAC control through batch reinforcement learning. By adding Gaussian noise to a rule-based controller during data collection and using model-based offline evaluation to select policies, the method remains safe during exploration and improves performance at deployment. Compared with a rule-based controller, it achieves significant reductions in ramping, load factor, and daily peak demand.
In this paper, we study safe building HVAC control via batch reinforcement learning. Random exploration in building HVAC control is infeasible due to safety considerations, yet diverse states are necessary for RL algorithms to learn useful policies. To enable safety during exploration, we propose guided exploration: adding Gaussian noise to a hand-crafted rule-based controller. Adjusting the variance of the noise provides a tradeoff between dataset diversity and safety. We apply Conservative Q-Learning (CQL) to learn a policy. CQL ensures that the trained policy stays within the policy distribution used to collect the dataset, thereby guaranteeing safety at deployment. To select the optimal policy during offline training, we apply model-based performance evaluation. We use the widely adopted CityLearn testbed to evaluate the performance of our proposed method. Compared with a rule-based controller, our approach obtains a 12%-35% reduction in ramping, a 3%-10% reduction in 1-load factor, and a 3%-8% reduction in daily peak at deployment, with less than 10% performance degradation during exploration. In contrast, the performance degradation of the state-of-the-art online reinforcement learning algorithm during exploration is around 8%-18%, and it fails to surpass the performance of the rule-based controller at deployment.
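The guided-exploration step described in the abstract can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's code: the function names, the hour-based rule, and the action bounds are assumptions made here for clarity; only the idea of adding variance-tunable Gaussian noise to a rule-based controller's action comes from the abstract.

```python
import numpy as np


def rule_based_action(state):
    """Hypothetical hand-crafted rule-based controller (RBC).

    Illustrative only: discharge thermal storage during daytime hours and
    recharge it at night. The paper's actual RBC follows the CityLearn baseline.
    """
    hour = state["hour"]
    return -0.08 if 9 <= hour <= 21 else 0.05


def guided_exploration_action(state, sigma=0.05, rng=None):
    """Guided exploration: perturb the RBC action with Gaussian noise.

    A larger sigma yields a more diverse batch dataset but weaker safety
    during collection; a smaller sigma stays close to the safe RBC.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(loc=0.0, scale=sigma)
    action = rule_based_action(state) + noise
    # Clip to the valid normalized charge/discharge range of the storage device.
    return float(np.clip(action, -1.0, 1.0))


# Example: one step of batch-data collection at 2 pm.
print(guided_exploration_action({"hour": 14}, sigma=0.05))
```

The dataset collected this way is then used offline by CQL, and candidate policies are ranked with a learned dynamics model rather than by testing them on the real building.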
