Proceedings Paper

Safe HVAC Control via Batch Reinforcement Learning

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/ICCPS54341.2022.00023

Keywords

HVAC control; Batch Reinforcement Learning; Deep Reinforcement Learning

Funding

  1. CONIX Research Center, one of six centers in JUMP, a Semiconductor Research Corporation (SRC) program - DARPA

Abstract

Buildings account for 30% of global energy use, and HVAC systems contribute approximately half of that. Prior work on reinforcement learning (RL) for HVAC energy efficiency relies either on online methods that need thermal simulators or on thermal models driven by historical data, and both are hard to deploy in real buildings. Batch RL methods can learn from historical data and improve existing policies without interacting with real buildings or simulators. Our algorithm incorporates a KL regularization term and achieves significant energy reduction in a real multi-zone, multi-floor building, surpassing state-of-the-art methods and enhancing occupants' thermal comfort.
Buildings account for 30% of energy use worldwide, and approximately half of that is attributed to HVAC systems. Reinforcement learning (RL) has improved on traditional control methods in raising the energy efficiency of HVAC systems. However, prior works use online RL methods that require configuring complex thermal simulators for training, or use historical data-driven thermal models that can take at least 10^4 time steps to reach rule-based performance. In addition, because of the distribution drift between simulators and real buildings, RL solutions are seldom deployed in the real world. Batch RL (BRL) methods, by contrast, can learn from historical data and improve on the existing policy without any interaction with real buildings or simulators during training. With the existing rule-based policy as a prior, the policies learned with batch RL are better than the existing control from the first day of deployment, with far fewer training steps than online methods. Our algorithm incorporates a Kullback-Leibler (KL) regularization term to penalize policies that deviate far from the previous ones. We evaluate our framework on a real multi-zone, multi-floor building: it reduces energy use by 7.2% compared with the state-of-the-art batch RL method and by 16.7% compared with the default rule-based control, and it outperforms other BRL methods in occupants' thermal comfort.
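The role of the KL regularization term can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a hypothetical discrete action set of three setpoint moves, made-up Q-values, and a hypothetical weight `beta`, and it only shows how a KL penalty against a prior policy discourages a candidate policy from straying far from it.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as probability lists."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def kl_regularized_objective(q_values, policy, prior, beta):
    """Expected Q-value under `policy` minus beta * KL(policy || prior)."""
    expected_q = sum(pi * qv for pi, qv in zip(policy, q_values))
    return expected_q - beta * kl_divergence(policy, prior)


# Hypothetical actions: lower setpoint, hold setpoint, raise setpoint.
q_values = [1.0, 0.5, 0.2]      # illustrative values, not from the paper
prior = [0.1, 0.8, 0.1]         # stand-in for the existing rule-based policy
close = [0.2, 0.7, 0.1]         # candidate policy near the prior
far = [0.9, 0.05, 0.05]         # candidate policy far from the prior

score_close = kl_regularized_objective(q_values, close, prior, beta=1.0)
score_far = kl_regularized_objective(q_values, far, prior, beta=1.0)
```

With `beta=0` the objective prefers `far`, which has the higher expected Q-value; with `beta=1.0` the KL penalty dominates and `close` scores higher, which is the conservative behavior the abstract attributes to the regularization term.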
