Article

Bi-Level Off-Policy Reinforcement Learning for Two-Timescale Volt/VAR Control in Active Distribution Networks

Journal

IEEE TRANSACTIONS ON POWER SYSTEMS
Volume 38, Issue 1, Pages 385-395

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TPWRS.2022.3168700

Keywords

Volt/var control; reinforcement learning; bi-level; multi-timescale; active distribution networks

This paper proposes a bi-level off-policy reinforcement learning method that solves the Volt/VAR control problem in active distribution networks (ADNs) without an accurate system model. By defining a bi-level Markov decision process and using separate agents for the slow-timescale and fast-timescale sub-problems, the method achieves stable and satisfactory optimization of both discrete and continuous devices in ADNs.
Volt/VAR control (VVC) of active distribution networks (ADNs) involves both slow-timescale discrete devices (STDDs, e.g., on-load tap changers) and fast-timescale continuous devices (FTCDs, e.g., distributed generators), which must be coordinated in time sequence. Traditional two-timescale VVC optimizes STDDs and FTCDs using accurate system models, but this can be impractical because of the prohibitive modeling effort. In this paper, a novel bi-level off-policy reinforcement learning (RL) method is proposed to solve the problem in a model-free manner. A bi-level Markov decision process (BMDP) is defined, and separate agents are set up for the slow-timescale and fast-timescale sub-problems. For the fast-timescale sub-problem, we adopt an off-policy RL method with high sample efficiency. For the slow-timescale sub-problem, we develop an off-policy multi-discrete soft actor-critic (MDSAC) algorithm to address the curse of dimensionality that arises with multiple STDDs. To mitigate the non-stationarity that arises when the two agents are trained together, we propose a multi-timescale off-policy correction (MTOPC) method based on importance sampling. Comprehensive numerical studies demonstrate that the proposed method achieves stable and satisfactory optimization of both STDDs and FTCDs without any model information, and that it outperforms existing VVC methods involving both STDDs and FTCDs.
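The abstract does not give the details of MDSAC or MTOPC, so the following Python sketch only illustrates the two general ideas it names. It is not the authors' algorithm. It assumes a policy that factorizes into one independent categorical head per discrete device, and it assumes an off-policy correction in the form of a clipped importance ratio between the current policy and the policy that generated a stored sample. All function names, the clipping threshold, and the example tap counts are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def joint_action_count(taps_per_device):
    """Number of actions if every combination of tap positions is one action."""
    return math.prod(taps_per_device)


def factorized_output_count(taps_per_device):
    """Number of policy outputs if each device has its own categorical head."""
    return sum(taps_per_device)


def factorized_log_prob(per_device_logits, action):
    """log pi(a) = sum_i log pi_i(a_i), assuming independent per-device heads."""
    return sum(
        math.log(softmax(logits)[a_i])
        for logits, a_i in zip(per_device_logits, action)
    )


def importance_ratio(current_logits, behavior_logits, action, clip=10.0):
    """Clipped ratio pi_current(a) / pi_behavior(a) for a stored action.

    The clip value is an illustrative assumption, not taken from the paper.
    """
    log_ratio = (
        factorized_log_prob(current_logits, action)
        - factorized_log_prob(behavior_logits, action)
    )
    return min(math.exp(log_ratio), clip)


if __name__ == "__main__":
    taps = [11, 11, 11]  # hypothetical: three devices, 11 tap positions each
    print(joint_action_count(taps), factorized_output_count(taps))
```

For three hypothetical devices with 11 tap positions each, a single joint categorical policy would need 1331 outputs, while the factorized form needs 33. This is the kind of growth the "curse of dimensionality" remark refers to. The importance ratio equals 1 when the current and behavior policies coincide, and it reweights stale replay samples otherwise.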
