Article

Distributed Multiagent Reinforcement Learning With Action Networks for Dynamic Economic Dispatch

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TNNLS.2023.3234049

Keywords

Power demand; Heuristic algorithms; Prediction algorithms; Couplings; Approximation algorithms; Power system stability; Convex functions; Distributed optimization; dynamic economic dispatch; multiagent reinforcement learning (MARL); smart grids

Abstract

A new class of distributed multiagent reinforcement learning (MARL) algorithms suitable for problems with coupling constraints is proposed in this article to address the dynamic economic dispatch problem (DEDP) in smart grids. Specifically, the assumption commonly made in most existing results on the DEDP that the cost functions are known and/or convex is removed in this article. A distributed projection optimization algorithm is designed for the generation units to find feasible power outputs satisfying the coupling constraints. By using a quadratic function to approximate the state-action value function of each generation unit, the approximate optimal solution of the original DEDP can be obtained by solving a convex optimization problem. Then, each action network utilizes a neural network (NN) to learn the relationship between the total power demand and the optimal power output of each generation unit, so that the algorithm gains the generalization ability to predict the optimal power output distribution for an unseen total power demand. Furthermore, an improved experience replay mechanism is introduced into the action networks to improve the stability of the training process. Finally, the effectiveness and robustness of the proposed MARL algorithm are verified by simulation.
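To make the feasibility step concrete: the abstract's "coupling constraints" in economic dispatch are typically a power-balance equality (total output equals total demand) plus per-unit capacity limits. The sketch below is a simplified, centralized Euclidean projection onto that constraint set, not the authors' distributed algorithm; the function name, the bisection-on-the-dual approach, and all parameter names are illustrative assumptions.

```python
import numpy as np

def project_dispatch(x, demand, p_min, p_max, iters=100):
    """Euclidean projection of a candidate output vector x onto the set
    {p : sum(p) == demand, p_min <= p <= p_max}.

    The KKT conditions give the closed form p_i = clip(x_i + lam, p_min_i,
    p_max_i) for a scalar dual variable lam; since sum(p(lam)) is
    nondecreasing in lam, we find lam by bisection.
    """
    assert p_min.sum() <= demand <= p_max.sum(), "demand must be feasible"
    lo = np.min(p_min - x)   # lam small enough: every unit at lower bound
    hi = np.max(p_max - x)   # lam large enough: every unit at upper bound
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        p = np.clip(x + lam, p_min, p_max)
        if p.sum() > demand:
            hi = lam
        else:
            lo = lam
    return np.clip(x + 0.5 * (lo + hi), p_min, p_max)
```

With the value function of each unit approximated by a quadratic in its action, minimizing the total approximate cost subject to this same constraint set is a convex quadratic program, which is what makes the approximate optimal dispatch tractable.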

