4.7 Article

Safe multi-agent deep reinforcement learning for joint bidding and maintenance scheduling of generation units

Journal

RELIABILITY ENGINEERING & SYSTEM SAFETY
Volume 232, Issue -, Pages -

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.ress.2022.109081

Keywords

Maintenance scheduling; Generation units; Reinforcement learning; Multi-agent system

This paper proposes a safe reinforcement learning algorithm for bidding decisions and maintenance scheduling of generation units in a competitive electricity market. By combining reinforcement learning with a predicted safety filter, the proposed approach handles the challenges of incomplete information and critical safety constraints, achieving a higher profit than other state-of-the-art methods while satisfying system safety requirements.
This paper proposes a safe reinforcement learning algorithm for generation bidding decisions and unit maintenance scheduling in a competitive electricity market. In this problem, each unit seeks a bidding strategy that maximizes its revenue while preserving its reliability through scheduled preventive maintenance. Maintenance scheduling imposes safety constraints that must be satisfied at all times. Meeting these critical safety and reliability requirements is challenging when the generation units have incomplete information about each other's bidding strategies. Bi-level optimization and reinforcement learning are state-of-the-art approaches for this type of problem. However, neither bi-level optimization nor reinforcement learning can handle the combined challenges of incomplete information and critical safety constraints. To tackle these challenges, we propose the safe deep deterministic policy gradient reinforcement learning algorithm, which combines reinforcement learning with a predicted safety filter. The case study demonstrates that the proposed approach yields a higher profit than other state-of-the-art methods while satisfying the system safety constraints. Moreover, the case study shows that the reward of the learning algorithm with incomplete information can converge to the reward of the complete-information game.
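The abstract does not describe how the safety filter works internally, so the following is only a minimal sketch of the general idea of filtering a learned policy's action through a one-step prediction. Everything in it is an assumption made for illustration: the scalar health index, the linear degradation model, the constants, and the names `UnitState`, `predict_next_health`, and `safety_filter` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class UnitState:
    # Hypothetical health index in [0, 1]; 1.0 means "as good as new".
    health: float


# Hypothetical constants, chosen only for illustration.
DEGRADATION_PER_STEP = 0.1  # assumed health loss when the unit runs one step
MIN_HEALTH = 0.3            # assumed safety threshold on the health index


def predict_next_health(state: UnitState, maintain: bool) -> float:
    """One-step health prediction under an assumed linear degradation model."""
    if maintain:
        return 1.0  # assumption: maintenance fully restores the unit
    return max(0.0, state.health - DEGRADATION_PER_STEP)


def safety_filter(state: UnitState, proposed_bid: float,
                  proposed_maintain: bool) -> tuple[float, bool]:
    """Return a (bid, maintain) action whose predicted next state is safe.

    The policy's proposal passes through unchanged when it is predicted to
    be safe. Otherwise maintenance is forced and the bid is set to zero,
    on the assumption that a unit under maintenance cannot sell energy.
    """
    if proposed_maintain:
        return 0.0, True
    if predict_next_health(state, maintain=False) >= MIN_HEALTH:
        return proposed_bid, False
    return 0.0, True
```

In a training loop, a filter of this kind would sit between the actor network and the market environment, so that exploration noise can never push a unit into a state that violates the maintenance constraint. For example, `safety_filter(UnitState(health=0.35), 42.0, False)` overrides the proposed bid and forces maintenance, because the predicted health falls below the assumed threshold.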
