Article

Deep Inverse Reinforcement Learning for Objective Function Identification in Bidding Models

Journal

IEEE Transactions on Power Systems
Volume 36, Issue 6, Pages 5684-5696

Publisher

Institute of Electrical and Electronics Engineers (IEEE)
DOI: 10.1109/TPWRS.2021.3076296

Keywords

Linear programming; Generators; Data models; Reinforcement learning; Decision making; Power markets; Object recognition; Electricity market; individual reward function; data-driven analysis; inverse reinforcement learning; deep reinforcement learning

Funding

  1. National Natural Science Foundation of China [U2066205]
  2. Shuimu Tsinghua Scholar Program


This paper introduces a data-driven framework for identifying bidding objective functions, consisting of three steps: modeling bidding decision processes as a Markov decision process, using a deep inverse reinforcement learning method to identify reward functions, and customizing a deep Q-network method to simulate bidding behaviors. These methods have been tested on real market data from the Australian electricity market.
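As a concrete reading of the first step, the sketch below frames one bidding round as a Markov decision process. All field names (demand forecast, clearing price, markup levels) and the toy transition rule are illustrative assumptions, not the paper's exact formulation; in particular, the reward is deliberately left blank, since identifying it is the job of the inverse-RL step.

```python
from dataclasses import dataclass
from typing import Tuple
import numpy as np

@dataclass
class BidState:
    """Market observation before a bidding round (hypothetical features)."""
    demand_forecast: float       # MW, assumed feature
    last_clearing_price: float   # $/MWh
    own_capacity: float          # MW available to offer

@dataclass
class BidAction:
    """A discretized bid: price markup over marginal cost (illustrative)."""
    markup: float  # e.g. one of {0.0, 0.1, ..., 1.0}

def step(state: BidState, action: BidAction) -> Tuple[BidState, float]:
    """Toy transition: submit a bid, observe the next market state.

    The clearing dynamics here are placeholders; the reward is returned
    as 0.0 on purpose, because in this framework it is unknown and must
    be recovered from historical bids by inverse reinforcement learning.
    """
    bid_price = (1.0 + action.markup) * 30.0  # assumed $30/MWh marginal cost
    next_state = BidState(
        demand_forecast=state.demand_forecast * (0.95 + 0.1 * np.random.rand()),
        last_clearing_price=0.9 * state.last_clearing_price + 0.1 * bid_price,
        own_capacity=state.own_capacity,
    )
    return next_state, 0.0  # reward left unspecified: IRL identifies it
```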
Due to the deregulation of power systems worldwide, bidding behavior simulation research has gained prominence. One crucial element in these studies is accurately defining and modeling the individual reward function (or objective function). Given the ubiquitous information barriers between market participants and researchers, the common approach has been to develop reward functions from theoretical assumptions, which inevitably causes deviations from the real world. However, since market data have gradually become more transparent in recent years, especially data on historical bidding behaviors, it is now feasible to introduce data-driven methods to identify the individual reward functions hidden in raw bidding data. Thus, this paper proposes a data-driven bidding objective function identification framework with three procedures. First, the bidding decision processes of participants are formulated as a standard Markov decision process. Second, a deep inverse reinforcement learning method based on maximum entropy is introduced to identify individual reward functions, whose high-dimensional nonlinearities are captured in multilayer perceptrons (MLPs). Third, a deep Q-network method is customized to simulate individual bidding behaviors based on the obtained MLP-based objective functions. The effectiveness and feasibility of the proposed framework and methods are tested on real market data from the Australian electricity market.
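The second and third steps can be sketched together: a maximum-entropy IRL update for an MLP reward network, paired with a deep Q-network that bids against the reward it recovers. Network widths, the feature and action dimensions, and the one-hot action encoding are assumptions for illustration; the paper's exact architectures and update rules may differ.

```python
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 4, 11  # assumed: 4 market features, 11 markup levels

class RewardMLP(nn.Module):
    """Step 2: reward r_theta(s, a) stored in a multilayer perceptron."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + N_ACTIONS, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )
    def forward(self, s, a_onehot):
        return self.net(torch.cat([s, a_onehot], dim=-1)).squeeze(-1)

def maxent_irl_step(reward_net, opt, expert_sa, learner_sa):
    """One maximum-entropy IRL update (illustrative sample-based form).

    Pushes the reward up on expert state-action pairs and down on the
    learner's, i.e. maximizes E_expert[r] - E_learner[r].
    """
    loss = reward_net(*learner_sa).mean() - reward_net(*expert_sa).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

class QNet(nn.Module):
    """Step 3: deep Q-network that simulates bidding under the learned reward."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
    def forward(self, s):
        return self.net(s)

def dqn_step(q_net, target_net, reward_net, opt, batch, gamma=0.99):
    """One DQN update, with rewards supplied by the identified MLP."""
    s, a, s_next = batch  # a: one-hot action tensor
    with torch.no_grad():
        r = reward_net(s, a)                        # learned, not hand-crafted
        target = r + gamma * target_net(s_next).max(dim=-1).values
    q = (q_net(s) * a).sum(dim=-1)                  # Q(s, a_taken)
    loss = nn.functional.mse_loss(q, target)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

In a full pipeline these two loops alternate: the Q-network is retrained under the current reward estimate to generate the learner's state-action samples, and maxent_irl_step then nudges the reward so that the observed expert bids score higher than the simulated ones.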
