Article

Deterministic policy gradient algorithms for semi-Markov decision processes

INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS (2022)

Journal

INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS

Volume 37, Issue 7, Pages 4008-4019

Publisher

WILEY

DOI: 10.1002/int.22709

Keywords

average reward; deterministic policy; policy gradient theorem; reinforcement learning; SMDP

Automated Summary

This paper extends the DPG theorem from MDPs to SMDPs under the average-reward criterion, and presents two example actor-critic algorithms that demonstrate the efficacy of the method both mathematically and via simulations.

Abstract

A large class of sequential decision-making problems under uncertainty, with broad applications from preventive maintenance to event-triggered control, can be modeled in the framework of semi-Markov decision processes (SMDPs). Unlike Markov decision processes (MDPs), SMDPs are underexplored in the online and reinforcement learning (RL) settings. In this paper, we extend the well-known deterministic policy gradient (DPG) theorem for MDPs to SMDPs under the average-reward criterion. Existing stochastic policy gradient methods not only require, in general, a large number of samples for training, but also suffer from high variance in the gradient estimate when applied to problems with a deterministic optimal policy. Our DPG method can potentially remedy these issues. On the basis of this method, and depending on the choice of critic, different actor-critic algorithms can easily be developed in the RL setup. We present two example actor-critic algorithms. Both employ our policy gradient theorem for their actors but use different critics: one uses a simple SARSA update, while the other uses the same on-policy update with compatible function approximators. We demonstrate the efficacy of our method both mathematically and via simulations.
