Article

Safe Policy Iteration: A Monotonically Improving Approximate Policy Iteration Approach

JOURNAL OF MACHINE LEARNING RESEARCH (2021)

Journal

JOURNAL OF MACHINE LEARNING RESEARCH

Volume 22, Issue -, Pages -

Publisher

MICROTOME PUBL

Keywords

Reinforcement Learning; Approximate Dynamic Programming; Approximate Policy Iteration; Policy Oscillation; Policy Chattering; Markov Decision Process

Automated Summary

This paper studies the policy improvement step in approximate policy iteration and proposes three safe policy-iteration schemas that address the policy oscillations arising when the evaluation or improvement step is only approximate. The proposed algorithms are empirically evaluated and compared on chain-walk domains, the prison domain, and the Blackjack card game.

Abstract

This paper presents a study of the policy improvement step that can be usefully exploited by approximate policy-iteration algorithms. When either the policy evaluation step or the policy improvement step returns an approximated result, the sequence of policies produced by policy iteration may not be monotonically increasing, and oscillations may occur. To address this issue, we consider safe policy improvements, i.e., at each iteration, we search for a policy that maximizes a lower bound to the policy improvement w.r.t. the current policy, until no improving policy can be found. We propose three safe policy-iteration schemas that differ in the way the next policy is chosen w.r.t. the estimated greedy policy. Besides being theoretically derived and discussed, the proposed algorithms are empirically evaluated and compared on some chain-walk domains, the prison domain, and on the Blackjack card game.
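
To make the idea of a safe policy improvement concrete, the following is a minimal Python sketch of one conservative update step. It is not the paper's algorithm or its exact bound. It assumes, purely for illustration, that the improvement obtained by mixing the current policy with the estimated greedy policy is lower-bounded by a quadratic B(alpha) = alpha * advantage - alpha^2 * penalty, where `advantage` and `penalty` are hypothetical scalar estimates supplied by the caller. The function name `safe_mixture_step` and all parameter names are likewise assumptions.

```python
import numpy as np


def safe_mixture_step(pi, pi_greedy, advantage, penalty):
    """One conservative policy update (illustrative sketch only).

    pi, pi_greedy : arrays of shape (n_states, n_actions); each row is a
                    probability distribution over actions.
    advantage     : assumed estimate of the expected advantage of
                    pi_greedy over pi (hypothetical input).
    penalty       : assumed non-negative coefficient of the quadratic
                    term in the lower bound (hypothetical input).

    Assumed lower bound: B(alpha) = alpha * advantage - alpha**2 * penalty.
    Returns (alpha, pi_new) with pi_new = alpha * pi_greedy + (1 - alpha) * pi.
    """
    pi = np.asarray(pi, dtype=float)
    pi_greedy = np.asarray(pi_greedy, dtype=float)

    # No estimated advantage: the bound cannot be made positive, so stop.
    if advantage <= 0.0:
        return 0.0, pi.copy()

    # B is concave in alpha; its unconstrained maximizer is
    # advantage / (2 * penalty), clipped to the valid range (0, 1].
    if penalty <= 0.0:
        alpha = 1.0
    else:
        alpha = min(1.0, advantage / (2.0 * penalty))

    # A convex combination of two policies is still a valid policy.
    pi_new = alpha * pi_greedy + (1.0 - alpha) * pi
    return alpha, pi_new


# Usage: a uniform policy on 2 states x 2 actions and a deterministic
# greedy candidate, with made-up advantage and penalty values.
pi = np.array([[0.5, 0.5], [0.5, 0.5]])
pi_greedy = np.array([[1.0, 0.0], [0.0, 1.0]])
alpha, pi_new = safe_mixture_step(pi, pi_greedy, advantage=0.2, penalty=0.4)
```

Under this assumed bound, any positive advantage gives a strictly positive guaranteed improvement at the chosen step size, and the update stops (alpha = 0) once no improving direction is estimated, which mirrors the "until no improving policy can be found" condition in the abstract.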
