☆ 4.6 Article

An Empirical Investigation of Early Stopping Optimizations in Proximal Policy Optimization

IEEE ACCESS (2021)

Journal

IEEE ACCESS

Volume 9, Issue -, Pages -

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/ACCESS.2021.3106662

Keywords

Artificial Intelligence; deep learning; reinforcement learning; proximal policy optimization; robotics and automation; robot learning

Funding

Japanese Ministry of Education, Culture, Sports, Science, and Technology (MEXT)

Ask authors/readers for more resources

Protocol

Community support

Reagent

Community support

Automated Summary New
Abstract

This paper investigates the impact of the optimization technique called "early stopping" on the performance of the PPO algorithm. The results show that PPO's performance is sensitive to the number of update iterations per epoch, and early stopping optimizations can dynamically adjust the update iterations, serving as a convenient alternative to tuning on K.

Code-level optimizations, which are low-level optimization techniques used in the implementation of algorithms, have generally been considered as tangential and often do not appear in published pseudo-code of Reinforcement Learning (RL) algorithms. However, recent studies suggest these optimizations to be critical to the performance of algorithms such as Proximal Policy Optimization (PPO). In this paper, we investigate the effect of one such optimization known as early stopping'' implemented for PPO in the popular openai/spinningup library but not in openai/baselines. This optimization technique, which we refer to as KLE-Stop, can stop the policy update within an epoch if the mean Kullback-Leibler (KL) Divergence between the target policy and current policy becomes too high. More specifically, we conduct experiments to examine the empirical importance of KLE-Stop and its conservative variant KLE-Rollback when they are used in conjunction with other common code-level optimizations. The main findings of our experiments are 1) the performance of PPO is sensitive to the number of update iterations per epoch (K), 2) Early stopping optimizations (KLE-Stop and KLE-Rollback) mitigate such sensitivity by dynamically adjusting the actual number of update iterations within an epoch, 3) Early stopping optimizations could serve as a convenient alternative to tuning on K.

An Empirical Investigation of Early Stopping Optimizations in Proximal Policy Optimization

Journal

IEEE ACCESS

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

Keywords

Categories

Funding

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

An Empirical Investigation of Early Stopping Optimizations in Proximal Policy Optimization

Journal

IEEE ACCESS

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

Keywords

Categories

Funding

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Export Citation

Share Paper