Article

Improvement of Reinforcement Learning With Supermodularity

Publisher

IEEE (Institute of Electrical and Electronics Engineers, Inc.)
DOI: 10.1109/TNNLS.2023.3244024

Keywords

Optimization; dynamic programming; industries; heuristic algorithms; data analysis; approximation algorithms; sufficient conditions; dynamic parameter; monotone comparative statics; reinforcement learning (RL); supermodularity

Abstract

Reinforcement learning (RL) is a promising approach to learning and decision-making problems in dynamic environments. Most studies on RL focus on improving state evaluation or action evaluation. In this article, we investigate how to reduce the action space by using supermodularity. We treat the decision tasks in a multistage decision process as a collection of parameterized optimization problems, in which the state parameters vary dynamically with time or stage. The optimal solutions of these parameterized optimization problems correspond to the optimal actions in RL. For a given Markov decision process (MDP) with supermodularity, the monotonicity of the optimal action set and of the optimal selection with respect to the state parameters can be obtained by monotone comparative statics. Accordingly, we propose a monotonicity cut that removes unpromising actions from the action space. Taking the bin packing problem (BPP) as an example, we show how supermodularity and the monotonicity cut work in RL. Finally, we evaluate the monotonicity cut on the benchmark datasets reported in the literature and compare the proposed RL approach with several popular baseline algorithms. The results show that the monotonicity cut can effectively improve the performance of RL.
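
For readers unfamiliar with the terminology, the two standard facts the abstract relies on can be stated as follows; the notation is generic and not taken from the paper itself. A function f(x, theta) of an action x and a state parameter theta is supermodular (has increasing differences) when

f(x', \theta') - f(x, \theta') \;\ge\; f(x', \theta) - f(x, \theta)
\qquad \text{for all } x' \ge x,\ \theta' \ge \theta,

and Topkis' monotone comparative statics theorem then says that

x^*(\theta) \;=\; \operatorname*{arg\,max}_{x \in X} f(x, \theta)
\quad \text{is nondecreasing in } \theta \text{ (in the strong set order).}

The practical consequence is a pruning rule for action selection: once the greedy action is known at some parameter values, monotonicity bounds the optimal action at every other value. The Python sketch below is a minimal illustration of that idea under simplifying assumptions (a scalar state parameter and a totally ordered action set); it is not the authors' implementation, and the name monotonicity_cut and all variables are hypothetical.

import numpy as np

def monotonicity_cut(actions, theta, history):
    # Prune actions ruled out by a monotone optimal selection.
    # Under supermodularity, a*(theta) is nondecreasing in theta, so:
    #   theta_i <= theta  implies  a*(theta) >= a_i
    #   theta_i >= theta  implies  a*(theta) <= a_i
    # actions: sorted 1-D array of candidate actions
    # theta:   current (scalar) state parameter
    # history: list of (theta_i, a_i) pairs of past greedy choices
    lower, upper = -np.inf, np.inf
    for theta_i, a_i in history:
        if theta_i <= theta:
            lower = max(lower, a_i)
        if theta_i >= theta:
            upper = min(upper, a_i)
    return actions[(actions >= lower) & (actions <= upper)]

# Usage: restrict the greedy argmax to the reduced action set.
actions = np.arange(10)
history = [(0.2, 3), (0.8, 7)]
print(monotonicity_cut(actions, 0.5, history))  # [3 4 5 6 7]

In a greedy or Q-learning step, the argmax over action values would then range only over the surviving actions, which is the kind of action-space reduction the abstract describes.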
