Journal
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS
Volume -, Issue -, Pages -
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TNNLS.2023.3244024
Keywords
Optimization; Dynamic programming; Industries; Heuristic algorithms; Data analysis; Approximation algorithms; Sufficient conditions; Dynamic parameter; monotone comparative statics; optimization; reinforcement learning (RL); supermodularity
Reinforcement learning is a promising approach for learning and decision-making in dynamic environments. This article explores reducing the action space in reinforcement learning using supermodularity. By applying monotone comparative statics, a monotonicity cut is proposed to remove unpromising actions from the action space. Evaluation on benchmark datasets and comparison with popular baseline algorithms demonstrate the effectiveness of the monotonicity cut in improving the performance of reinforcement learning.
Reinforcement learning (RL) is a promising approach to tackling learning and decision-making problems in a dynamic environment. Most studies on RL focus on improving state evaluation or action evaluation. In this article, we investigate how to reduce the action space by using supermodularity. We treat the decision tasks in the multistage decision process as a collection of parameterized optimization problems, where the state parameters vary dynamically with time or stage. The optimal solutions of these parameterized optimization problems correspond to the optimal actions in RL. For a given Markov decision process (MDP) with supermodularity, the monotonicity of the optimal action set and of the optimal selection with respect to the state parameters can be obtained by using monotone comparative statics. Accordingly, we propose a monotonicity cut to remove unpromising actions from the action space. Taking the bin packing problem (BPP) as an example, we show how supermodularity and the monotonicity cut work in RL. Finally, we evaluate the monotonicity cut on benchmark datasets reported in the literature and compare the proposed RL method with several popular baseline algorithms. The results show that the monotonicity cut can effectively improve the performance of RL.
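The core pruning idea from the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes a discrete, totally ordered action space and a scalar state parameter, and that supermodularity of the MDP makes the optimal action nondecreasing in that parameter (the monotone-comparative-statics property). The names `monotonicity_cut` and `known_optima` are illustrative.

```python
def monotonicity_cut(actions, known_optima, state_param):
    """Prune unpromising actions for a state with parameter `state_param`.

    `actions` is a sorted list of candidate actions; `known_optima` maps
    previously solved state parameters to their optimal actions. Under the
    monotonicity assumption, a state with a larger parameter cannot have a
    smaller optimal action (and vice versa), so actions outside the implied
    bracket [lo, hi] can be removed from the action space.
    """
    # Largest known optimum among states with a smaller-or-equal parameter
    lo = max((a for p, a in known_optima.items() if p <= state_param),
             default=min(actions))
    # Smallest known optimum among states with a larger-or-equal parameter
    hi = min((a for p, a in known_optima.items() if p >= state_param),
             default=max(actions))
    return [a for a in actions if lo <= a <= hi]
```

For example, if states with parameters 2 and 8 are known to have optimal actions 3 and 7, a state with parameter 5 only needs to consider actions 3 through 7; with no prior information, the full action space is kept.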