Article

Proximal Parameter Distribution Optimization

Journal

IEEE Transactions on Systems, Man, and Cybernetics: Systems

Publisher

IEEE (Institute of Electrical and Electronics Engineers Inc.)
DOI: 10.1109/TSMC.2019.2931946

Keywords

Optimization; Task analysis; Artificial neural networks; Noise measurement; Uncertainty; Reinforcement learning; Acceleration; Exploration; optimization; parameter distribution; reinforcement learning (RL)

Funding

  1. National Natural Science Foundation of China [61976215, 61772532]

Abstract

The proximal parameter distribution optimization (PPDO) algorithm enhances the exploration ability of reinforcement learning agents by transforming neural network parameters from single values into distributions and by maintaining two groups of parameters. By limiting the amplitude of two consecutive parameter updates, PPDO reduces the influence of bias and variance on the value function approximation, thereby improving the stability of the parameter distribution optimization.
Encouraging the agent to explore has become a hot topic in the field of reinforcement learning (RL). Popular approaches to exploration either inject noise into neural network (NN) parameters or augment the reward with an additional intrinsic motivation term. However, the randomness of the injected noise, together with the fact that the metric for the intrinsic reward must be chosen manually, may make RL agents deviate from the optimal policy during learning. To enhance the exploration ability of the agent while ensuring the stability of parameter learning, we propose a novel proximal parameter distribution optimization (PPDO) algorithm. On the one hand, PPDO enhances the exploration ability of the RL agent by transforming each NN parameter from a single deterministic value into a distribution. On the other hand, PPDO accelerates the parameter distribution optimization by maintaining two groups of parameters; the optimization is guided by evaluating the change in parameter quality before and after each distribution update. In addition, PPDO reduces the influence of bias and variance on the value function approximation by limiting the amplitude of two consecutive parameter updates, which enhances the stability of the parameter distribution optimization. Experiments on the OpenAI Gym, Atari, and MuJoCo platforms indicate that PPDO improves the exploration ability and learning efficiency of deep RL algorithms, including DQN and A3C.
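The abstract does not include code, so the following is a minimal PyTorch sketch, under assumptions of my own, of the two mechanisms it describes: weights treated as learned Gaussian distributions (sampled at every forward pass, so the weight noise itself drives exploration), and a penalty that limits the amplitude of the change between two consecutive parameter-distribution updates. The names `GaussianLinear` and `proximal_penalty`, the quadratic form of the penalty, and the training snippet are all hypothetical illustrations, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianLinear(nn.Module):
    """Linear layer whose weight matrix is a learned Gaussian distribution
    (mean and log standard deviation) instead of a single point estimate.
    Sketch only; PPDO's actual parameterization may differ."""
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.mu = nn.Parameter(0.1 * torch.randn(out_features, in_features))
        self.log_sigma = nn.Parameter(torch.full((out_features, in_features), -3.0))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reparameterized sample of the weights; the stochasticity of the
        # sampled weights is what drives exploration.
        weight = self.mu + self.log_sigma.exp() * torch.randn_like(self.mu)
        return F.linear(x, weight, self.bias)

def proximal_penalty(new_params, old_params, coef: float = 0.1) -> torch.Tensor:
    """Quadratic penalty on the amplitude of the change between two
    consecutive parameter-distribution updates (a stand-in for whatever
    amplitude limit PPDO actually imposes)."""
    return coef * sum(((p - q.detach()) ** 2).sum()
                      for p, q in zip(new_params, old_params))

# Hypothetical training step: snapshot the "old" parameter group, update the
# "new" group, and penalize large moves between the two.
net = nn.Sequential(GaussianLinear(4, 64), nn.ReLU(), GaussianLinear(64, 2))
old = [p.detach().clone() for p in net.parameters()]
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)

q_values = net(torch.randn(32, 4))   # stochastic forward pass (sampled weights)
td_loss = q_values.pow(2).mean()     # placeholder for a real TD loss
loss = td_loss + proximal_penalty(net.parameters(), old)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In this reading, the per-forward weight sampling plays a role similar to NoisyNet-style parameter noise, the frozen snapshot loosely mirrors the paper's two parameter groups (old versus updated distributions, compared to judge the quality of an update), and the proximal penalty is analogous in spirit to the clipped updates of PPO.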
