Article

Dynamic sparse coding-based value estimation network for deep reinforcement learning

Journal

NEURAL NETWORKS
Volume 168, Issue -, Pages 180-193

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.neunet.2023.09.013

Keywords

Deep reinforcement learning; Value estimation network; Dynamic sparse coding


This paper proposes a Value Estimation Network (VEN) model based on Dynamic Sparse Coding (DSC) to address interference and redundant parameters in Deep Reinforcement Learning (DRL). The proposed algorithm achieves higher control performance than existing benchmark DRL algorithms in both discrete-action and continuous-action environments.
Deep Reinforcement Learning (DRL) is a powerful tool for a wide range of control and automation problems. The performance of DRL depends heavily on how accurately the values of environment states are estimated. However, the Value Estimation Network (VEN) in DRL is easily affected by catastrophic interference arising from the environment and from training. In this paper, we propose a Dynamic Sparse Coding-based (DSC) VEN model that learns precise sparse representations for accurate value prediction and sparse parameters for efficient training; it is applicable not only to Q-learning-structured discrete-action DRL but also to actor-critic-structured continuous-action DRL. Specifically, to alleviate interference in the VEN, we employ DSC to learn sparse representations for accurate value estimation using dynamic gradients, going beyond the conventional l1 norm, which provides same-value gradients. To avoid the influence of redundant parameters, we employ DSC to prune weights with dynamic thresholds, which is more efficient than the static thresholds of the l1 norm. Experiments demonstrate that the proposed algorithms with dynamic sparse coding achieve higher control performance than existing benchmark DRL algorithms in both discrete-action and continuous-action environments, e.g., an increase of over 25% in Puddle World and about 10% in Hopper. Moreover, the proposed algorithm converges efficiently, requiring fewer episodes in different environments. (c) 2023 Elsevier Ltd. All rights reserved.
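The contrast the abstract draws between the static thresholds of the l1 norm and dynamic thresholds can be illustrated with a minimal sketch. This is not the authors' implementation: `soft_threshold` is the standard proximal operator of the l1 norm (every weight is shrunk by the same fixed amount), while `dynamic_threshold` is a hypothetical magnitude-based pruning rule whose threshold is recomputed from the current weights each step, standing in for the adaptive behavior the paper attributes to DSC.

```python
import numpy as np

def soft_threshold(w, tau):
    # Proximal operator of the l1 norm: every weight is shrunk
    # toward zero by the same static amount tau.
    return np.sign(w) * np.maximum(np.abs(w) - tau, 0.0)

def dynamic_threshold(w, keep_ratio=0.5):
    # Hypothetical dynamic-threshold pruning: the threshold tau is
    # recomputed from the current weight magnitudes, so it adapts
    # as the weight distribution changes during training.
    tau = np.quantile(np.abs(w), 1.0 - keep_ratio)
    return np.where(np.abs(w) >= tau, w, 0.0)

w = np.array([0.05, -0.4, 1.2, -0.02, 0.7, -0.9])
print(soft_threshold(w, 0.1))           # all weights shifted by 0.1
print(dynamic_threshold(w, 0.5))        # only the largest half survive
```

Note the qualitative difference: soft-thresholding biases every surviving weight toward zero, whereas the dynamic rule keeps the retained weights unchanged and zeroes the rest, with the cutoff tracking the weight distribution rather than staying fixed.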

