Journal
KNOWLEDGE-BASED SYSTEMS
Volume 258, Issue -, Pages -
Publisher
ELSEVIER
DOI: 10.1016/j.knosys.2022.109998
Keywords
Q-learning; Reinforcement learning; Uncertainty
Funding
- Science and Technology Research Project
- Jiangxi Education Department, Startup Project of Doctor Scientific Research
- Jiangxi University of Science and Technology
- [GJJ180442]
- [2022205200100595]
In this paper, a variant of Q-learning, named uncertainty quantification based Q-learning, is proposed by introducing the hedonistic expected value (HEV) to increase the probability of outputting an optimal partial order in online reinforcement learning. The weights that HEV assigns to the successors are compatible with the existing operators, and the return is predicted as a sum weighted not only by the operator but also, through re-weighting, by HEV. The proposed algorithm with HEV demonstrates favorable performance in practice.
In online reinforcement learning, operators predict the return by weighting the successors' estimated values. However, because uncertainty is not quantified, the weights assigned by operators are affected by potentially biased estimations. As a result, the partial order of estimated values is ineffective. To increase the probability of outputting an optimal partial order, this paper introduces the hedonistic expected value (HEV), an upper bound of the return's expectation that quantifies the uncertainty. Notably, for compatibility, some complex operators are rewritten in weighted-sum form. Based on the weighted-sum form of the operator, a variant of Q-learning, namely uncertainty quantification based Q-learning, is proposed in this paper. In the proposed algorithm, the weights that HEV assigns to the successors are compatible with the existing operators, and the return is predicted as a sum weighted not only by the operator but also, through re-weighting, by HEV. The greediness of the re-weighted operator is unchanged, and the contraction mapping property shows that convergence is maintained. We demonstrate that the proposed algorithm with HEV performs favorably in practice. (c) 2022 Elsevier B.V. All rights reserved.
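To make the re-weighting idea concrete, the following is a minimal sketch, not the paper's actual algorithm: the base operator's weights over successor action-values are multiplied by weights derived from an optimistic upper bound (standing in for HEV, here a hypothetical count-based bonus), then renormalized so the bootstrap target remains a weighted sum. With the max operator (one-hot weights), the re-weighted target stays greedy, mirroring the abstract's claim that greediness is unchanged. The function name, the bonus form, and the exponential weighting are all illustrative assumptions.

```python
import numpy as np

def hev_reweighted_target(q_values, counts, operator_weights=None, c=1.0):
    """Re-weighted bootstrap target (illustrative sketch, not the paper's HEV).

    q_values:         estimated values Q(s', a) over successor actions.
    counts:           visit counts n(s', a); used for a hypothetical
                      count-based optimistic bonus standing in for HEV.
    operator_weights: weights the base operator assigns to successors;
                      defaults to the max operator (one-hot on the argmax),
                      as in standard Q-learning.
    """
    q = np.asarray(q_values, dtype=float)
    n = np.asarray(counts, dtype=float)
    if operator_weights is None:
        # Max operator: all weight on the greedy successor action.
        w_op = np.zeros_like(q)
        w_op[np.argmax(q)] = 1.0
    else:
        w_op = np.asarray(operator_weights, dtype=float)
    # Hypothetical upper bound on each estimate (UCB-style bonus).
    upper = q + c * np.sqrt(np.log(n.sum() + 1.0) / (n + 1.0))
    # Re-weight: combine operator weights with weights increasing in the
    # upper bound, then renormalize so the target is still a weighted sum.
    w = w_op * np.exp(upper - upper.max())
    w = w / w.sum()
    return float(np.dot(w, q))
```

Note that when the base operator is the max operator, the one-hot weights zero out every other successor, so re-weighting cannot change the argmax; only soft operators (e.g. mellowmax- or softmax-style weights) are actually tilted toward higher-uncertainty successors.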