Article

Control of exploitation-exploration meta-parameter in reinforcement learning

NEURAL NETWORKS (2002)

Journal

NEURAL NETWORKS

Volume 15, Issue 4-6, Pages 665-687

Publisher

PERGAMON-ELSEVIER SCIENCE LTD

DOI: 10.1016/S0893-6080(02)00056-4

Keywords

reinforcement learning; exploitation-exploration problem; neuromodulator; attention; partially observable Markov decision process

Categories

Computer Science, Artificial Intelligence; Neurosciences

Abstract

In reinforcement learning (RL), the duality between exploitation and exploration has long been an important issue. This paper presents a new method that controls the balance between exploitation and exploration. Our learning scheme is based on model-based RL, in which Bayesian inference with a forgetting effect estimates the state-transition probability of the environment. The balance parameter, which corresponds to the randomness in action selection, is controlled based on the variation in action results and the perception of environmental change. When applied to maze tasks, our method successfully obtains good control by adapting to environmental changes. Recently, Usher et al. [Science 283 (1999) 549] have suggested that noradrenergic neurons in the locus coeruleus may control the exploitation-exploration balance in a real brain and that the balance may correspond to the level of an animal's selective attention. According to this scenario, we also discuss a possible implementation in the brain. © 2002 Elsevier Science Ltd. All rights reserved.
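The abstract names three ingredients without giving their equations: a transition model estimated with forgetting, a balance parameter that sets the randomness of action selection, and a rule that adjusts this parameter when the environment appears to change. The following Python sketch shows one way these pieces could fit together. It is not the paper's algorithm. The decayed-count estimator, the softmax policy with inverse temperature `beta`, the surprise-threshold rule, and every name and default value (`ForgettingTransitionModel`, `forget`, `prior`, `adapt_beta`, `threshold`, `shrink`, `grow`) are assumptions chosen for illustration.

```python
import math
import random


class ForgettingTransitionModel:
    """Count-based estimate of P(s' | s, a) whose evidence decays over time.

    Assumption: a symmetric Dirichlet prior with exponentially decayed
    counts stands in for "Bayes inference with forgetting effect".
    """

    def __init__(self, n_states, n_actions, forget=0.9, prior=1.0):
        self.n_states = n_states
        self.forget = forget  # hypothetical decay factor in (0, 1]
        self.prior = prior    # hypothetical pseudo-count per next state
        self.counts = [[[0.0] * n_states for _ in range(n_actions)]
                       for _ in range(n_states)]

    def prob(self, s, a, s_next):
        """Posterior-mean estimate of P(s_next | s, a)."""
        row = self.counts[s][a]
        total = sum(row) + self.prior * self.n_states
        return (row[s_next] + self.prior) / total

    def update(self, s, a, s_next):
        """Decay old evidence for (s, a), add the observation, return surprise."""
        surprise = -math.log(self.prob(s, a, s_next))
        row = self.counts[s][a]
        for i in range(self.n_states):
            row[i] *= self.forget
        row[s_next] += 1.0
        return surprise


def softmax_policy(q_values, beta):
    """Action probabilities; a larger beta means more exploitation."""
    m = max(q_values)  # subtract the maximum for numerical stability
    weights = [math.exp(beta * (q - m)) for q in q_values]
    z = sum(weights)
    return [w / z for w in weights]


def sample_action(q_values, beta):
    """Draw one action index from the softmax distribution."""
    probs = softmax_policy(q_values, beta)
    return random.choices(range(len(probs)), weights=probs)[0]


def adapt_beta(beta, surprise, threshold=1.0, shrink=0.5, grow=1.1,
               beta_min=0.1, beta_max=10.0):
    """Illustrative rule: explore more after a surprising transition."""
    beta = beta * shrink if surprise > threshold else beta * grow
    return min(beta_max, max(beta_min, beta))
```

In a learning loop under these assumptions, each observed transition would be passed to `update`, the returned surprise to `adapt_beta`, and the resulting `beta` to `sample_action`. With `forget` below 1, old observations lose weight, so the estimate can track a maze whose layout changes.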
