Article

An information-theoretic approach to curiosity-driven reinforcement learning

THEORY IN BIOSCIENCES (2012)

Journal

THEORY IN BIOSCIENCES

Volume 131, Issue 3, Pages 139-148

Publisher

SPRINGER

DOI: 10.1007/s12064-011-0142-z

Keywords

Reinforcement learning; Exploration-exploitation trade-off; Information theory; Rate distortion theory; Curiosity; Adaptive behavior

Categories

Biology; Mathematical & Computational Biology

Funding

NSERC
ONR

Abstract

We provide a fresh look at the problem of exploration in reinforcement learning, drawing on ideas from information theory. First, we show that Boltzmann-style exploration, one of the main exploration methods used in reinforcement learning, is optimal from an information-theoretic point of view, in that it optimally trades expected return for the coding cost of the policy. Second, we address the problem of curiosity-driven learning. We propose that, in addition to maximizing the expected return, a learner should choose a policy that also maximizes the learner's predictive power. This makes the world both interesting and exploitable. Optimal policies then have the form of Boltzmann-style exploration with a bonus, containing a novel exploration-exploitation trade-off which emerges naturally from the proposed optimization principle. Importantly, this exploration-exploitation trade-off persists in the optimal deterministic policy, i.e., when there is no exploration due to randomness. As a result, exploration is understood as an emerging behavior that optimizes information gain, rather than being modeled as pure randomization of action choices.
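As a rough illustration of the Boltzmann-style exploration the abstract refers to, the sketch below builds a softmax policy over action values and shows how the temperature trades expected value against randomness. It is not the paper's method: the single-state setting, the action values, the uniform weighting of actions, and the names `boltzmann_policy`, `expected_value`, and `entropy_bits` are all assumptions made for this example. Policy entropy is reported only as a simple proxy for how random the action choice is; it is not the coding-cost quantity the abstract mentions.

```python
import math


def boltzmann_policy(q_values, temperature):
    """Return action probabilities proportional to exp(Q / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum before exponentiating for numerical stability.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def expected_value(policy, q_values):
    """Expected action value under the given action probabilities."""
    return sum(p * q for p, q in zip(policy, q_values))


def entropy_bits(policy):
    """Shannon entropy of the action distribution, in bits."""
    return -sum(p * math.log2(p) for p in policy if p > 0)


# Hypothetical action values for a single state (illustration only).
q_values = [1.0, 0.5, 0.0]

for temperature in (0.1, 1.0, 10.0):
    policy = boltzmann_policy(q_values, temperature)
    print(
        f"T={temperature}: "
        f"policy={[round(p, 3) for p in policy]}, "
        f"expected value={expected_value(policy, q_values):.3f}, "
        f"entropy={entropy_bits(policy):.3f} bits"
    )
```

In this toy setting, a low temperature concentrates probability on the highest-valued action, while a high temperature approaches a uniform choice, with a lower expected value and a higher entropy.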
