4.0 Article

An information-theoretic approach to curiosity-driven reinforcement learning

期刊

THEORY IN BIOSCIENCES
卷 131, 期 3, 页码 139-148

出版社

SPRINGER
DOI: 10.1007/s12064-011-0142-z

关键词

Reinforcement learning; Exploration-exploitation trade-off; Information theory; Rate distortion theory; Curiosity; Adaptive behavior

资金

  1. NSERC
  2. ONR

向作者/读者索取更多资源

We provide a fresh look at the problem of exploration in reinforcement learning, drawing on ideas from information theory. First, we show that Boltzmann-style exploration, one of the main exploration methods used in reinforcement learning, is optimal from an information-theoretic point of view, in that it optimally trades expected return for the coding cost of the policy. Second, we address the problem of curiosity-driven learning. We propose that, in addition to maximizing the expected return, a learner should choose a policy that also maximizes the learner's predictive power. This makes the world both interesting and exploitable. Optimal policies then have the form of Boltzmann-style exploration with a bonus, containing a novel exploration-exploitation trade-off which emerges naturally from the proposed optimization principle. Importantly, this exploration-exploitation trade-off persists in the optimal deterministic policy, i.e., when there is no exploration due to randomness. As a result, exploration is understood as an emerging behavior that optimizes information gain, rather than being modeled as pure randomization of action choices.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.0
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据