期刊
THEORY IN BIOSCIENCES
卷 131, 期 3, 页码 139-148出版社
SPRINGER
DOI: 10.1007/s12064-011-0142-z
关键词
Reinforcement learning; Exploration-exploitation trade-off; Information theory; Rate distortion theory; Curiosity; Adaptive behavior
资金
- NSERC
- ONR
We provide a fresh look at the problem of exploration in reinforcement learning, drawing on ideas from information theory. First, we show that Boltzmann-style exploration, one of the main exploration methods used in reinforcement learning, is optimal from an information-theoretic point of view, in that it optimally trades expected return for the coding cost of the policy. Second, we address the problem of curiosity-driven learning. We propose that, in addition to maximizing the expected return, a learner should choose a policy that also maximizes the learner's predictive power. This makes the world both interesting and exploitable. Optimal policies then have the form of Boltzmann-style exploration with a bonus, containing a novel exploration-exploitation trade-off which emerges naturally from the proposed optimization principle. Importantly, this exploration-exploitation trade-off persists in the optimal deterministic policy, i.e., when there is no exploration due to randomness. As a result, exploration is understood as an emerging behavior that optimizes information gain, rather than being modeled as pure randomization of action choices.
作者
我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。
推荐
暂无数据