4.7 Article

Reinforcement learning with limited reinforcement: Using Bayes risk for active learning in POMDPs

Journal

Artificial Intelligence
Volume 187, Pages 115-132

Publisher

ELSEVIER
DOI: 10.1016/j.artint.2012.04.006

Keywords

Partially observable Markov decision process; Reinforcement learning; Bayesian methods

Acting in domains where an agent must plan several steps ahead to achieve a goal can be a challenging task, especially if the agent's sensors provide only noisy or partial information. In this setting, Partially Observable Markov Decision Processes (POMDPs) provide a planning framework that optimally trades between actions that contribute to the agent's knowledge and actions that increase the agent's immediate reward. However, the task of specifying the POMDP's parameters is often onerous. In particular, setting the immediate rewards to achieve a desired balance between information-gathering and acting is often not intuitive. In this work, we propose an approximation based on minimizing the immediate Bayes risk for choosing actions when transition, observation, and reward models are uncertain. The Bayes-risk criterion avoids the computational intractability of solving a POMDP with a multi-dimensional continuous state space; we show that it performs well in a variety of problems. We use policy queries, in which we ask an expert for the correct action, to infer the consequences of a potential pitfall without experiencing its effects. More importantly for human-robot interaction settings, policy queries allow the agent to learn the reward model without the reward values ever being specified. (C) 2012 Elsevier B.V. All rights reserved.
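The abstract's action-selection idea can be illustrated with a minimal sketch, assuming the posterior over uncertain POMDP models is represented by a small set of weighted samples, each with its own belief and Q-values. The function name, arguments, threshold, and numbers below are illustrative placeholders, not the paper's actual implementation.

```python
import numpy as np

def select_action_bayes_risk(q_values, weights, query_cost):
    """Pick an action by minimizing immediate Bayes risk over sampled models.

    q_values:   array of shape (n_models, n_actions); q_values[i, a] is the
                value of action a under the belief of sampled POMDP model i.
    weights:    posterior weights of the sampled models.
    query_cost: negative threshold; if acting is expected to lose more than
                this relative to each model's own best action, ask an expert
                for the correct action instead (a policy query).
    """
    q = np.asarray(q_values, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()

    expected_q = w @ q                    # E[Q(b, a)] over the model posterior
    a_star = int(np.argmax(expected_q))   # greedy action under the posterior

    # Bayes risk of acting: expected loss of a_star versus each sampled
    # model's own optimal action, averaged over the posterior (always <= 0).
    risk = float(w @ (q[:, a_star] - q.max(axis=1)))

    if risk < query_cost:
        return "policy_query", risk       # defer to the expert's action
    return a_star, risk


# Toy usage: two sampled models that disagree about which action is best,
# so the risk of acting is high and the agent asks a policy query.
q = [[1.0, 0.0], [0.0, 1.0]]
print(select_action_bayes_risk(q, weights=[0.5, 0.5], query_cost=-0.3))
```

In this hypothetical sketch, the expert's answer to a policy query would then be used to update the posterior over models (including the reward model), which is how the agent can learn rewards without reward values ever being specified explicitly.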
