Journal
JOURNAL OF COMPUTER AND SYSTEM SCIENCES
Volume 74, Issue 8, Pages 1309-1331
Publisher
ACADEMIC PRESS INC ELSEVIER SCIENCE
DOI: 10.1016/j.jcss.2007.08.009
Keywords
Reinforcement learning; Learning theory; Markov Decision Processes
Several algorithms for learning near-optimal policies in Markov Decision Processes have been analyzed and proven efficient. Empirical results have suggested that Model-based Interval Estimation (MBIE) learns efficiently in practice, effectively balancing exploration and exploitation. This paper presents a theoretical analysis of MBIE and a new variation called MBIE-EB, proving their efficiency even under worst-case conditions. The paper also introduces a new performance metric, average loss, and relates it to its less online cousins from the literature. (C) 2008 Elsevier Inc. All rights reserved.