☆ 4.7 Article

Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model

MACHINE LEARNING (2013)

期刊

MACHINE LEARNING

卷 91, 期 3, 页码 325-349

出版社

SPRINGER

DOI: 10.1007/s10994-013-5368-1

关键词

Sample complexity; Markov decision processes; Reinforcement learning; Learning theory

类别

Computer Science, Artificial Intelligence

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

摘要

We consider the problems of learning the optimal action-value function and the optimal policy in discounted-reward Markov decision processes (MDPs). We prove new PAC bounds on the sample-complexity of two well-known model-based reinforcement learning (RL) algorithms in the presence of a generative model of the MDP: value iteration and policy iteration. The first result indicates that for an MDP with N state-action pairs and the discount factor gamma a[0,1) only O(Nlog(N/delta)/((1-gamma)(3) epsilon (2))) state-transition samples are required to find an epsilon-optimal estimation of the action-value function with the probability (w.p.) 1-delta. Further, we prove that, for small values of epsilon, an order of O(Nlog(N/delta)/((1-gamma)(3) epsilon (2))) samples is required to find an epsilon-optimal policy w.p. 1-delta. We also prove a matching lower bound of I similar to(Nlog(N/delta)/((1-gamma)(3) epsilon (2))) on the sample complexity of estimating the optimal action-value function with epsilon accuracy. To the best of our knowledge, this is the first minimax result on the sample complexity of RL: the upper bounds match the lower bound in terms of N, epsilon, delta and 1/(1-gamma) up to a constant factor. Also, both our lower bound and upper bound improve on the state-of-the-art in terms of their dependence on 1/(1-gamma).

Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model

期刊

MACHINE LEARNING

出版社

SPRINGER

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model

期刊

MACHINE LEARNING

出版社

SPRINGER

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文