Journal
SIAM JOURNAL ON CONTROL AND OPTIMIZATION
Volume 42, Issue 4, Pages 1143-1166
Publisher
SIAM PUBLICATIONS
DOI: 10.1137/S0363012901385691
Keywords
reinforcement learning; Markov decision processes; actor-critic algorithms; stochastic approximation
Abstract
In this article, we propose and analyze a class of actor-critic algorithms. These are two-time-scale algorithms in which the critic uses temporal difference learning with a linearly parameterized approximation architecture, and the actor is updated in an approximate gradient direction, based on information provided by the critic. We show that the features for the critic should ideally span a subspace prescribed by the choice of parameterization of the actor. We study actor-critic algorithms for Markov decision processes with Polish state and action spaces. We state and prove two results regarding their convergence.
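To make the abstract's setup concrete, here is a minimal toy sketch of a two-time-scale actor-critic: a softmax actor on a hypothetical two-state, two-action MDP (rewards and transitions invented for the demo), and a linear critic whose features are the "compatible" features ψ(s, a) = ∇_θ log π(a|s) spanning the subspace prescribed by the actor's parameterization. This is an illustration only, not the paper's algorithm, which uses TD(λ) critics and covers general Polish state and action spaces. The critic's step size decays more slowly than the actor's, so the critic runs on the faster time scale.

```python
import numpy as np

# Toy MDP (hypothetical, for illustration): 2 states, 2 actions.
# Action 0 always yields reward 1, action 1 yields 0; transitions P[s, a, s'].
P = np.array([[[0.7, 0.3], [0.3, 0.7]],
              [[0.6, 0.4], [0.4, 0.6]]])
R = np.array([[1.0, 0.0],
              [1.0, 0.0]])

rng = np.random.default_rng(0)
theta = np.zeros((2, 2))            # actor parameters (softmax policy)

def policy(s):
    p = np.exp(theta[s] - theta[s].max())
    return p / p.sum()

def psi(s, a):
    # Compatible features: psi(s, a) = grad_theta log pi(a|s).
    g = np.zeros_like(theta)
    g[s] = -policy(s)
    g[s, a] += 1.0
    return g.ravel()

w = np.zeros(4)                     # critic weights on the compatible features
gamma, s = 0.9, 0
a = rng.choice(2, p=policy(s))
for t in range(1, 100_001):
    s2 = rng.choice(2, p=P[s, a])
    a2 = rng.choice(2, p=policy(s2))
    f, f2 = psi(s, a), psi(s2, a2)
    delta = R[s, a] + gamma * w @ f2 - w @ f       # TD error
    w += (1.0 / t ** 0.6) * delta * f              # fast time scale: critic
    theta += (0.5 / t ** 0.8) * (w @ f) * f.reshape(theta.shape)  # slow: actor
    s, a = s2, a2

# The actor should come to prefer action 0 (reward 1) in both states.
```

Because ψ(s, ·) has zero mean under the policy, the critic's linear approximation w·ψ acts as an advantage estimate, and the actor step (w·ψ)ψ is the approximate gradient direction the abstract refers to.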