Article

Temporal Difference Methods for General Projected Equations

IEEE TRANSACTIONS ON AUTOMATIC CONTROL (2011)

Journal

IEEE TRANSACTIONS ON AUTOMATIC CONTROL

Volume 56, Issue 9, Pages 2128-2139

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TAC.2011.2115290

Keywords

Approximation methods; dynamic programming; Markov decision processes; reinforcement learning; temporal difference methods

Categories

Automation & Control Systems; Engineering, Electrical & Electronic

Funding

NSF [ECCS-0801549]

Abstract

We consider projected equations for approximate solution of high-dimensional fixed point problems within low-dimensional subspaces. We introduce an analytical framework based on an equivalence with variational inequalities, and algorithms that may be implemented with low-dimensional simulation. These algorithms originated in approximate dynamic programming (DP), where they are collectively known as temporal difference (TD) methods. Even when specialized to DP, our methods include extensions/new versions of TD methods, which offer special implementation advantages and reduced overhead over the standard LSTD and LSPE methods, and can deal with near singularity in the associated matrix inversion. We develop deterministic iterative methods and their simulation-based versions, and we discuss a sharp qualitative distinction between them: the performance of the former is greatly affected by direction and feature scaling, yet the latter have the same asymptotic convergence rate regardless of scaling, because of their common simulation-induced performance bottleneck.
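To make the abstract's central objects concrete, the following is a minimal sketch, not the paper's own algorithm or notation. It assumes the fixed point problem has the linear form x = Ax + b, that the approximation subspace is spanned by the columns of a feature matrix `Phi`, and that the projection is weighted by a distribution `xi`. Under those assumptions the projected equation reduces to a low-dimensional linear system C r = d. The second function is a plain LSTD(0)-style estimator for the discounted DP special case A = alpha * P, built from a single simulated trajectory. All names (`projected_solution`, `lstd0`, `P`, `g`, `alpha`, `Phi`, `xi`) and the three-state chain are illustrative assumptions.

```python
import numpy as np

def projected_solution(A, b, Phi, xi):
    """Solve Phi r = Pi(A Phi r + b), where Pi is the xi-weighted
    projection onto the column space of Phi (assumed setup)."""
    Xi = np.diag(xi)
    C = Phi.T @ Xi @ (np.eye(len(b)) - A) @ Phi   # low-dimensional matrix
    d = Phi.T @ Xi @ b                            # low-dimensional vector
    return np.linalg.solve(C, d)

def lstd0(P, g, alpha, Phi, n_steps, rng):
    """Estimate the same r by simulation for A = alpha * P and b = g,
    accumulating sample sums along one trajectory of the chain P."""
    n, s = Phi.shape
    C_hat = np.zeros((s, s))
    d_hat = np.zeros(s)
    i = 0
    for _ in range(n_steps):
        j = rng.choice(n, p=P[i])                 # sample next state
        C_hat += np.outer(Phi[i], Phi[i] - alpha * Phi[j])
        d_hat += Phi[i] * g[i]
        i = j
    return np.linalg.solve(C_hat, d_hat)

# Hypothetical 3-state example (not taken from the paper).
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
g = np.array([1.0, 0.0, 2.0])        # expected one-stage cost per state
alpha = 0.9                          # discount factor
Phi = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [1.0, 2.0]])         # two features per state
xi = np.array([0.25, 0.50, 0.25])    # stationary distribution of P

r_exact = projected_solution(alpha * P, g, Phi, xi)
r_sim = lstd0(P, g, alpha, Phi, n_steps=50_000, rng=np.random.default_rng(0))
```

The exact solve requires the full matrices, whereas the simulation-based estimate only touches feature vectors of visited states, which is the sense in which such methods can be implemented with low-dimensional computation. The simulation estimate carries sampling noise, so `r_sim` only approximates `r_exact`.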
