Article

Chaotic dynamics and convergence analysis of temporal difference algorithms with bang-bang control

OPTIMAL CONTROL APPLICATIONS & METHODS (2016)

Journal

OPTIMAL CONTROL APPLICATIONS & METHODS

Volume 37, Issue 1, Pages 108-126

Publisher

WILEY

DOI: 10.1002/oca.2156

Keywords

badly conditioned learning; polynomial basis functions; rate of convergence; temporal difference learning; value function approximation

Categories

Automation & Control Systems; Operations Research & Management Science; Mathematics, Applied

Funding

Turkish Ministry of National Education

Abstract

Reinforcement learning is a powerful tool used to obtain optimal control solutions for complex and difficult sequential decision making problems where only a minimal amount of a priori knowledge exists about the system dynamics. As such, it has also been used as a model of cognitive learning in humans and applied to systems, such as humanoid robots, to study embodied cognition. In this paper, a different approach is taken where a simple test problem is used to investigate issues associated with the value function's representation and parametric convergence. In particular, the terminal convergence problem is analyzed with a known optimal control policy where the aim is to accurately learn the value function. For certain initial conditions, the value function is explicitly calculated and it is shown to have a polynomial form. It is parameterized by terms that are functions of the unknown plant's parameters and the value function's discount factor, and their convergence properties are analyzed. It is shown that the temporal difference error introduces a null space associated with the finite horizon basis function during the experiment. The learning problem is only non-singular when the experiment termination is handled correctly and a number of (equivalent) solutions are described. Finally, it is demonstrated that, in general, the test problem's dynamics are chaotic for random initial states and this causes digital offset in the value function learning. The offset is calculated, and a dead zone is defined to switch off learning in the chaotic region. Copyright (C) 2015 John Wiley & Sons, Ltd.
