Article

Online Model-Free n-Step HDP With Stability Analysis

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2020)

Journal

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS

Volume 31, Issue 4, Pages 1255-1269

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TNNLS.2019.2919614

Keywords

Mathematical model; Stability analysis; Dynamic programming; Programming; Training; Computer architecture; Learning systems; Adaptive dynamic programming (ADP); action-dependent (AD) heuristic dynamic programming (ADHDP); lambda-return; Lyapunov stability; uniformly ultimately bounded (UUB)

Categories

Computer Science, Artificial Intelligence; Computer Science, Hardware & Architecture; Computer Science, Theory & Methods; Engineering, Electrical & Electronic

Funding

Missouri University of Science and Technology Intelligent Systems Center
Mary K. Finley Missouri Endowment
National Science Foundation
Lifelong Learning Machines program from DARPA/Microsystems Technology Office
Army Research Laboratory (ARL) [W911NF-18-2-0260]
Basra Oil Company (BOC), Iraq

Abstract

Motivated by the power of the temporal-difference (TD) learning method with λ [TD(λ)], this paper presents a novel n-step adaptive dynamic programming (ADP) architecture that combines TD(λ) with regular TD learning to solve optimal control problems in fewer iterations. In contrast to the backward-view learning of TD(λ), which requires an extra parameter, the eligibility trace, and updates at the end of each episode (offline training), the new design uses forward-view learning, which updates at each time step (online training) without the eligibility trace parameter and applies to various applications that lack mathematical models. The new design is therefore called online model-free n-step action-dependent (AD) heuristic dynamic programming [NSHDP(λ)]. NSHDP(λ) has three neural networks: a critic network (CN) with regular one-step TD learning [TD(0)], a CN with n-step TD learning [or TD(λ)], and an actor network (AN). Because forward-view learning does not require extra eligibility traces associated with each state, the NSHDP(λ) architecture has low computational cost and is memory efficient. Furthermore, the stability of NSHDP(λ) is proven under certain conditions using Lyapunov analysis, which yields the uniformly ultimately bounded (UUB) property. The results are compared with the performance of HDP and of traditional action-dependent HDP(λ) [ADHDP(λ)] for different λ values. The paper uses two simulation benchmarks, a complex nonlinear system and a 2-D maze problem; a third, an inverted pendulum, is presented in the supplemental material. NSHDP(λ) performance is examined and compared with other ADP methods.
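The abstract contrasts one-step TD targets with n-step and λ-weighted targets computed in the forward view. As a rough illustration only, the sketch below computes the textbook n-step return and a truncated forward-view λ-return from a short window of future rewards and value estimates. It is not the authors' NSHDP(λ) update: the function names, the reward/value list layout, and the discount factor `gamma` are assumptions made for this example, and the paper's critic and actor networks are not modeled.

```python
def n_step_return(rewards, bootstrap_value, gamma):
    """Discounted sum of the given rewards plus a discounted bootstrap value.

    rewards[k] is assumed to be the reward (or utility) observed k+1 steps
    ahead; bootstrap_value is the value estimate at the end of the window.
    """
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def truncated_lambda_return(rewards, values, gamma, lam):
    """Forward-view lambda-return truncated to len(rewards) steps.

    values[k] is assumed to be the value estimate of the state reached
    after rewards[k]. The n-step returns are mixed with weights
    (1 - lam) * lam**(n - 1); the longest return takes the remaining
    weight lam**(N - 1), so the weights sum to one.
    """
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")
    n_max = len(rewards)
    total = 0.0
    for n in range(1, n_max + 1):
        g_n = n_step_return(rewards[:n], values[n - 1], gamma)
        if n < n_max:
            weight = (1.0 - lam) * lam ** (n - 1)
        else:
            weight = lam ** (n - 1)
        total += weight * g_n
    return total
```

With `lam = 0` the function reduces to the one-step TD(0) target, and with `lam = 1` it reduces to the full n-step return over the window, which are the two extremes the abstract's two critic networks are described in terms of.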
