Article

Goal Representation Heuristic Dynamic Programming on Maze Navigation

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2013)

Journal

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS

Volume 24, Issue 12, Pages 2038-2050

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TNNLS.2013.2271454

Keywords

Adaptive dynamic programming; goal representation heuristic dynamic programming; maze navigation/path planning; Markov decision process; reinforcement learning

Categories

Computer Science, Artificial Intelligence; Computer Science, Hardware & Architecture; Computer Science, Theory & Methods; Engineering, Electrical & Electronic

Funding

National Science Foundation [CAREER ECCS 1053717]
Army Research Office [W911NF-12-1-0378]
NSF-DFG [CNS 1117314]
National Natural Science Foundation of China [51228701, 61075072]
Program for New Century Excellent Talents in University [NCET-10-0901]
National Science Foundation, Directorate for Computer & Information Science & Engineering, Division of Computer and Network Systems [1117314]
National Science Foundation, Directorate for Engineering, Division of Electrical, Communications & Cyber Systems [1053717]

Abstract

This paper proposes goal representation heuristic dynamic programming (GrHDP) for online learning in Markov decision processes. In addition to the external reinforcement signal used in the literature, we develop an adaptive internal goal/reward representation for the agent by means of a goal network. Specifically, we keep the actor-critic design of heuristic dynamic programming (HDP) and add a goal network that represents the internal goal signal, which in turn helps the value function approximation. We evaluate the GrHDP algorithm on two 2-D maze navigation problems and then on one 3-D maze navigation problem. Compared with the traditional HDP approach, GrHDP improves the learning performance of the agent. For comparison, we also report the learning performance of two other reinforcement learning algorithms, Sarsa(lambda) and Q-learning, on the same benchmarks. Finally, to support the method theoretically, we analyze the convergence of the neural network weights in the GrHDP approach.
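
For readers unfamiliar with the baselines named in the abstract, the following is a minimal sketch of tabular Q-learning on a small 2-D grid maze. It is not the paper's benchmark or implementation: the 4x4 layout, the reward of 1 at the goal, the random starting cells, and all hyperparameters (`alpha`, `gamma`, `epsilon`, `episodes`) are assumptions chosen only for illustration, and the sketch contains no goal network, actor, or critic.

```python
import random

# Hypothetical 4x4 maze (not from the paper): S = start, G = goal, # = wall.
MAZE = ["S.#.",
        "..#.",
        "....",
        "#..G"]
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def find(ch):
    """Return the (row, col) of the first cell containing ch."""
    for r, row in enumerate(MAZE):
        for c, cell in enumerate(row):
            if cell == ch:
                return (r, c)
    raise ValueError(ch)


START, GOAL = find("S"), find("G")
FREE = [(r, c) for r, row in enumerate(MAZE)
        for c, cell in enumerate(row) if cell != "#"]


def step(state, action):
    """Deterministic move; hitting a wall or the border leaves the agent in place."""
    r, c = state[0] + action[0], state[1] + action[1]
    if not (0 <= r < len(MAZE) and 0 <= c < len(MAZE[0])) or MAZE[r][c] == "#":
        r, c = state
    nxt = (r, c)
    reward = 1.0 if nxt == GOAL else 0.0  # assumed external reinforcement signal
    return nxt, reward, nxt == GOAL


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    """Epsilon-greedy Q-learning; each episode starts from a random free cell."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in FREE for a in range(len(ACTIONS))}
    starts = [s for s in FREE if s != GOAL]
    for _ in range(episodes):
        state = rng.choice(starts)
        for _ in range(100):  # cap on episode length
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[(state, i)])
            nxt, reward, done = step(state, ACTIONS[a])
            best_next = 0.0 if done else max(
                q[(nxt, i)] for i in range(len(ACTIONS)))
            # Q-learning update toward the one-step bootstrapped target.
            q[(state, a)] += alpha * (reward + gamma * best_next - q[(state, a)])
            state = nxt
            if done:
                break
    return q


def greedy_path(q, max_steps=50):
    """Follow the greedy policy from START until GOAL or the step cap."""
    state, path = START, [START]
    while state != GOAL and len(path) <= max_steps:
        a = max(range(len(ACTIONS)), key=lambda i: q[(state, i)])
        state, _, _ = step(state, ACTIONS[a])
        path.append(state)
    return path


if __name__ == "__main__":
    print(greedy_path(train()))
```

Calling `greedy_path(train())` returns the sequence of cells visited by the learned greedy policy. In this toy layout the shortest route from start to goal takes six moves, which gives a simple check on whether learning has converged.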
