4.7 Article

Goal Representation Heuristic Dynamic Programming on Maze Navigation

Journal

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TNNLS.2013.2271454

Keywords

Adaptive dynamic programming; goal representation heuristic dynamic programming; maze navigation/path planning; Markov decision process; reinforcement learning

Funding

  1. National Science Foundation [CAREER ECCS 1053717]
  2. Army Research Office [W911NF-12-1-0378]
  3. NSF-DFG [CNS 1117314]
  4. National Natural Science Foundation of China [51228701, 61075072]
  5. Program for New Century Excellent Talents in University [NCET-10-0901]
  6. National Science Foundation, Directorate for Computer & Information Science & Engineering, Division of Computer and Network Systems [1117314]
  7. National Science Foundation, Directorate for Engineering, Division of Electrical, Communications & Cyber Systems [1053717]

This paper proposes goal representation heuristic dynamic programming (GrHDP) and demonstrates its online learning in Markov decision processes. In addition to the external reinforcement signal used in the literature, we develop an adaptive internal goal/reward representation for the agent through the proposed goal network. Specifically, we keep the actor-critic design of heuristic dynamic programming (HDP) and add a goal network that represents the internal goal signal, which in turn helps the value function approximation. We evaluate GrHDP on two 2-D maze navigation problems and one 3-D maze navigation problem. Compared with the traditional HDP approach, GrHDP improves the learning performance of the agent. For comparison, we also report the learning performance of two other reinforcement learning algorithms, Sarsa(lambda) and Q-learning, on the same benchmarks. Finally, to give the proposed method theoretical support, we analyze the convergence characteristics of the neural network weights in GrHDP.
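
The abstract names Q-learning as one of the comparison baselines on the maze benchmarks. As a rough illustration of what such a baseline looks like on a 2-D maze, a minimal tabular sketch follows. The maze layout, reward values, hyperparameters, and all function names are assumptions made for illustration and are not taken from the paper, and this is not the GrHDP algorithm itself.

```python
import random

# Hypothetical 4x4 maze (not from the paper): S start, G goal, # wall, . free.
MAZE = ["S..#",
        ".#..",
        "...#",
        "#..G"]
START = (0, 0)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Move one cell; hitting a wall or the border leaves the state unchanged."""
    r, c = state
    nr, nc = r + ACTIONS[action][0], c + ACTIONS[action][1]
    inside = 0 <= nr < len(MAZE) and 0 <= nc < len(MAZE[0])
    if not inside or MAZE[nr][nc] == "#":
        nr, nc = r, c
    done = MAZE[nr][nc] == "G"
    reward = 1.0 if done else -0.01  # assumed reward shaping
    return (nr, nc), reward, done


def greedy_action(q, state):
    """Pick the action with the highest Q-value (unseen pairs default to 0)."""
    return max(range(len(ACTIONS)), key=lambda a: q.get((state, a), 0.0))


def train(episodes=500, alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        state = START
        for _ in range(200):  # cap on episode length
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))
            else:
                action = greedy_action(q, state)
            nxt, reward, done = step(state, action)
            # Off-policy target: reward plus discounted best next-state value.
            best_next = 0.0 if done else q.get((nxt, greedy_action(q, nxt)), 0.0)
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt
            if done:
                break
    return q


def greedy_path(q, max_steps=50):
    """Follow the learned greedy policy from the start cell."""
    state, path = START, [START]
    for _ in range(max_steps):
        state, _, done = step(state, greedy_action(q, state))
        path.append(state)
        if done:
            break
    return path


if __name__ == "__main__":
    print(greedy_path(train()))
```

Running the script prints the sequence of cells the greedy policy visits after training. Swapping the update target for the value of the action actually taken next, and adding eligibility traces, would move this sketch toward Sarsa(lambda), the other baseline the abstract mentions.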
