Article

Eligibility traces and forgetting factor in recursive least-squares-based temporal difference

Publisher

WILEY
DOI: 10.1002/acs.3282

Keywords

eligibility traces; instrumental variable method; least squares; reinforcement learning; temporal difference

Funding

  1. Jiangsu Double Innovation Talents Project for Jiangsu province [4207012004]
  2. National Natural Science Foundation of China [62073074]

This study proposes RLS-TD-f, a reinforcement learning method that uses a forgetting factor instead of eligibility traces, and tests its effectiveness in a Policy Iteration setting.
We propose a new reinforcement learning method in the framework of Recursive Least Squares-Temporal Difference (RLS-TD). Instead of using the standard mechanism of eligibility traces (resulting in RLS-TD(lambda)), we propose to use the forgetting factor commonly used in gradient-based or least-squares estimation, and we show that it plays a role similar to that of eligibility traces. An instrumental variable perspective is adopted to formulate the new algorithm, referred to as RLS-TD with forgetting factor (RLS-TD-f). An interesting aspect of the proposed algorithm is that it can be interpreted as a minimizer of an appropriate cost function. We test the effectiveness of the algorithm in a Policy Iteration setting, meaning that we aim to improve the performance of an initially stabilizing control policy (over a large portion of the state space). We take a cart-pole benchmark and an adaptive cruise control benchmark as experimental platforms.
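
To make the idea of a forgetting factor in RLS-TD concrete, the following is a minimal sketch of recursive least-squares TD(0) with exponential forgetting, written from the textbook instrumental-variable form of RLS. It is not taken from the paper, and the exact RLS-TD-f recursion may differ. The class name RLSTDForgetting, the parameter names (forgetting, p0), and the two-state toy chain are all assumptions made for illustration.

```python
import numpy as np


class RLSTDForgetting:
    """Sketch of RLS-TD(0) with an exponential forgetting factor.

    Assumption: this is the generic instrumental-variable RLS recursion,
    not necessarily the RLS-TD-f algorithm of the paper.
    """

    def __init__(self, n_features, gamma=0.95, forgetting=0.99, p0=1e3):
        self.gamma = gamma          # discount factor
        self.mu = forgetting        # forgetting factor, 0 < mu <= 1
        self.theta = np.zeros(n_features)   # value-function weights
        self.P = p0 * np.eye(n_features)    # running inverse of the data matrix

    def update(self, phi, reward, phi_next):
        # Regressor is the discounted feature difference; the instrumental
        # variable is the current feature vector phi.
        d = phi - self.gamma * phi_next
        P_phi = self.P @ phi
        gain = P_phi / (self.mu + d @ P_phi)
        td_error = reward - d @ self.theta
        self.theta = self.theta + gain * td_error
        # Sherman-Morrison update of the inverse, discounted by mu so that
        # older samples are weighted down geometrically.
        self.P = (self.P - np.outer(gain, d @ self.P)) / self.mu
        return td_error

    def value(self, phi):
        return float(phi @ self.theta)


# Hypothetical toy problem: a deterministic two-state chain with one-hot
# features. State 0 moves to state 1 with reward 1, state 1 moves back to
# state 0 with reward 0.
agent = RLSTDForgetting(n_features=2, gamma=0.9, forgetting=0.99)
features = np.eye(2)
state = 0
for _ in range(2000):
    next_state = 1 - state
    reward = 1.0 if state == 0 else 0.0
    agent.update(features[state], reward, features[next_state])
    state = next_state

# Closed-form values for this chain: V(0) = 1 / (1 - 0.81), V(1) = 0.9 * V(0).
print(agent.value(features[0]), agent.value(features[1]))
```

With forgetting set to 1.0 the recursion reduces to ordinary RLS-TD(0) without traces; values below 1 discount older transitions geometrically, which is the sense in which a forgetting factor can stand in for an explicit memory mechanism.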
