Article

Eligibility traces and forgetting factor in recursive least-squares-based temporal difference

INTERNATIONAL JOURNAL OF ADAPTIVE CONTROL AND SIGNAL PROCESSING (2022)

Journal

INTERNATIONAL JOURNAL OF ADAPTIVE CONTROL AND SIGNAL PROCESSING

Volume 36, Issue 2, Pages 334-353

Publisher

WILEY

DOI: 10.1002/acs.3282

Keywords

eligibility traces; instrumental variable method; least squares; reinforcement learning; temporal difference

Categories

Automation & Control Systems; Engineering, Electrical & Electronic

Funding

Jiangsu Double Innovation Talents Project for Jiangsu province [4207012004]
National Natural Science Foundation of China [62073074]

Summary

This study proposes a new reinforcement learning method, RLS-TD-f, which uses a forgetting factor instead of eligibility traces. Its effectiveness is tested in a Policy Iteration setting.

Abstract

We propose a new reinforcement learning method in the framework of Recursive Least Squares-Temporal Difference (RLS-TD). Instead of using the standard mechanism of eligibility traces (resulting in RLS-TD(lambda)), we propose to use the forgetting factor commonly used in gradient-based or least-squares estimation, and we show that it plays a role similar to that of eligibility traces. An instrumental variable perspective is adopted to formulate the new algorithm, referred to as RLS-TD with forgetting factor (RLS-TD-f). An interesting aspect of the proposed algorithm is that it can be interpreted as the minimizer of an appropriate cost function. We test the effectiveness of the algorithm in a Policy Iteration setting, meaning that we aim to improve the performance of an initially stabilizing control policy (over a large portion of the state space). We take a cart-pole benchmark and an adaptive cruise control benchmark as experimental platforms.
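To make the idea of a forgetting factor in RLS-TD concrete, the following is a minimal sketch of a textbook recursive least-squares TD(0) update with exponential forgetting, written from the instrumental variable viewpoint (instrument `phi`, regressor `phi - gamma * phi_next`). It is not taken from the paper and may differ from the authors' RLS-TD-f in its details. The function name, the symbol `mu` for the forgetting factor, the initial value of `P`, and the two-state toy chain are all assumptions made for illustration.

```python
import numpy as np


def rls_td_forgetting_step(theta, P, phi, phi_next, reward, gamma, mu):
    """One RLS-TD(0) update with forgetting factor mu (illustrative sketch).

    Recursively tracks theta = A^{-1} b, where
        A <- mu * A + phi (phi - gamma * phi_next)^T
        b <- mu * b + phi * reward
    and P = A^{-1} is propagated with the Sherman-Morrison identity.
    """
    d = phi - gamma * phi_next           # regressor (temporal difference of features)
    P_phi = P @ phi                      # phi acts as the instrumental variable
    gain = P_phi / (mu + d @ P_phi)      # gain vector
    theta = theta + gain * (reward - d @ theta)
    P = (P - np.outer(gain, d @ P)) / mu
    return theta, P


# Hypothetical toy problem: a deterministic two-state chain 0 -> 1 -> 0 -> ...
# with reward 1 when leaving state 0, reward 0 otherwise, and one-hot features.
features = np.eye(2)
theta = np.zeros(2)
P = 1e3 * np.eye(2)                      # assumed initialisation of P = A^{-1}
state = 0
for _ in range(500):
    next_state = 1 - state
    reward = 1.0 if state == 0 else 0.0
    theta, P = rls_td_forgetting_step(
        theta, P, features[state], features[next_state],
        reward, gamma=0.5, mu=0.95,
    )
    state = next_state

print(theta)  # estimated state values for the toy chain
```

For this toy chain the Bellman equations V(0) = 1 + 0.5 V(1) and V(1) = 0.5 V(0) give V(0) = 4/3 and V(1) = 2/3, which the estimate should approach. With `mu < 1`, older transitions are discounted geometrically, so the influence of the initial `P` fades over time.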
