☆ 4.8 Article

Robust Losses for Learning Value Functions

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2023)

期刊

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE

卷 45, 期 5, 页码 6157-6167

出版社

IEEE COMPUTER SOC

DOI: 10.1109/TPAMI.2022.3213503

关键词

Approximation algorithms; Optimization; Function approximation; Prediction algorithms; Visualization; Tuning; Time-frequency analysis; Machine learning; reinforcement learning; function approximation

类别

Computer Science, Artificial Intelligence Engineering, Electrical & Electronic

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

智能总结 New
摘要

Most value function learning algorithms in reinforcement learning suffer from sensitivity to outliers, which can lead to biased solutions and high-variance gradients. This study proposes a saddlepoint reformulation for robust losses such as Huber Bellman error and Absolute Bellman error, providing more stable gradient-based algorithms for online off-policy prediction and control. The solutions of robust losses are characterized, showing that they can yield notably better solutions than the mean squared Bellman error. These findings contribute to reducing sensitivity to meta-parameters in reinforcement learning.

Most value function learning algorithms in reinforcement learning are based on the mean squared (projected) Bellman error. However, squared errors are known to be sensitive to outliers, both skewing the solution of the objective and resulting in high-magnitude and high-variance gradients. To control these high-magnitude updates, typical strategies in RL involve clipping gradients, clipping rewards, rescaling rewards, or clipping errors. While these strategies appear to be related to robust losses-like the Huber loss-they are built on semi-gradient update rules which do not minimize a known loss. In this work, we build on recent insights reformulating squared Bellman errors as a saddlepoint optimization problem and propose a saddlepoint reformulation for a Huber Bellman error and Absolute Bellman error. We start from a formalization of robust losses, then derive sound gradient-based approaches to minimize these losses in both the online off-policy prediction and control settings. We characterize the solutions of the robust losses, providing insight into the problem settings where the robust losses define notably better solutions than the mean squared Bellman error. Finally, we show that the resulting gradient-based algorithms are more stable, for both prediction and control, with less sensitivity to meta-parameters.

Robust Losses for Learning Value Functions

期刊

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE

出版社

IEEE COMPUTER SOC

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

Robust Losses for Learning Value Functions

期刊

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE

出版社

IEEE COMPUTER SOC

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文