A Probabilistic Interpretation of Self-Paced Learning with Applications to Reinforcement Learning

JOURNAL OF MACHINE LEARNING RESEARCH (2021)

Journal

JOURNAL OF MACHINE LEARNING RESEARCH

Volume 22, Issue -, Pages -

Publisher

MICROTOME PUBL

Keywords

curriculum learning; reinforcement learning; self-paced learning; tempered inference; rl-as-inference

Categories

Automation & Control Systems; Computer Science, Artificial Intelligence

Summary

This study introduces an automated curriculum generation method for reinforcement learning. It formalizes the self-paced learning paradigm as inducing a distribution over training tasks that balances task complexity against the goal of matching a desired task distribution. Experimental results show that training on this induced distribution helps different RL algorithms avoid poor local optima in tasks with uninformative rewards and challenging exploration requirements.

Abstract

Across machine learning, the use of curricula has shown strong empirical potential to improve learning from data by avoiding local optima of training objectives. For reinforcement learning (RL), curricula are especially interesting, as the underlying optimization has a strong tendency to get stuck in local optima due to the exploration-exploitation trade-off. Recently, a number of approaches for automatically generating curricula for RL have been shown to increase performance while requiring less expert knowledge than manually designed curricula. However, these approaches are seldom investigated from a theoretical perspective, which prevents a deeper understanding of their mechanics. In this paper, we present an approach for automated curriculum generation in RL with a clear theoretical underpinning. More precisely, we formalize the well-known self-paced learning paradigm as inducing a distribution over training tasks, which trades off between task complexity and the objective of matching a desired task distribution. Experiments show that training on this induced distribution helps to avoid poor local optima across RL algorithms in different tasks with uninformative rewards and challenging exploration requirements.
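The trade-off described in the abstract can be illustrated with a small sketch. The abstract does not state the objective, so the form used here is an assumption: a task distribution nu is chosen to maximize the expected agent return minus alpha times the KL divergence from nu to the desired task distribution. For a finite task set this has the closed form nu(c) proportional to target(c) * exp(return(c) / alpha). The function name, the three tasks, and all numbers are hypothetical and are not taken from the paper.

```python
import math

def induced_task_distribution(returns, target, alpha):
    """Maximize sum_c nu(c) * returns[c] - alpha * KL(nu || target) over nu.

    Assumed objective for illustration only; not the paper's exact algorithm.
    Closed form: nu(c) is proportional to target(c) * exp(returns[c] / alpha).
    """
    # Subtract the largest return before exponentiating for numerical stability.
    r_max = max(returns)
    weights = [t * math.exp((r - r_max) / alpha) for r, t in zip(returns, target)]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical setup: three tasks ordered from easy to hard.
returns = [1.0, 0.5, 0.0]   # assumed current agent return on each task
target = [0.1, 0.2, 0.7]    # assumed desired task distribution

# Small alpha: the distribution concentrates on tasks the agent already solves.
easy_focused = induced_task_distribution(returns, target, alpha=0.1)

# Large alpha: the KL term dominates and the distribution approaches the target.
near_target = induced_task_distribution(returns, target, alpha=1000.0)

print(easy_focused)
print(near_target)
```

In this sketch, sweeping alpha from small to large moves the training distribution from tasks the agent currently handles well toward the desired distribution, which is one way to read the balance between task complexity and target matching.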
