4.7 Article

TAG: Teacher-Advice Mechanism With Gaussian Process for Reinforcement Learning

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TNNLS.2023.3262956

Keywords

Gaussian processes; Training; Trajectory; Task analysis; Supervised learning; Data models; Space exploration; Gaussian process; reinforcement learning (RL); teacher-advice mechanism

Abstract

Reinforcement learning suffers from sample inefficiency and exploration issues. Learning from demonstration (LfD) was proposed to address these problems, but it often requires a large number of demonstrations. This study presents a sample-efficient teacher-advice mechanism with Gaussian process (TAG) that leverages only a few expert demonstrations. The TAG mechanism helps the agent explore the environment more intentionally and guides it accurately through a guided policy. Experiments show that TAG helps RL algorithms achieve significant performance gains and outperforms other LfD methods on delayed-reward and continuous control environments.
Reinforcement learning (RL) still suffers from sample inefficiency and struggles with exploration, particularly in situations with long-delayed rewards, sparse rewards, and deep local optima. Recently, the learning from demonstration (LfD) paradigm was proposed to tackle these problems; however, such methods usually require a large number of demonstrations. In this study, we present a sample-efficient teacher-advice mechanism with Gaussian process (TAG) that leverages a few expert demonstrations. In TAG, a teacher model is built to provide both an advice action and its associated confidence value. A guided policy is then formulated to guide the agent in the exploration phase via the defined criteria. Through the TAG mechanism, the agent is able to explore the environment more intentionally, and the confidence value allows the guided policy to guide the agent precisely. Moreover, owing to the strong generalization ability of the Gaussian process, the teacher model can utilize the demonstrations more effectively. Substantial improvements in performance and sample efficiency can therefore be attained. Extensive experiments on sparse-reward environments demonstrate that the TAG mechanism helps typical RL algorithms achieve significant performance gains. In addition, the TAG mechanism with the soft actor-critic algorithm (TAG-SAC) attains state-of-the-art performance compared with other LfD counterparts on several delayed-reward and complex continuous control environments.
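To make the mechanism concrete, the sketch below illustrates the general idea of a GP-based teacher and a confidence-gated guided policy as described in the abstract. It is a minimal illustration, not the paper's implementation: it assumes scikit-learn's GaussianProcessRegressor as the teacher model, derives a confidence value from the GP's predictive standard deviation, and switches to the teacher's advice only above a fixed threshold. The names GPTeacher, guided_action, and conf_threshold are hypothetical and chosen for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel


class GPTeacher:
    """Teacher model fitted on a few expert demonstrations (state -> action)."""

    def __init__(self, demo_states, demo_actions):
        kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-3)
        self.gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
        self.gp.fit(demo_states, demo_actions)

    def advise(self, state):
        """Return an advice action and a confidence value in (0, 1]."""
        action, std = self.gp.predict(state.reshape(1, -1), return_std=True)
        # High predictive uncertainty maps to low confidence (illustrative choice).
        confidence = float(np.exp(-np.mean(std)))
        return action.ravel(), confidence


def guided_action(agent_policy, teacher, state, conf_threshold=0.7):
    """Guided policy: follow the teacher's advice only when it is confident enough;
    otherwise fall back to the learning agent's own exploratory action."""
    advice, confidence = teacher.advise(state)
    if confidence >= conf_threshold:
        return advice
    return agent_policy(state)


if __name__ == "__main__":
    # Toy usage with random stand-in data; a real agent would use recorded
    # expert transitions and, e.g., a SAC actor as agent_policy.
    rng = np.random.default_rng(0)
    demo_states = rng.normal(size=(20, 4))    # 20 demonstration states (4-D)
    demo_actions = rng.normal(size=(20, 2))   # corresponding 2-D continuous actions
    teacher = GPTeacher(demo_states, demo_actions)

    agent_policy = lambda s: rng.normal(size=2)
    print(guided_action(agent_policy, teacher, rng.normal(size=4)))
```

In this reading, the GP's predictive variance is what lets the guided policy decide when the few demonstrations are informative enough to trust the teacher, which is consistent with the abstract's claim that the confidence value enables precise guidance.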
