Journal
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS
Volume 34, Issue 8, Pages 4816-4825
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TNNLS.2021.3128666
Keywords
Entropy; Task analysis; Reinforcement learning; Mutual information; Diversity reception; Convergence; Games; Deep reinforcement learning (RL); diversity and individuality; hierarchical RL (HRL); option critic; residual
Abstract
Extracting temporal abstraction (options), which enriches the action space, is a crucial challenge in hierarchical reinforcement learning. Under a well-structured action space, decision-making agents can search more deeply or plan more efficiently by pruning irrelevant action candidates. However, automatically capturing a well-performing temporal abstraction is nontrivial owing to insufficient exploration and inadequate functionality. We alleviate this challenge from two perspectives: diversity and individuality. For diversity, we propose a maximum entropy model based on ensembled options to encourage exploration. For individuality, we propose to distinguish each option accurately via mutual information minimization, so that each option can better express itself and function. We name our framework ensemble with soft option (ESO) critics. Furthermore, the residual algorithm (RA) with a bidirectional target network is introduced to stabilize bootstrapping, yielding a residual version of ESO. We provide a detailed analysis of extensive experiments, which shows that our method boosts performance on commonly used continuous control tasks.
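The two regularizers the abstract describes, an entropy term over options (diversity) and a mutual-information term between options and states (individuality), can be illustrated with standard information-theoretic quantities. The sketch below is only a toy computation of those quantities from a tabular per-state option distribution; the function name, array layout, and formulation are illustrative assumptions, not the paper's actual ESO objective.

```python
import numpy as np

def option_diversity_stats(option_probs):
    """Compute the conditional entropy H(w|s) and the mutual information
    I(w; s) = H(w) - H(w|s) for a tabular option distribution.

    option_probs: array of shape (num_states, num_options); row i is the
    option-selection distribution pi(w|s_i). States are assumed equally
    likely. Names and layout are illustrative, not from the paper.
    """
    eps = 1e-12  # guard against log(0)
    p = np.asarray(option_probs, dtype=float)
    # Diversity-style term: average per-state entropy of the option choice.
    cond_entropy = -np.mean(np.sum(p * np.log(p + eps), axis=1))
    # Marginal option distribution p(w) under a uniform state distribution.
    marginal = p.mean(axis=0)
    marg_entropy = -np.sum(marginal * np.log(marginal + eps))
    # Individuality-style term: how much the chosen option reveals the state.
    mutual_info = marg_entropy - cond_entropy
    return cond_entropy, mutual_info
```

For a uniform option distribution the mutual information is zero (options say nothing about the state), while deterministic, state-specific options drive the conditional entropy to zero and the mutual information to its maximum, log(num_options).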