Article

A Swapping Target Q-Value Technique for Data Augmentation in Offline Reinforcement Learning

Journal

IEEE ACCESS
Volume 10, Issue -, Pages 57369-57382

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2022.3178194

Keywords

Behavioral sciences; Games; Artificial intelligence; Training; Q-learning; Medical services; Licenses; Offline reinforcement learning; data augmentation; generalization; Atari games

Funding

  1. National Research Foundation of Korea (NRF) - Ministry of Science and ICT (MSIT) [2021R1A4A1030075]


This study introduces a novel data-augmentation technique called Swapping Target Q-Value (SQV) to enhance offline RL algorithms and improve pixel-based learning. By matching the Q-values of transformed images to the target Q-values of the original images, treating similar states as the same and pushing different states further apart, the method achieves a significant performance increase in the Atari 2600 game domain.
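Written as a loss, the swapping idea can be sketched as follows. This is a hedged reconstruction from the description above, not the paper's own notation: the transformation symbol \tau and the plain greedy target are assumptions, and in the BCQ setting the maximisation would be restricted to actions allowed by the behaviour-cloning model.

y(x) = r + \gamma \max_{a'} Q_{\bar{\theta}}(x, a')

\mathcal{L}(\theta) = \mathbb{E}_{(s,a,r,s')}\big[ \big(Q_\theta(\tau(s), a) - y(s')\big)^2 + \big(Q_\theta(s, a) - y(\tau(s'))\big)^2 \big]

Here Q_\theta is the online network, Q_{\bar{\theta}} the target network, and \tau an image transformation such as a random shift or crop; the Q-value of the transformed image is regressed toward the target built from the original next image, and vice versa.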
Offline reinforcement learning (RL) is applied to fixed datasets of logged interactions from real-world applications such as healthcare, autonomous vehicles, and robotics. In such limited, fixed-dataset settings, data augmentation can help in learning better policies. Several data-augmentation methods have recently been used in online RL to improve sample efficiency and generalization. Here, a novel and simple data-augmentation technique referred to as Swapping Target Q-Value (SQV) is introduced to enhance offline RL algorithms and enable robust pixel-based learning without an auxiliary loss. Our method matches the current Q-value of a transformed image to the target Q-value of the next original image, while the current Q-value of the original image is matched to the target Q-value of the next transformed image. The proposed method treats similar states as the same and pushes different states further apart. Furthermore, the approach ties unseen states (absent from the dataset) to similar states in the seen data. These effects were observed to increase the performance of the offline RL algorithm after training. The method was tested on 23 games in the Atari 2600 domain, outperforming batch-constrained deep Q-learning (BCQ), a recent offline RL method, in 18 of the 23 games with an average performance improvement of 144%. The implementation can be found at https://github.com/hotaekjoo/SQV.
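A minimal PyTorch-style sketch of how such a swapped-target update might look is given below. Function and variable names (sqv_loss, q_net, target_net, augment, batch) are illustrative assumptions, not identifiers from the authors' repository, and the greedy max target stands in for the full BCQ action constraint.

import torch
import torch.nn.functional as F

def sqv_loss(q_net, target_net, batch, augment, gamma=0.99):
    """Hedged sketch of a Swapping Target Q-Value (SQV) update.

    `augment` is any image transformation (e.g. random shift/crop).
    The greedy max over actions below simplifies the BCQ-constrained
    action selection used in the paper.
    """
    s, a, r, s_next, done = batch  # pixel obs, actions, rewards, next obs, done flags

    s_aug = augment(s)             # transformed current observation
    s_next_aug = augment(s_next)   # transformed next observation

    with torch.no_grad():
        # Target built from the ORIGINAL next image ...
        y_orig = r + gamma * (1 - done) * target_net(s_next).max(dim=1).values
        # ... and target built from the TRANSFORMED next image.
        y_aug = r + gamma * (1 - done) * target_net(s_next_aug).max(dim=1).values

    q_aug = q_net(s_aug).gather(1, a.unsqueeze(1)).squeeze(1)   # Q of transformed image
    q_orig = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)      # Q of original image

    # Swap the targets: transformed current Q <- original next target,
    #                   original current Q    <- transformed next target.
    return F.smooth_l1_loss(q_aug, y_orig) + F.smooth_l1_loss(q_orig, y_aug)

In use, this loss would simply replace the standard temporal-difference loss in an offline Q-learning loop, with q_net and target_net being the online and target value networks over pixel observations.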
