4.7 Article

Pessimistic value iteration for multi-task data sharing in Offline Reinforcement Learning

Journal

ARTIFICIAL INTELLIGENCE
Volume 326, Issue -, Pages -

Publisher

ELSEVIER
DOI: 10.1016/j.artint.2023.104048

Keywords

Uncertainty quantification; Data sharing; Pessimistic value iteration; Offline Reinforcement Learning

Offline Reinforcement Learning has shown promising results in learning task-specific policies. However, directly sharing datasets from other tasks exacerbates the distribution shift issue in offline RL. In this paper, we propose an uncertainty-based Multi-Task Data Sharing approach that provides a unified framework for offline RL and resolves the distribution shift problem. The experimental results show that our algorithm outperforms previous state-of-the-art methods in challenging MTDS problems.
Offline Reinforcement Learning (RL) has shown promising results in learning a task-specific policy from a fixed dataset. However, successful offline RL often relies heavily on the coverage and quality of the given dataset. In scenarios where the dataset for a specific task is limited, a natural approach is to improve offline RL with datasets from other tasks, namely, to conduct Multi-Task Data Sharing (MTDS). Nevertheless, directly sharing datasets from other tasks exacerbates the distribution shift in offline RL. In this paper, we propose an uncertainty-based MTDS approach that shares the entire dataset without data selection. Given ensemble-based uncertainty quantification, we perform pessimistic value iteration on the shared offline dataset, which provides a unified framework for single- and multi-task offline RL. We further provide theoretical analysis, which shows that the optimality gap of our method is only related to the expected data coverage of the shared dataset, thus resolving the distribution shift issue in data sharing. Empirically, we release an MTDS benchmark and collect datasets from three challenging domains. The experimental results show that our algorithm outperforms the previous state-of-the-art methods in challenging MTDS problems.
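
To make the idea of pessimistic value iteration with ensemble-based uncertainty concrete, the following is a minimal tabular sketch, not the paper's implementation. Several details are assumptions made only for illustration: the function name `pessimistic_value_iteration`, the toy three-state problem, the penalty weight `beta`, the deterministic spread of initial values (standing in for the random initialisation of an ensemble of Q-networks), and the premise that shared transitions already carry rewards relabeled for the target task. The ensemble's standard deviation plays the role of the uncertainty, and the Bellman target uses the lower confidence bound (ensemble mean minus `beta` times the standard deviation).

```python
import numpy as np

def pessimistic_value_iteration(transitions, n_states, n_actions,
                                n_ensemble=5, beta=1.0, gamma=0.9,
                                init_scale=1.0, lr=0.5, n_iters=200):
    """Tabular sketch. transitions: list of (s, a, r, s_next, done)."""
    # Assumption: spread-out initial values mimic differently initialised
    # ensemble members. They disagree until data pulls them together.
    offsets = np.linspace(-init_scale, init_scale, n_ensemble)
    q = np.tile(offsets[:, None, None], (1, n_states, n_actions))

    for _step in range(n_iters):
        # Lower confidence bound: ensemble mean minus scaled disagreement.
        lcb = q.mean(axis=0) - beta * q.std(axis=0)
        # Pessimistic Bellman targets, shared by all ensemble members.
        targets = [r + gamma * (0.0 if done else lcb[s_next].max())
                   for (s, a, r, s_next, done) in transitions]
        # Move every member toward the target on pairs that appear in the data.
        for (s, a, r, s_next, done), y in zip(transitions, targets):
            q[:, s, a] += lr * (y - q[:, s, a])

    uncertainty = q.std(axis=0)
    return q.mean(axis=0) - beta * uncertainty, uncertainty

# Hypothetical toy problem: 3 states, 2 actions, the episode ends on reaching state 2.
target_task_data = [(0, 0, 0.0, 1, False)]   # target task only covers state 0
shared_data = [(1, 1, 1.0, 2, True)]         # from another task, reward relabeled
pooled = target_task_data + shared_data      # share everything, no data selection

q_pess, unc = pessimistic_value_iteration(pooled, n_states=3, n_actions=2)
policy = q_pess.argmax(axis=1)
print(np.round(q_pess, 3))
print(policy[:2])
```

In this sketch the state-action pairs covered by the pooled data end up with near-zero ensemble disagreement, while pairs that never appear keep their initial disagreement and are therefore penalised. The greedy policy then follows the covered actions in states 0 and 1, and the shared transition is what lets the value of the target task's own transition propagate beyond state 0.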
