Article

Off-Policy Confidence Interval Estimation with Confounded Markov Decision Process

Publisher

TAYLOR & FRANCIS INC
DOI: 10.1080/01621459.2022.2110878

Keywords

Infinite horizons; Off-policy evaluation; Reinforcement learning; Ridesourcing platforms; Statistical inference; Unmeasured confounders

Funding

  1. EPSRC [EP/W014971/1]
  2. NSF [DMS-1555244, DMS2113637]

This article is concerned with constructing a confidence interval for a target policy's value offline, based on pre-collected observational data in infinite-horizon settings. Most existing works assume that no unmeasured variables confound the observed actions. This assumption, however, is likely to be violated in real applications such as healthcare and the technology industry. In this article, we show that with some auxiliary variables that mediate the effect of actions on the system dynamics, the target policy's value is identifiable in a confounded Markov decision process. Based on this result, we develop an efficient off-policy value estimator that is robust to potential model misspecification, and we provide rigorous uncertainty quantification. Our method is justified by theoretical results and by simulated and real datasets obtained from ridesharing companies. A Python implementation of the proposed procedure is available at https://github.com/Mamba413/cope. Supplementary materials for this article are available online.
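To illustrate why a mediator can restore identifiability when an unmeasured variable confounds the actions, here is a minimal single-step, discrete sketch of the underlying idea (a front-door-style adjustment). It is not the authors' estimator, which handles infinite-horizon MDPs, model misspecification and confidence intervals; it does not use the `cope` package. All probabilities, the reward function and every function name below are hypothetical and chosen only for illustration. An unobserved confounder `U` drives both the logged action `A` and the reward `R`, while the action affects `R` only through the observed mediator `M`.

```python
from itertools import product

# Hypothetical toy model (not from the paper); all numbers are illustrative.
P_U = {0: 0.5, 1: 0.5}            # unmeasured confounder U
P_A1_GIVEN_U = {0: 0.2, 1: 0.8}   # logging (behaviour) policy depends on U
P_M1_GIVEN_A = {0: 0.3, 1: 0.9}   # mediator M depends on the action A only
PI_A1 = 0.7                       # hypothetical target policy: P(A=1)


def bern(p1, x):
    """Probability that a Bernoulli(p1) variable equals x."""
    return p1 if x == 1 else 1.0 - p1


def reward_mean(m, u):
    """E[R | M=m, U=u]; the action influences R only through M."""
    return 1.0 * m + 2.0 * u


def joint():
    """Observational joint distribution P(U=u, A=a, M=m)."""
    return {(u, a, m): P_U[u] * bern(P_A1_GIVEN_U[u], a) * bern(P_M1_GIVEN_A[a], m)
            for u, a, m in product((0, 1), repeat=3)}


def true_value():
    """Oracle value of the target policy, computed with access to U."""
    return sum(P_U[u] * bern(PI_A1, a) * bern(P_M1_GIVEN_A[a], m) * reward_mean(m, u)
               for u, a, m in product((0, 1), repeat=3))


def observed_stats():
    """Quantities computable from (A, M, R) alone, with U marginalised out."""
    pj = joint()
    p_a = {a: sum(pj[u, a, m] for u in (0, 1) for m in (0, 1)) for a in (0, 1)}
    p_am = {(a, m): pj[0, a, m] + pj[1, a, m] for a in (0, 1) for m in (0, 1)}
    p_m_given_a = {(a, m): p_am[a, m] / p_a[a] for a, m in p_am}
    r_given_am = {(a, m): sum(pj[u, a, m] * reward_mean(m, u) for u in (0, 1)) / p_am[a, m]
                  for a, m in p_am}
    return p_a, p_m_given_a, r_given_am


def naive_value():
    """Plug-in value that ignores confounding: sum_a pi(a) E[R | A=a]."""
    _, p_m_given_a, r_given_am = observed_stats()
    e_r_given_a = {a: sum(p_m_given_a[a, m] * r_given_am[a, m] for m in (0, 1))
                   for a in (0, 1)}
    return sum(bern(PI_A1, a) * e_r_given_a[a] for a in (0, 1))


def front_door_value():
    """Mediator adjustment: sum_a' pi(a') sum_m P(m|a') sum_a P(a) E[R|m,a]."""
    p_a, p_m_given_a, r_given_am = observed_stats()
    return sum(bern(PI_A1, a2) * p_m_given_a[a2, m] * p_a[a] * r_given_am[a, m]
               for a2, m, a in product((0, 1), repeat=3))


if __name__ == "__main__":
    print(round(true_value(), 4), round(front_door_value(), 4), round(naive_value(), 4))
```

In this toy model the mediator-based formula recovers the oracle value (1.72), whereas the naive plug-in that conditions on the confounded action does not (1.96). The adjustment works because every distribution it uses is observable, and averaging `E[R | m, a]` over the marginal `P(a)` cancels the dependence on `U`.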
