Off-Policy Confidence Interval Estimation with Confounded Markov Decision Process

JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION (2022)

Journal

JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION

Volume -, Issue -, Pages -

Publisher

TAYLOR & FRANCIS INC

DOI: 10.1080/01621459.2022.2110878

Keywords

Infinite horizons; Off-policy evaluation; Reinforcement learning; Ridesourcing platforms; Statistical inference; Unmeasured confounders

Category

Statistics & Probability

Funding

EPSRC [EP/W014971/1]
NSF [DMS-1555244, DMS2113637]

Summary

This article focuses on constructing a confidence interval for the value of a target policy offline, in infinite-horizon settings, from pre-collected observational data. Most existing works assume that no unmeasured variables confound the observed actions. This article shows that when auxiliary variables mediate the effect of actions on the system dynamics, the value of the target policy can be identified even in a confounded Markov decision process. Based on this result, a robust off-policy value estimator is developed that can handle potential model misspecification and provides rigorous uncertainty quantification.

Abstract

This article is concerned with constructing a confidence interval for a target policy's value offline, based on pre-collected observational data in infinite-horizon settings. Most of the existing works assume that no unmeasured variables exist that confound the observed actions. This assumption, however, is likely to be violated in real applications such as healthcare and technological industries. In this article, we show that with some auxiliary variables that mediate the effect of actions on the system dynamics, the target policy's value is identifiable in a confounded Markov decision process. Based on this result, we develop an efficient off-policy value estimator that is robust to potential model misspecification and provides rigorous uncertainty quantification. Our method is justified by theoretical results and by simulated and real datasets obtained from ridesharing companies. A Python implementation of the proposed procedure is available at https://github.com/Mamba413/cope. Supplementary materials for this article are available online.
