4.5 Article

Sequential Interdiction with Incomplete Information and Learning

期刊

OPERATIONS RESEARCH
卷 67, 期 1, 页码 72-89

出版社

INFORMS
DOI: 10.1287/opre.2018.1773

关键词

bilevel programming; attacker-defender; interdiction; learning; incomplete information; online optimization; robust optimization

资金

  1. Air Force Office of Scientific Research [FA2386-12-1-3032]
  2. Defense Thread Defense Agency [HDTRA1-14-1-0065]
  3. National Science Foundation [CMMI-1634835]
  4. Complex Engineering Systems Institute, ISCI [ICM-FIC: P05-004-F, CONICYT: FB0816]

向作者/读者索取更多资源

We present a framework for a class of sequential decision-making problems in the context of general interdiction problems, in which a leader and a follower repeatedly interact. At each period, the leader allocates resources to disrupt the performance of the follower (e.g., as in defender-attacker or network interdiction problems), who, in turn, minimizes some cost function over a set of activities that depends on the leader's decision. Although the follower has complete knowledge of the follower's problem, the leader has only partial information and needs to learn about the cost parameters, available resources, and the follower's activities from the feedback generated by the follower's actions. We measure policy performance in terms of its time-stability, defined as the number of periods it takes for the leader to match the actions of an oracle with complete information. In particular, we propose a class of greedy and robust policies and show that these policies are weakly optimal, eventually match the oracle's actions, and provide a real-time certificate of optimality. We also study a lower bound on any policy performance based on the notion of a semioracle. Our numerical experiments demonstrate that the proposed policies consistently outperform a reasonable benchmark and perform fairly close to the semioracle.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.5
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据