4.4 Article

Regret in Online Combinatorial Optimization

Journal

MATHEMATICS OF OPERATIONS RESEARCH
Volume 39, Issue 1, Pages 31-45

Publisher

INFORMS
DOI: 10.1287/moor.2013.0598

Keywords

online optimization; combinatorial optimization; mirror descent; multi-armed bandits; minimax regret

Funding

  1. Spanish Ministry of Science and Technology [MTM2009-09063]
  2. PASCAL2 Network of Excellence [EC] [216886]
  3. ICREA Funding Source: Custom

Abstract

We address online linear optimization problems when the possible actions of the decision maker are represented by binary vectors. The regret of the decision maker is the difference between her realized loss and the minimal loss she would have achieved by picking, in hindsight, the best possible action. Our goal is to understand the magnitude of the best possible (minimax) regret. We study the problem under three different assumptions for the feedback the decision maker receives: full information, and the partial information models of the so-called semi-bandit and bandit problems. In the full information case we show that the standard exponentially weighted average forecaster is a provably suboptimal strategy. For the semi-bandit model, by combining the Mirror Descent algorithm and the INF (Implicitly Normalized Forecaster) strategy, we prove the first optimal bounds. Finally, in the bandit case we discuss existing results in light of a new lower bound, and suggest a conjecture on the optimal regret in that case.
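
To make the regret notion in the abstract concrete, here is a minimal, self-contained sketch (not the paper's algorithm, and feasible only for tiny instances): the standard exponentially weighted average forecaster run over an explicitly enumerated set of binary action vectors, with full-information feedback and random losses. The dimension d, the constraint of playing exactly m coordinates, the horizon T, the learning rate eta, and the uniform random losses are illustrative assumptions, not values from the paper.

```python
# Minimal illustrative sketch (not the paper's method): exponentially weighted
# average forecaster over a small set of binary action vectors, full information.
# Regret = realized cumulative loss - loss of the best fixed action in hindsight.
import itertools
import math
import random

d, m, T, eta = 4, 2, 1000, 0.05   # illustrative: dimension, coordinates per action, horizon, learning rate
actions = [v for v in itertools.product((0, 1), repeat=d) if sum(v) == m]

weights = [1.0] * len(actions)            # one weight per binary action vector
cum_action_loss = [0.0] * len(actions)    # cumulative loss of each fixed action
realized_loss = 0.0
rng = random.Random(0)

for t in range(T):
    loss_vec = [rng.random() for _ in range(d)]                  # adversary's loss vector in [0, 1]^d
    k = rng.choices(range(len(actions)), weights=weights)[0]     # sample an action proportionally to its weight
    realized_loss += sum(a * l for a, l in zip(actions[k], loss_vec))
    for i, v in enumerate(actions):                              # full information: every action's loss is observed
        loss_i = sum(a * l for a, l in zip(v, loss_vec))
        cum_action_loss[i] += loss_i
        weights[i] *= math.exp(-eta * loss_i)                    # multiplicative (exponential) weight update

regret = realized_loss - min(cum_action_loss)
print(f"realized loss {realized_loss:.1f}, "
      f"best action in hindsight {min(cum_action_loss):.1f}, "
      f"regret {regret:.1f}")
```

One of the paper's full-information results is that, over combinatorial (binary-vector) action sets, this forecaster is provably suboptimal, which motivates the mirror-descent and INF-based strategies analyzed for the semi-bandit setting.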
