Article

An efficient learning framework for multiproduct inventory systems with customer choices

PRODUCTION AND OPERATIONS MANAGEMENT (2022)

Journal

PRODUCTION AND OPERATIONS MANAGEMENT

Volume 31, Issue 6, Pages 2492-2516

Publisher

WILEY

DOI: 10.1111/poms.13693

Keywords

demand censoring; inventory control; multiproduct; online learning

Categories

Engineering, Manufacturing; Operations Research & Management Science

Funding

Hong Kong Research Grants Council, Early Career Scheme [CUHK 24505918]

Summary

This paper investigates a periodic-review multiproduct inventory system in which customers' purchasing decisions are influenced by product availability. It proposes a UCB-based learning framework that addresses the learning problem by utilizing sales information through two improvement ideas. Improved UCB algorithms with tight worst-case convergence rates are developed for two specific systems. Extensive numerical experiments demonstrate the efficiency of the improved UCB algorithms.

Abstract

We consider a periodic-review multiproduct inventory system in which customers' purchasing decisions are affected by product availability. Demands must be learned on the fly from customers' partial and censored feedback. If one ignores the inventory dynamics, treats this learning problem as a multiarmed bandit problem, and directly applies an existing algorithm such as the upper confidence bound (UCB) algorithm, convergence can be extremely slow because of the high dimensionality of the policy space. We propose a UCB-based learning framework that utilizes sales information through two improvement ideas. We illustrate how these two ideas can be incorporated by considering two specific systems: (1) a multiproduct inventory system with stock-out substitutions and (2) a multiproduct inventory assortment problem for urban warehouses. We develop improved UCB algorithms for both systems using the two improvements. For both systems, the algorithm achieves a tight worst-case convergence rate (up to a logarithmic term) in the planning horizon T. Extensive numerical experiments demonstrate the efficiency of the improved UCB algorithms for the two systems. In the experiments, when there are more than 1000 candidate policies to choose from, the algorithms achieve around 15% average expected regret within 50 periods and continue to improve steadily as time increases.
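For orientation, the naive baseline that the abstract argues against can be sketched as follows: each candidate policy is treated as an independent bandit arm and the textbook UCB1 index is applied. This is a minimal sketch of standard UCB1 only, not the paper's improved algorithms. The function names `ucb1` and `make_bernoulli_arms`, the Bernoulli reward model, and the assumption that rewards lie in [0, 1] are illustrative choices that do not come from the paper.

```python
import math
import random


def ucb1(num_arms, pull, horizon):
    """Run textbook UCB1 for `horizon` rounds.

    `pull(arm)` is a hypothetical callback returning a reward in [0, 1].
    Returns per-arm pull counts and empirical mean rewards.
    """
    counts = [0] * num_arms
    means = [0.0] * num_arms
    for t in range(1, horizon + 1):
        if t <= num_arms:
            # Initialization: play every arm once before using the index.
            arm = t - 1
        else:
            # Pick the arm with the largest upper confidence bound.
            arm = max(
                range(num_arms),
                key=lambda a: means[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        # Incremental update of the empirical mean.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means


def make_bernoulli_arms(probs, seed=0):
    """Build a reward callback with Bernoulli rewards (illustrative only)."""
    rng = random.Random(seed)

    def pull(arm):
        return 1.0 if rng.random() < probs[arm] else 0.0

    return pull


if __name__ == "__main__":
    probs = [0.2, 0.5, 0.8]  # assumed success probabilities, one per arm
    counts, means = ucb1(len(probs), make_bernoulli_arms(probs), horizon=2000)
    print("pull counts:", counts)
    print("empirical means:", [round(m, 3) for m in means])
```

Because this baseline must play every arm once before its index is even defined, a setting with more than 1000 candidate policies and only 50 periods would end before initialization finishes. That is one concrete way to see why treating each policy as an unrelated arm converges slowly in this problem.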
