Article

Deep Reinforcement Learning for One-Warehouse Multi-Retailer inventory management

Publisher

ELSEVIER
DOI: 10.1016/j.ijpe.2023.109088

Keywords

Multi-echelon inventory control; Deep Reinforcement Learning; Allocation policies


Summary

The One-Warehouse Multi-Retailer (OWMR) system is a typical distribution and inventory system. Previous research has focused on heuristic reordering and allocation policies, which are time-consuming to construct and problem-specific. This paper proposes a Deep Reinforcement Learning (DRL) algorithm for OWMR problems that infers a multi-discrete action distribution and improves performance with a random rationing policy.

Abstract

The One-Warehouse Multi-Retailer (OWMR) system is the prototypical distribution and inventory system. Many OWMR variants exist; e.g., demand in excess of supply may be completely back-ordered, partially back-ordered, or lost. Prior research has focused on heuristic reordering policies, such as echelon base-stock levels, coupled with heuristic allocation policies. Constructing well-performing policies is time-consuming and must be redone for every problem variant. By contrast, Deep Reinforcement Learning (DRL) is a general-purpose technique for sequential decision making that has yielded good results for various challenging inventory systems. However, applying DRL to OWMR problems is nontrivial, since allocation involves setting a quantity for each retailer: the number of possible allocations grows exponentially in the number of retailers. Since each action is typically associated with a neural network output node, this renders standard DRL techniques intractable. Our proposed DRL algorithm instead infers a multi-discrete action distribution whose output nodes grow linearly in the number of retailers. Moreover, when total retailer orders exceed the available warehouse inventory, we propose a random rationing policy that substantially improves the ability of standard DRL algorithms to train good policies, because it promotes the learning of feasible retailer order quantities. The resulting algorithm outperforms general-purpose benchmark policies by ~1-3% for the lost-sales case and by ~12-20% for the partial back-ordering case. For complete back-ordering, the algorithm cannot consistently outperform the benchmark.
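The two ideas in the abstract can be illustrated with a minimal numpy sketch. This is not the paper's implementation: the retailer count, order-quantity range, and random logits below are hypothetical, and the policy network is reduced to a raw logits array. The point is that a multi-discrete head needs only (number of retailers) × (quantity levels) output nodes rather than one node per joint allocation, and that random rationing trims sampled orders to warehouse feasibility one unit at a time.

```python
import numpy as np

rng = np.random.default_rng(0)

N_RETAILERS = 3   # hypothetical number of retailers
MAX_ORDER = 5     # each retailer orders a quantity in 0..MAX_ORDER

def softmax(x):
    z = np.exp(x - x.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

# Multi-discrete action head: one row of logits per retailer, so the
# number of output nodes is N_RETAILERS * (MAX_ORDER + 1) -- linear in
# the number of retailers -- instead of (MAX_ORDER + 1) ** N_RETAILERS
# nodes for the exponentially large joint action space.
logits = rng.normal(size=(N_RETAILERS, MAX_ORDER + 1))  # stands in for a policy network output
probs = softmax(logits)
orders = np.array([rng.choice(MAX_ORDER + 1, p=p) for p in probs])

def random_rationing(orders, warehouse_stock, rng):
    """If total retailer orders exceed warehouse stock, remove the
    excess units one at a time from randomly chosen retailers that
    still have outstanding units, so the shipped quantities are
    always feasible."""
    orders = orders.copy()
    excess = orders.sum() - warehouse_stock
    while excess > 0:
        i = rng.choice(np.flatnonzero(orders > 0))
        orders[i] -= 1
        excess -= 1
    return orders

shipped = random_rationing(orders, warehouse_stock=4, rng=rng)
```

Here `shipped` never exceeds the per-retailer orders and sums to at most the warehouse stock; during training, sampling which retailer loses each excess unit (rather than, say, always truncating the last retailer) keeps the allocation unbiased across retailers, which is what the abstract credits for promoting the learning of feasible order quantities.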

