Article

Scalable multi-product inventory control with lead time constraints using reinforcement learning

Journal

NEURAL COMPUTING & APPLICATIONS
Volume 34, Issue 3, Pages 1735-1757

Publisher

SPRINGER LONDON LTD
DOI: 10.1007/s00521-021-06129-w

Keywords

Multi-agent reinforcement learning; Supply chain; Scalability and parallelisation


This paper proposes a deep reinforcement learning approach for multi-product, multi-period inventory management, addressing the challenges of inventory control under realistic constraints. The method outperforms baseline heuristics and can transfer its learning to inventory control problems with different numbers of products without retraining.

Determining optimal inventory replenishment decisions is critical for retail businesses facing uncertain demand. The problem becomes particularly challenging when multiple products with different lead times and cross-product constraints are considered. This paper addresses these challenges in multi-product, multi-period inventory management using deep reinforcement learning (deep RL). The proposed approach improves upon existing methods for inventory control on three fronts: (1) concurrent inventory management of a large number (hundreds) of products under realistic constraints, (2) minimal retraining requirements on the RL agent under system changes, achieved through the definition of an individual product meta-model, and (3) efficient handling of multi-period constraints that stem from the different lead times of different products. We approach the inventory problem as a special class of dynamical system control and explain why the generic problem cannot be satisfactorily solved using classical optimisation techniques. We then formulate the problem in a general framework that supports parallelised decision-making with off-the-shelf RL algorithms. We also benchmark the formulation against the theoretical optimum achieved by linear programming under the assumption that demands are deterministic and known a priori. Experiments at scales between 100 and 220 products show that the proposed RL-based approaches perform better than the baseline heuristics and come quite close to the theoretical optimum. Furthermore, they are able to transfer learning, without retraining, to inventory control problems involving different numbers of products.
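The abstract describes casting multi-product inventory control with per-product lead times as a sequential decision problem solvable with off-the-shelf RL algorithms. As a rough illustration only, and not the authors' actual formulation, the sketch below sets up a minimal multi-product simulator with per-product lead times and a shared ordering-capacity constraint; the Poisson demand model, cost structure, and all parameter values are hypothetical.

```python
# Illustrative sketch (not the paper's code): a minimal multi-product
# inventory environment with per-product lead times and a cross-product
# ordering-capacity constraint, of the kind an RL agent could act in.
import numpy as np

class MultiProductInventoryEnv:
    def __init__(self, n_products=100, lead_times=None, capacity=500,
                 holding_cost=0.1, shortage_cost=1.0, seed=0):
        self.rng = np.random.default_rng(seed)
        self.n = n_products
        # Hypothetical per-product lead times of 1-4 periods.
        self.lead_times = (lead_times if lead_times is not None
                           else self.rng.integers(1, 5, size=self.n))
        self.capacity = capacity   # total units orderable per period
        self.h = holding_cost      # per-unit holding cost
        self.p = shortage_cost     # per-unit shortage penalty
        self.reset()

    def reset(self):
        self.on_hand = np.full(self.n, 10.0)
        # pipeline[i, k] = quantity of product i arriving in k periods.
        self.pipeline = np.zeros((self.n, int(self.lead_times.max()) + 1))
        return self._obs()

    def _obs(self):
        # State: on-hand stock plus outstanding (in-transit) orders.
        return np.concatenate([self.on_hand, self.pipeline.sum(axis=1)])

    def step(self, order):
        order = np.clip(order, 0, None)
        # Enforce the shared capacity constraint by scaling orders down.
        total = order.sum()
        if total > self.capacity:
            order = order * (self.capacity / total)
        # Place orders; each arrives after that product's lead time.
        self.pipeline[np.arange(self.n), self.lead_times] += order
        # Receive deliveries due this period, then advance the pipeline.
        self.on_hand += self.pipeline[:, 0]
        self.pipeline = np.roll(self.pipeline, -1, axis=1)
        self.pipeline[:, -1] = 0.0
        # Hypothetical stochastic demand.
        demand = self.rng.poisson(5.0, size=self.n)
        sales = np.minimum(self.on_hand, demand)
        shortage = demand - sales
        self.on_hand -= sales
        # Reward: negative holding and shortage costs this period.
        reward = -(self.h * self.on_hand.sum() + self.p * shortage.sum())
        return self._obs(), reward, False, {}
```

An RL policy would map this observation to a vector of order quantities each period; the linear-programming benchmark mentioned in the abstract instead fixes a deterministic, known demand sequence and solves for the order quantities directly.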

