Abstract
Determining optimal inventory replenishment decisions is critical for retail businesses facing uncertain demand. The problem becomes particularly challenging when multiple products with different lead times and cross-product constraints must be managed together. This paper addresses these challenges in multi-product, multi-period inventory management using deep reinforcement learning (deep RL). The proposed approach improves upon existing methods for inventory control on three fronts: (1) concurrent inventory management of a large number (hundreds) of products under realistic constraints, (2) minimal retraining of the RL agent when the system changes, achieved by defining a meta-model for individual products, and (3) efficient handling of the multi-period constraints that stem from the different lead times of different products. We approach the inventory problem as a special class of dynamical system control and explain why the generic problem cannot be satisfactorily solved using classical optimisation techniques. We then formulate the problem in a general framework suitable for parallelised decision-making using off-the-shelf RL algorithms, and benchmark the formulation against the theoretical optimum achieved by linear programming under the assumption that demands are deterministic and known a priori. Experiments on scales between 100 and 220 products show that the proposed RL-based approaches outperform baseline heuristics and come close to the theoretical optimum. Furthermore, they transfer without retraining to inventory control problems involving different numbers of products.
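The linear-programming benchmark mentioned in the abstract (deterministic demand known a priori) can be sketched as follows. This is a minimal illustration only: the demand matrix, lead times, holding costs, product volumes, and the shared per-period ordering capacity are all assumed for the example, and the paper's actual formulation may differ in its constraints and objective.

```python
import numpy as np
from scipy.optimize import linprog

def optimal_replenishment(x0, demand, lead, hold, vol, cap):
    """LP benchmark: minimise total holding cost with known demand.

    An order u[i, s] placed in period s arrives lead[i] periods later;
    inventory must stay non-negative, and the total ordered volume in
    each period must respect the shared capacity `cap` (an assumed
    cross-product constraint for this sketch).
    """
    n, T = demand.shape
    idx = lambda i, s: i * T + s  # flatten (product, period) -> variable

    # Objective: a unit ordered at s is held over periods
    # [s + lead[i], T - 1], costing hold[i] per period held.
    c = np.zeros(n * T)
    for i in range(n):
        for s in range(T):
            c[idx(i, s)] = hold[i] * max(0, T - s - lead[i])

    A_ub, b_ub = [], []
    cumdem = np.cumsum(demand, axis=1)
    # No-stockout: x0 + arrivals by end of t >= cumulative demand.
    for i in range(n):
        for t in range(T):
            row = np.zeros(n * T)
            for s in range(T):
                if s + lead[i] <= t:
                    row[idx(i, s)] = -1.0
            A_ub.append(row)
            b_ub.append(x0[i] - cumdem[i, t])
    # Shared capacity on ordered volume in each period.
    for t in range(T):
        row = np.zeros(n * T)
        for i in range(n):
            row[idx(i, t)] = vol[i]
        A_ub.append(row)
        b_ub.append(cap)

    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=(0, None), method="highs")
    assert res.success, res.message
    # Constant holding cost from initial stock minus demand drawdown.
    const = sum(hold[i] * (T * x0[i] - cumdem[i].sum()) for i in range(n))
    return res.x.reshape(n, T), const + res.fun

# Hypothetical two-product, four-period instance.
demand = np.array([[2., 2., 2., 2.],
                   [1., 3., 1., 3.]])
u, total = optimal_replenishment(x0=[5., 5.], demand=demand,
                                 lead=[0, 1], hold=[1., 1.],
                                 vol=[1., 1.], cap=6.0)
# total holding cost at the optimum is 9.0 for this instance
```

Because demand is known, the LP orders just in time subject to lead times and capacity, giving the lower bound against which the RL policies are compared.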
Notes
For products with fixed expiry dates, wastage occurs at discrete time instants t with \(a_i=0\). The quantity of wastage is known in advance and can be absorbed into \({\mathbf{u}}(t)\).
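The note above can be sketched in code. This is a hedged illustration only: the state update \(x(t+1) = x(t) + u(t) - d(t)\), the `wastage` schedule, and the clipping at zero stock are assumptions for the sketch, not the paper's exact dynamics.

```python
import numpy as np

def effective_control(u, wastage, t):
    """Fold a known, pre-scheduled wastage into the control vector.

    u       : replenishment quantities chosen at period t, shape (n,)
    wastage : dict mapping period -> known wastage per product, shape (n,)
    """
    return u - wastage.get(t, np.zeros_like(u))

def step(x, u, d, wastage, t):
    """One inventory state update with wastage absorbed into u(t)."""
    return np.maximum(x + effective_control(u, wastage, t) - d, 0.0)
```

Since expiry dates are fixed, the schedule is deterministic, so the controller sees wastage simply as a reduction of the effective replenishment in the affected periods.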
Funding
None.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Meisheri, H., Sultana, N.N., Baranwal, M. et al. Scalable multi-product inventory control with lead time constraints using reinforcement learning. Neural Comput & Applic 34, 1735–1757 (2022). https://doi.org/10.1007/s00521-021-06129-w