
Scalable multi-product inventory control with lead time constraints using reinforcement learning

  • S.I.: Adaptive and Learning Agents 2020
Published in Neural Computing and Applications

Abstract

Determining optimal inventory replenishment decisions is critical for retail businesses facing uncertain demand. The problem becomes particularly challenging when multiple products with different lead times and cross-product constraints are considered. This paper addresses these challenges in multi-product, multi-period inventory management using deep reinforcement learning (deep RL). The proposed approach improves upon existing methods for inventory control on three fronts: (1) concurrent inventory management of a large number (hundreds) of products under realistic constraints, (2) minimal retraining requirements for the RL agent under system changes, achieved through the definition of an individual product meta-model, and (3) efficient handling of multi-period constraints that stem from the different lead times of different products. We approach the inventory problem as a special class of dynamical system control and explain why the generic problem cannot be satisfactorily solved using classical optimisation techniques. Subsequently, we formulate the problem in a general framework that supports parallelised decision-making using off-the-shelf RL algorithms. We also benchmark the formulation against the theoretical optimum achieved by linear programming under the assumption that demands are deterministic and known a priori. Experiments on scales between 100 and 220 products show that the proposed RL-based approaches perform better than baseline heuristics and come quite close to the theoretical optimum. Furthermore, they are able to transfer learning without retraining to inventory control problems involving different numbers of products.
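To illustrate the kind of linear-programming benchmark the abstract refers to, the deterministic single-product case can be posed as an LP over order quantities: minimise ordering plus holding costs subject to non-negative inventory, with each order arriving one lead time after placement. The sketch below uses a small hypothetical instance (the demands, costs, horizon, and lead time are illustrative, not the paper's data) and `scipy.optimize.linprog`; the paper's actual benchmark spans many products and cross-product constraints, which this scalar version omits.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical single-product instance (not taken from the paper):
T, lead = 4, 1                        # periods and replenishment lead time
d = np.array([3.0, 5.0, 2.0, 4.0])    # demand per period, known a priori
x0, h, c = 6.0, 0.1, 1.0              # initial stock, holding cost, order cost

# Decision variables: order quantities u_0..u_{T-1}.
# End-of-period inventory: x_{t+1} = x_t + u_{t-lead} - d_t
# (an order placed at s arrives 'lead' periods later), with x_{t+1} >= 0.
cum_d = np.cumsum(d)

# Express x_{t+1} = A_x[t] @ u + b_x[t] by eliminating the state:
# only orders placed at s <= t - lead have arrived by the end of period t.
A_x = np.zeros((T, T))
for t in range(T):
    A_x[t, :max(t - lead + 1, 0)] = 1.0
b_x = x0 - cum_d

# Objective: ordering cost plus holding cost on end-of-period inventory.
obj = c * np.ones(T) + h * A_x.sum(axis=0)

# Feasibility x_{t+1} >= 0  <=>  -A_x @ u <= b_x.
res = linprog(obj, A_ub=-A_x, b_ub=b_x, bounds=[(0, None)] * T)
total_cost = obj @ res.x + h * b_x.sum()
print(res.x, total_cost)
```

For this instance the optimum orders just in time (2, 2, and 4 units in the first three periods), so the only holding cost paid is on the initial stock; this is exactly the deterministic lower bound that a stochastic policy is compared against.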


Notes

  1. For products with fixed expiry dates, wastage occurs at discrete time instants t with \(a_i=0\). The quantity of wastage is known in advance and can be absorbed into \({\mathbf{u}}(t)\).
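The note above can be made concrete with a minimal simulation sketch (all quantities hypothetical): because the wastage schedule is known in advance, it enters the per-period inventory balance as just another exogenous outflow, alongside demand, so the controller's view of the dynamics is unchanged.

```python
# Hypothetical discrete-time inventory balance. Scheduled wastage w(t)
# (e.g., stock expiring at known dates) is applied like demand, which is
# what lets it be absorbed into the exogenous term u(t) of the dynamics.
def step(x, arriving, demand, wastage=0.0):
    """One period: sales are capped by stock, then wastage and arrivals apply."""
    sales = min(x, demand)
    x_next = max(x - sales - wastage + arriving, 0.0)
    return x_next, sales

x, log = 10.0, []
orders = [0.0, 4.0, 4.0]   # quantities arriving in each period (placed earlier)
demand = [3.0, 5.0, 2.0]   # per-period demand
expiry = [0.0, 2.0, 0.0]   # known wastage at discrete instants
for a, d, w in zip(orders, demand, expiry):
    x, s = step(x, a, d, w)
    log.append((x, s))
```

Running the loop traces the stock level through the three periods; only the second period incurs wastage, and the update rule is identical in form with or without it.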


Funding

None.

Author information


Corresponding author

Correspondence to Hardik Meisheri.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Meisheri, H., Sultana, N.N., Baranwal, M. et al. Scalable multi-product inventory control with lead time constraints using reinforcement learning. Neural Comput & Applic 34, 1735–1757 (2022). https://doi.org/10.1007/s00521-021-06129-w

