
Scalable multi-product inventory control with lead time constraints using reinforcement learning

  • S.I.: Adaptive and Learning Agents 2020
Published in Neural Computing and Applications

Abstract

Determining optimal inventory replenishment decisions is critical for retail businesses facing uncertain demand. The problem becomes particularly challenging when multiple products with different lead times and cross-product constraints are considered. This paper addresses these challenges in multi-product, multi-period inventory management using deep reinforcement learning (deep RL). The proposed approach improves upon existing methods for inventory control on three fronts: (1) concurrent inventory management of a large number (hundreds) of products under realistic constraints, (2) minimal retraining requirements for the RL agent under system changes, achieved through the definition of an individual product meta-model, and (3) efficient handling of multi-period constraints that stem from the different lead times of different products. We approach the inventory problem as a special class of dynamical system control and explain why the generic problem cannot be satisfactorily solved using classical optimisation techniques. Subsequently, we formulate the problem in a general framework that supports parallelised decision-making using off-the-shelf RL algorithms. We also benchmark the formulation against the theoretical optimum achieved by linear programming under the assumption that demands are deterministic and known a priori. Experiments on scales between 100 and 220 products show that the proposed RL-based approaches perform better than baseline heuristics and come quite close to the theoretical optimum. Furthermore, they are able to transfer learning without retraining to inventory control problems involving different numbers of products.
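To illustrate the kind of linear-programming benchmark the abstract refers to, the deterministic single-product case can be posed as an LP over order quantities: minimise ordering plus holding costs subject to non-negative inventory, with each order arriving one lead time after placement. The sketch below uses a small hypothetical instance (the demands, costs, horizon, and lead time are illustrative, not the paper's data) and `scipy.optimize.linprog`; the paper's actual benchmark spans many products and cross-product constraints, which this scalar version omits.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical single-product instance (not taken from the paper):
T, lead = 4, 1                        # periods and replenishment lead time
d = np.array([3.0, 5.0, 2.0, 4.0])    # demand per period, known a priori
x0, h, c = 6.0, 0.1, 1.0              # initial stock, holding cost, order cost

# Decision variables: order quantities u_0..u_{T-1}.
# End-of-period inventory: x_{t+1} = x_t + u_{t-lead} - d_t
# (an order placed at s arrives 'lead' periods later), with x_{t+1} >= 0.
cum_d = np.cumsum(d)

# Express x_{t+1} = A_x[t] @ u + b_x[t] by eliminating the state:
# only orders placed at s <= t - lead have arrived by the end of period t.
A_x = np.zeros((T, T))
for t in range(T):
    A_x[t, :max(t - lead + 1, 0)] = 1.0
b_x = x0 - cum_d

# Objective: ordering cost plus holding cost on end-of-period inventory.
obj = c * np.ones(T) + h * A_x.sum(axis=0)

# Feasibility x_{t+1} >= 0  <=>  -A_x @ u <= b_x.
res = linprog(obj, A_ub=-A_x, b_ub=b_x, bounds=[(0, None)] * T)
total_cost = obj @ res.x + h * b_x.sum()
print(res.x, total_cost)
```

For this instance the optimum orders just in time (2, 2, and 4 units in the first three periods), so the only holding cost paid is on the initial stock; this is exactly the deterministic lower bound that a stochastic policy is compared against.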


Notes

  1. For products with fixed expiry dates, wastage occurs at discrete time instants t with \(a_i=0\). The quantity of wastage is known in advance and can be absorbed into \({\mathbf{u}}(t)\).
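The note above can be made concrete with a minimal simulation sketch (all quantities hypothetical): because the wastage schedule is known in advance, it enters the per-period inventory balance as just another exogenous outflow, alongside demand, so the controller's view of the dynamics is unchanged.

```python
# Hypothetical discrete-time inventory balance. Scheduled wastage w(t)
# (e.g., stock expiring at known dates) is applied like demand, which is
# what lets it be absorbed into the exogenous term u(t) of the dynamics.
def step(x, arriving, demand, wastage=0.0):
    """One period: sales are capped by stock, then wastage and arrivals apply."""
    sales = min(x, demand)
    x_next = max(x - sales - wastage + arriving, 0.0)
    return x_next, sales

x, log = 10.0, []
orders = [0.0, 4.0, 4.0]   # quantities arriving in each period (placed earlier)
demand = [3.0, 5.0, 2.0]   # per-period demand
expiry = [0.0, 2.0, 0.0]   # known wastage at discrete instants
for a, d, w in zip(orders, demand, expiry):
    x, s = step(x, a, d, w)
    log.append((x, s))
```

Running the loop traces the stock level through the three periods; only the second period incurs wastage, and the update rule is identical in form with or without it.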


Funding

None.

Author information


Corresponding author

Correspondence to Hardik Meisheri.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Meisheri, H., Sultana, N.N., Baranwal, M. et al. Scalable multi-product inventory control with lead time constraints using reinforcement learning. Neural Comput & Applic 34, 1735–1757 (2022). https://doi.org/10.1007/s00521-021-06129-w

