Asynchronous action-reward learning for nonstationary serial supply chain inventory control

Applied Intelligence

An Erratum to this article was published on 29 July 2007

Abstract

Action-reward learning is a reinforcement learning method. In this machine learning approach, an agent interacts with a non-deterministic control domain. The agent selects actions at decision epochs, and the control domain returns rewards that are used to update the performance measures of those actions. The agent's objective is to select the best future actions based on the updated performance measures. In this paper, we develop an asynchronous action-reward learning model that updates the performance measures of actions faster than conventional action-reward learning. This learning model is well suited to nonstationary control domains, where the rewards for actions vary over time. Based on asynchronous action-reward learning, two situation-reactive inventory control models (a centralized model and a decentralized model) are proposed for a two-stage serial supply chain with nonstationary customer demand. A simulation-based experiment was performed to evaluate the performance of the two proposed models.
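The basic action-reward loop described in the abstract can be illustrated with a minimal Python sketch. This is not the asynchronous model developed in the paper, whose details are not given here; it is a conventional learner under stated assumptions. The class name `ActionRewardLearner`, the epsilon-greedy selection rule, the constant step size, the candidate order-up-to levels, the cost parameters, and the demand shift are all hypothetical choices made for illustration.

```python
import random


class ActionRewardLearner:
    """Conventional action-reward learner (illustrative, not the paper's model).

    Keeps one performance measure per action and updates only the
    measure of the action that was actually taken.
    """

    def __init__(self, actions, step_size=0.2, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.step_size = step_size  # constant step size tracks drifting rewards
        self.epsilon = epsilon      # assumed exploration probability
        self.rng = random.Random(seed)
        self.value = {a: 0.0 for a in self.actions}

    def select_action(self):
        # Explore with probability epsilon, otherwise pick the best-rated action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.value[a])

    def update(self, action, reward):
        # Move the performance measure a fraction of the way toward the reward.
        self.value[action] += self.step_size * (reward - self.value[action])


def simulate(periods=400, seed=0):
    """Single-stage toy inventory problem with a mid-run demand shift."""
    rng = random.Random(seed)
    # Actions are hypothetical order-up-to levels.
    learner = ActionRewardLearner(actions=[10, 20, 30], seed=seed)
    holding_cost, shortage_cost = 1.0, 5.0  # assumed unit costs
    for t in range(periods):
        # Nonstationary demand: the mean jumps halfway through the run.
        mean_demand = 10 if t < periods // 2 else 25
        demand = max(0, round(rng.gauss(mean_demand, 2)))
        level = learner.select_action()
        cost = (holding_cost * max(level - demand, 0)
                + shortage_cost * max(demand - level, 0))
        learner.update(level, -cost)  # reward is the negative period cost
    return learner


if __name__ == "__main__":
    final = simulate()
    print({level: round(v, 1) for level, v in final.value.items()})
```

Because only the selected action's measure is updated in each period, the measures of the other actions can go stale after the demand shift. This is the kind of lag in conventional action-reward learning that a faster-updating scheme would aim to reduce.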

Author information

Corresponding author

Correspondence to Chang Ouk Kim.

Additional information

Chang Ouk Kim received his Ph.D. in industrial engineering from Purdue University in 1996 and his B.S. and M.S. degrees from Korea University, Republic of Korea, in 1988 and 1990, respectively. From 1998 to 2001, he was an assistant professor in the Department of Industrial Systems Engineering at Myongji University, Republic of Korea. In 2002, he joined the Department of Information and Industrial Engineering at Yonsei University, Republic of Korea, where he is now an associate professor. He has published more than 30 articles in international journals. He is currently working on applications of artificial intelligence and adaptive control theory in supply chain management, RFID-based logistics information system design, and advanced process control in semiconductor manufacturing.

Ick-Hyun Kwon is a postdoctoral researcher in the Department of Civil and Environmental Engineering at the University of Illinois at Urbana-Champaign. Prior to this position, Dr. Kwon was a research assistant professor in the Research Institute for Information and Communication Technology at Korea University, Seoul, Republic of Korea. He received his B.S., M.S., and Ph.D. degrees in Industrial Engineering from Korea University in 1998, 2000, and 2006, respectively. His current research interests are supply chain management, inventory control, and production planning and scheduling.

Jun-Geol Baek is an assistant professor in the Department of Business Administration at Kwangwoon University, Seoul, Korea. He received his B.S., M.S., and Ph.D. degrees in Industrial Engineering from Korea University, Seoul, Korea, in 1993, 1995, and 2001, respectively. From March 2002 to February 2007, he was an assistant professor in the Department of Industrial Systems Engineering at Induk Institute of Technology, Seoul, Korea. His research interests include machine learning, data mining, intelligent machine diagnosis, and ubiquitous logistics information systems.

An erratum to this article can be found at http://dx.doi.org/10.1007/s10489-007-0087-6

About this article

Cite this article

Kim, C.O., Kwon, IH. & Baek, JG. Asynchronous action-reward learning for nonstationary serial supply chain inventory control. Appl Intell 28, 1–16 (2008). https://doi.org/10.1007/s10489-007-0038-2

  • DOI: https://doi.org/10.1007/s10489-007-0038-2
