Distributed Reinforcement Learning Control for Batch Sequencing and Sizing in Just-In-Time Manufacturing Systems

Abstract

This paper presents an approach to the multi-objective scheduling problem in dynamically changing shop-floor environments that is suitable for Just-In-Time (JIT) production. The proposed distributed learning and control (DLC) approach integrates part-driven distributed arrival-time control (DATC) with machine-driven, distributed reinforcement-learning-based control. With DATC, part controllers adjust their associated parts' arrival times to minimize due-date deviation. Within the resulting restricted pattern of arrivals, machine controllers concurrently search for optimal dispatching policies. The machine control problem is modeled as a semi-Markov decision process (SMDP) and solved using Q-learning. The DLC algorithms are evaluated by simulation for two types of manufacturing systems: family scheduling and dynamic batch sizing. Results show that the DLC algorithms achieve significant performance improvements over conventional dispatching rules in complex real-time shop-floor control problems for JIT production.
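The abstract describes two interacting control loops: part controllers that iteratively shift arrival times toward zero due-date deviation, and machine controllers that learn dispatching policies via Q-learning on an SMDP. The sketch below is a minimal illustration of both updates, assuming a standard DATC integral-style adjustment (arrival times nudged in proportion to due-date deviation) and a discounted-SMDP form of the Q-learning update in which the discount compounds over the sojourn time between decision epochs. All names, gains, and rates here are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only; function names, the gain, and the learning
# rates are assumptions, not taken from the paper.

GAMMA = 0.95  # per-time-unit discount rate for the SMDP (assumed)
ALPHA = 0.10  # Q-learning step size (assumed)
GAIN = 0.5    # integral gain for the DATC part controllers (assumed)

def datc_step(arrivals, due_dates, completions, gain=GAIN):
    """One DATC iteration: each part controller shifts its part's arrival
    time in proportion to the deviation between the part's due date and
    the completion time observed in the latest simulation run."""
    return [a + gain * (d - c)
            for a, d, c in zip(arrivals, due_dates, completions)]

def smdp_q_update(q, state, action, reward, sojourn, next_state, actions):
    """One Q-learning update for a machine controller, with the discount
    compounded over the sojourn time spent between decision epochs
    (a common discounted-SMDP formulation)."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + ALPHA * (reward + (GAMMA ** sojourn) * best_next - old)
```

Under these assumptions, iterating datc_step until the arrival times settle restricts the arrival pattern, and each machine controller then learns, within that pattern, which queued job (or part family) to dispatch next.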

Cite this article

Hong, J., Prabhu, V.V. Distributed Reinforcement Learning Control for Batch Sequencing and Sizing in Just-In-Time Manufacturing Systems. Applied Intelligence 20, 71–87 (2004). https://doi.org/10.1023/B:APIN.0000011143.95085.74
