Abstract
This paper presents an approach suited to Just-In-Time (JIT) production for multi-objective scheduling problems in dynamically changing shop-floor environments. The proposed distributed learning and control (DLC) approach integrates part-driven distributed arrival time control (DATC) with machine-driven control based on distributed reinforcement learning. With DATC, part controllers adjust their associated parts' arrival times to minimize due-date deviation. Within the resulting restricted pattern of arrivals, machine controllers concurrently search for optimal dispatching policies. The machine control problem is modeled as a semi-Markov decision process (SMDP) and solved using Q-learning. The DLC algorithms are evaluated by simulation for two types of manufacturing systems: family scheduling and dynamic batch sizing. Results show that the DLC algorithms achieve significant performance improvements over common dispatching rules in complex real-time shop-floor control problems for JIT production.
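The machine-side control described above can be illustrated with a minimal tabular Q-learning sketch. The state and action encoding, setup penalty, and reward shaping below are hypothetical simplifications for illustration, not the authors' implementation: a machine repeatedly chooses which of two part families to dispatch next, and switching families incurs a setup cost.

```python
import random

# Illustrative toy problem (assumed, not from the paper): the state is the
# family the machine is currently set up for; the action is which family
# to dispatch next; switching families incurs a setup penalty.
random.seed(0)
ACTIONS = [0, 1]                 # dispatch family 0 or family 1
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1

# Tabular Q-values over (state, action) pairs.
Q = {(s, a): 0.0 for s in ACTIONS for a in ACTIONS}

def reward(state, action):
    # -2 setup penalty when switching families, plus small noise standing
    # in for stochastic processing and due-date-deviation effects.
    return (-2.0 if action != state else 0.0) + random.uniform(-0.1, 0.1)

def step(state):
    # Epsilon-greedy action selection over the Q-table.
    if random.random() < EPS:
        a = random.choice(ACTIONS)
    else:
        a = max(ACTIONS, key=lambda x: Q[(state, x)])
    r = reward(state, a)
    next_state = a               # machine is now set up for that family
    best_next = max(Q[(next_state, x)] for x in ACTIONS)
    # Standard one-step Q-learning update.
    Q[(state, a)] += ALPHA * (r + GAMMA * best_next - Q[(state, a)])
    return next_state

s = 0
for _ in range(5000):
    s = step(s)

# The learned greedy policy should prefer staying with the current family
# (i.e., batching jobs of the same family) to avoid setup penalties.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in ACTIONS}
print(policy)
```

In the paper the decision epochs are irregular (an SMDP rather than an MDP), so the discounting would depend on the sojourn time between decisions; the fixed-discount update above is only the simplest special case.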
Hong, J., Prabhu, V.V. Distributed Reinforcement Learning Control for Batch Sequencing and Sizing in Just-In-Time Manufacturing Systems. Applied Intelligence 20, 71–87 (2004). https://doi.org/10.1023/B:APIN.0000011143.95085.74