Application of Multi-agent Reinforcement Learning to the Dynamic Scheduling Problem in Manufacturing Systems

Heik, David; Bahrpeyma, Fouad; Reichelt, Dirk

doi:10.1007/978-3-031-53966-4_18

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14506)

Included in the following conference series:

International Conference on Machine Learning, Optimization, and Data Science

Abstract

Recent research in reinforcement learning (RL) has demonstrated remarkable results on complex strategic planning problems. Approaches that incorporate multiple agents to complete complex tasks cooperatively have become especially popular. However, the application of multi-agent reinforcement learning (MARL) to manufacturing problems, such as the production scheduling problem, has been addressed less frequently and remains a challenge for current research. A major reason is that applications in the manufacturing domain are typically characterized by specific requirements and pose major implementation difficulties for the research community. MARL has the capability to solve complex problems with enhanced performance in comparison with traditional methods. The main objective of this paper is to implement feasible MARL algorithms to solve the problem of dynamic scheduling in manufacturing systems, using a model factory as an example. We focus on optimizing the performance of the scheduling task, which is mainly reflected in the makespan. In our experiments, algorithms based on on-policy policy gradient methods delivered more stable and enhanced performance. Therefore, this study also investigates promising state-of-the-art single-agent reinforcement learning algorithms based on the on-policy method, including Asynchronous Advantage Actor-Critic, Proximal Policy Optimization, and Recurrent Proximal Policy Optimization, and compares their results with those of MARL. The findings illustrate that RL was indeed successful in converging to optimal solutions that outperform traditional heuristic methods for the complex problem of scheduling under uncertain conditions.
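To make the optimization target concrete, the following is a minimal sketch of how a makespan can be computed and how it depends on the order in which jobs are dispatched. It is not the model-factory environment used in the paper: it assumes identical parallel machines, known job durations, and a greedy list-scheduling rule, and the function name `list_schedule_makespan` and the job data are hypothetical, chosen only for illustration.

```python
import heapq


def list_schedule_makespan(durations, n_machines):
    """Greedy list scheduling on identical machines (illustrative assumption).

    Each job, taken in the given order, is assigned to the machine that
    becomes free first. Returns the makespan, i.e. the completion time of
    the last job to finish.
    """
    if n_machines < 1:
        raise ValueError("n_machines must be at least 1")
    # Min-heap holding the time at which each machine becomes free.
    free_at = [0.0] * n_machines
    heapq.heapify(free_at)
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest available machine
        heapq.heappush(free_at, start + duration)
    return max(free_at)


# Hypothetical job durations, not taken from the paper.
jobs = [3, 5, 2, 7, 4]

# Dispatch in arrival order (first in, first out).
fifo_makespan = list_schedule_makespan(jobs, n_machines=2)

# Dispatch longest jobs first (the classic LPT heuristic).
lpt_makespan = list_schedule_makespan(sorted(jobs, reverse=True), n_machines=2)

print(fifo_makespan, lpt_makespan)
```

With these durations, arrival-order dispatching finishes at time 12, while longest-first dispatching finishes at time 11. A learned scheduling policy plays the role of the dispatching rule here: it decides which job goes where, and the resulting makespan is the quantity to minimize.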

Acknowledgements

This research was funded as part of the project “Produktionssysteme mit Menschen und Technik als Team” (ProMenTaT, application number: 100649455) with funds from the European Social Fund Plus (ESF Plus) and from tax revenues based on the budget passed by the Saxon State Parliament.

Author information

Authors and Affiliations

Faculty of Informatics/Mathematics, University of Applied Sciences Dresden, 01069, Dresden, Germany
David Heik, Fouad Bahrpeyma & Dirk Reichelt

Corresponding author

Correspondence to David Heik.

Editor information

Editors and Affiliations

University of Catania, Catania, Italy
Giuseppe Nicosia
Newcastle University, Newcastle upon Tyne, UK
Varun Ojha
University of Oxford, Oxford, UK
Emanuele La Malfa
University of Cambridge, Cambridge, UK
Gabriele La Malfa
University of Florida, Gainesville, FL, USA
Panos M. Pardalos
Dana-Farber Cancer Institute, Boston, MA, USA
Renato Umeton

Cite this paper

Heik, D., Bahrpeyma, F., Reichelt, D. (2024). Application of Multi-agent Reinforcement Learning to the Dynamic Scheduling Problem in Manufacturing Systems. In: Nicosia, G., Ojha, V., La Malfa, E., La Malfa, G., Pardalos, P.M., Umeton, R. (eds) Machine Learning, Optimization, and Data Science. LOD 2023. Lecture Notes in Computer Science, vol 14506. Springer, Cham. https://doi.org/10.1007/978-3-031-53966-4_18

DOI: https://doi.org/10.1007/978-3-031-53966-4_18
Published: 15 February 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53965-7
Online ISBN: 978-3-031-53966-4
eBook Packages: Computer Science, Computer Science (R0)
