Abstract
Training Deep Reinforcement Learning (DRL) models for Automated Guided Vehicle (AGV) systems is challenging because the tasks and environments are complex and the state and action spaces are large. This paper introduces Gradual Task Complexity Scaling (GTCS), a novel learning procedure for DRL that addresses these challenges. Unlike existing methods, which train directly toward the final objective, GTCS incrementally upgrades the agent’s objectives while keeping the same environmental context, which enables a more efficient balance between exploration and exploitation during training. The GTCS procedure has four key components: gradually expanding the initially reduced effective space within the warehouse, increasing the number of products the agent must deliver, enhancing the capabilities of the AGVs (represented as DRL agents), and reducing the maximum number of steps allowed for task completion. GTCS outperforms previous approaches by improving the stability of the learning process, optimizing the delivery workflow, and achieving more efficient learning outcomes.
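The four components described in the abstract can be pictured as four knobs that move from an easy setting to the final setting over successive training stages. The abstract does not specify the schedule, so the following is only a minimal sketch under stated assumptions: the `Stage` fields, the `gtcs_schedule` helper, the linear interpolation, and every numeric value are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One curriculum stage; all field names and values are hypothetical."""
    effective_size: int    # side length of the warehouse sub-area in use
    num_products: int      # products the agent must deliver per episode
    capability_level: int  # index into an assumed list of AGV capability sets
    max_steps: int         # step budget before the episode is cut off


def gtcs_schedule(num_stages, size_range, products_range,
                  capability_range, steps_range):
    """Linearly interpolate each knob from its easy value to its final value.

    Linear interpolation is an assumption made for illustration only.
    """
    def lerp(bounds, t):
        start, end = bounds
        return round(start + (end - start) * t)

    stages = []
    for i in range(num_stages):
        # t runs from 0.0 (easiest stage) to 1.0 (final task).
        t = i / (num_stages - 1) if num_stages > 1 else 1.0
        stages.append(Stage(
            effective_size=lerp(size_range, t),
            num_products=lerp(products_range, t),
            capability_level=lerp(capability_range, t),
            max_steps=lerp(steps_range, t),
        ))
    return stages


schedule = gtcs_schedule(
    num_stages=4,
    size_range=(4, 10),       # expand the effective space
    products_range=(1, 4),    # deliver more products
    capability_range=(0, 3),  # unlock more AGV capabilities
    steps_range=(200, 80),    # tighten the step budget
)
for stage in schedule:
    print(stage)
```

In a training loop, the agent would presumably keep its learned weights from one stage to the next, so that each harder stage starts from the policy learned on the easier one. When to advance to the next stage (for example, after a fixed number of episodes or once a success rate is reached) is a further design choice that this sketch leaves open.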
Data Availability
Not provided
Acknowledgements
The authors have no individuals or organizations to acknowledge for contributions to this paper.
Funding
This research study was conducted without any external funding or financial support.
Author information
Authors and Affiliations
Contributions
The authors contributed equally to this work.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Ethics Approval
Not Applicable
Consent to Participate
Not Applicable
Consent for Publication
Not Applicable
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Rhazzaf, M., Masrour, T. Gradual task complexity scaling (GTCS-DRL): a deep reinforcement learning approach for training automated guided vehicle system. Evolving Systems 16, 38 (2025). https://doi.org/10.1007/s12530-025-09660-6
DOI: https://doi.org/10.1007/s12530-025-09660-6