Abstract
We present a proximal policy optimization (PPO) agent trained with curriculum learning (CL) and careful reward engineering to optimize a real-world high-throughput waste sorting facility. Our work addresses the challenge of balancing the competing objectives of operational safety, volume optimization, and minimal resource usage. A vanilla agent trained from scratch on these multiple criteria fails to solve the problem: the environment features extremely delayed rewards over long time horizons and a pronounced class (or action) imbalance, with the most important actions occurring only rarely in the optimal policy. The agent must therefore anticipate the long-term consequences of its actions and prioritize rare but rewarding behaviors, which makes this a non-trivial reinforcement learning task. Our five-stage CL approach tackles these challenges by gradually increasing the complexity of the environment dynamics during policy transfer while simultaneously refining the reward mechanism. This iterative and adaptable process enables the agent to learn the desired optimal policy. Results demonstrate that our approach significantly improves inference-time safety, achieving near-zero safety violations, in addition to enhancing the sorting plant's efficiency.
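To make the staged policy transfer concrete, the following is a minimal sketch in the spirit of the five-stage curriculum, using stable-baselines3's PPO. ToySortingEnv, its fill_rate and safety_penalty parameters, and the stage schedule are hypothetical stand-ins for the authors' plant simulator and engineered reward; only the overall pattern (train, transfer the policy to harder dynamics, tighten the reward) follows the abstract.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO


class ToySortingEnv(gym.Env):
    """Hypothetical stand-in for the waste-sorting plant simulator.

    fill_rate scales the dynamics complexity per curriculum stage;
    safety_penalty is the stage-specific engineered reward term.
    """

    def __init__(self, fill_rate: float, safety_penalty: float, n_containers: int = 11):
        super().__init__()
        self.n = n_containers
        self.fill_rate = fill_rate
        self.safety_penalty = safety_penalty
        self.observation_space = spaces.Box(0.0, 1.0, (self.n,), np.float32)
        self.action_space = spaces.Discrete(self.n + 1)  # empty container i, or do nothing

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.fill = self.np_random.uniform(0.0, 0.3, self.n).astype(np.float32)
        self.t = 0
        return self.fill.copy(), {}

    def step(self, action):
        self.t += 1
        self.fill += self.np_random.uniform(0.0, self.fill_rate, self.n).astype(np.float32)
        reward = 0.0
        if action < self.n:                   # emptying rewards the collected volume
            reward += float(self.fill[action])
            self.fill[action] = 0.0
        overflow = self.fill > 1.0            # container overflow = safety violation
        reward -= self.safety_penalty * float(overflow.sum())
        self.fill = np.clip(self.fill, 0.0, 1.0)
        return self.fill.copy(), reward, False, self.t >= 200, {}


# Hypothetical five-stage schedule: dynamics grow harder (faster, noisier
# filling) while the engineered safety penalty is progressively tightened.
stages = [(0.01, 1.0), (0.02, 1.0), (0.03, 2.0), (0.04, 4.0), (0.05, 8.0)]

model = None
for fill_rate, safety_penalty in stages:
    env = ToySortingEnv(fill_rate, safety_penalty)
    if model is None:
        model = PPO("MlpPolicy", env, verbose=0)
    else:
        model.set_env(env)  # transfer the current policy to the next stage
    model.learn(total_timesteps=20_000, reset_num_timesteps=False)
```

Passing reset_num_timesteps=False continues the same learning run across stages, so the policy learned under easier dynamics initializes training on the next, harder stage rather than starting from scratch.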