Abstract
We present a proximal policy optimization (PPO) agent trained with curriculum learning (CL) and careful reward engineering to optimize a real-world high-throughput waste sorting facility. Our work addresses the challenge of balancing the competing objectives of operational safety, volume optimization, and minimal resource usage. A vanilla agent trained from scratch on these multiple criteria fails to solve the problem: the environment features extremely delayed rewards over long time horizons and a pronounced class (or action) imbalance, with the most important actions occurring only rarely in the optimal policy. The agent must therefore anticipate the long-term consequences of its actions and prioritize rare but rewarding behaviors, which makes this a non-trivial reinforcement learning task. Our five-stage CL approach tackles these challenges by gradually increasing the complexity of the environment dynamics during policy transfer while simultaneously refining the reward mechanism. This iterative and adaptable process enables the agent to learn the desired optimal policy. Results demonstrate that our approach significantly improves inference-time safety, achieving near-zero safety violations, in addition to enhancing the sorting plant's efficiency.
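To make the staged policy transfer concrete, the following is a minimal sketch in the spirit of the five-stage curriculum, using stable-baselines3's PPO. ToySortingEnv, its fill_rate and safety_penalty parameters, and the stage schedule are hypothetical stand-ins for the authors' plant simulator and engineered reward; only the overall pattern (train, transfer the policy to harder dynamics, tighten the reward) follows the abstract.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO


class ToySortingEnv(gym.Env):
    """Hypothetical stand-in for the waste-sorting plant simulator.

    fill_rate scales the dynamics complexity per curriculum stage;
    safety_penalty is the stage-specific engineered reward term.
    """

    def __init__(self, fill_rate: float, safety_penalty: float, n_containers: int = 11):
        super().__init__()
        self.n = n_containers
        self.fill_rate = fill_rate
        self.safety_penalty = safety_penalty
        self.observation_space = spaces.Box(0.0, 1.0, (self.n,), np.float32)
        self.action_space = spaces.Discrete(self.n + 1)  # empty container i, or do nothing

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.fill = self.np_random.uniform(0.0, 0.3, self.n).astype(np.float32)
        self.t = 0
        return self.fill.copy(), {}

    def step(self, action):
        self.t += 1
        self.fill += self.np_random.uniform(0.0, self.fill_rate, self.n).astype(np.float32)
        reward = 0.0
        if action < self.n:                   # emptying rewards the collected volume
            reward += float(self.fill[action])
            self.fill[action] = 0.0
        overflow = self.fill > 1.0            # container overflow = safety violation
        reward -= self.safety_penalty * float(overflow.sum())
        self.fill = np.clip(self.fill, 0.0, 1.0)
        return self.fill.copy(), reward, False, self.t >= 200, {}


# Hypothetical five-stage schedule: dynamics grow harder (faster, noisier
# filling) while the engineered safety penalty is progressively tightened.
stages = [(0.01, 1.0), (0.02, 1.0), (0.03, 2.0), (0.04, 4.0), (0.05, 8.0)]

model = None
for fill_rate, safety_penalty in stages:
    env = ToySortingEnv(fill_rate, safety_penalty)
    if model is None:
        model = PPO("MlpPolicy", env, verbose=0)
    else:
        model.set_env(env)  # transfer the current policy to the next stage
    model.learn(total_timesteps=20_000, reset_num_timesteps=False)
```

Passing reset_num_timesteps=False continues the same learning run across stages, so the policy learned under easier dynamics initializes training on the next, harder stage rather than starting from scratch.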