
Solving a Real-World Optimization Problem Using Proximal Policy Optimization with Curriculum Learning and Reward Engineering

  • Conference paper
  • First Online:
Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track (ECML PKDD 2024)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14950)


Abstract

We present a proximal policy optimization agent trained through curriculum learning (CL) principles and meticulous reward engineering to optimize a real-world high-throughput waste sorting facility. Our work addresses the challenge of effectively balancing the competing objectives of operational safety, volume optimization, and minimizing resource usage. A vanilla agent trained from scratch on these multiple criteria fails to solve the problem due to its inherent complexities. This problem is particularly difficult due to the environment’s extremely delayed rewards with long time horizons and class (or action) imbalance, with important actions being infrequent in the optimal policy. This forces the agent to anticipate long-term action consequences and prioritize rare but rewarding behaviours, creating a non-trivial reinforcement learning task. Our five-stage CL approach tackles these challenges by gradually increasing the complexity of the environmental dynamics during policy transfer while simultaneously refining the reward mechanism. This iterative and adaptable process enables the agent to learn a desired optimal policy. Results demonstrate that our approach significantly improves inference-time safety, achieving near-zero safety violations in addition to enhancing waste sorting plant efficiency.
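The staged approach described in the abstract can be sketched as a training loop in which the policy is warm-started from the previous stage while the environment's complexity and the reward weights are gradually increased. The following is a minimal illustrative sketch only: the stage names, weight values, and the `train_stage` stub are assumptions for exposition, not the authors' actual configuration or code.

```python
# Illustrative sketch of a five-stage curriculum with policy transfer.
# Stage names, weights, and the train_stage stub are hypothetical.
from dataclasses import dataclass, field


@dataclass
class StageConfig:
    name: str
    env_complexity: float  # fraction of full environment dynamics enabled
    safety_weight: float   # reward-engineering: penalty weight on safety violations
    volume_weight: float   # reward weight on sorted waste volume


@dataclass
class Policy:
    # Stand-in for PPO network parameters; carried across stages (policy transfer).
    stages_trained: list = field(default_factory=list)


def train_stage(policy: Policy, cfg: StageConfig) -> Policy:
    # Placeholder for a full PPO training run on this stage's environment.
    policy.stages_trained.append(cfg.name)
    return policy


# Hypothetical curriculum: dynamics complexity and safety emphasis increase per stage.
CURRICULUM = [
    StageConfig("basic-dynamics",    0.2, 1.0, 0.5),
    StageConfig("delayed-rewards",   0.4, 1.0, 0.7),
    StageConfig("rare-actions",      0.6, 2.0, 0.7),
    StageConfig("full-dynamics",     0.8, 4.0, 1.0),
    StageConfig("safety-refinement", 1.0, 8.0, 1.0),
]


def run_curriculum(stages):
    policy = Policy()
    for cfg in stages:
        # Warm-start: the policy from the previous stage is refined, not reset.
        policy = train_stage(policy, cfg)
    return policy


policy = run_curriculum(CURRICULUM)
print(len(policy.stages_trained))  # 5
```

The key design choice mirrored here is that the policy object persists across stages while only the environment and reward configuration change, so behaviour learned under simpler dynamics is not discarded when harder dynamics are introduced.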


Notes

  1. https://gitlab.com/anonymousppocl1/ppo_paper.git


Author information

Corresponding author

Correspondence to Abhijeet Pendyala.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Pendyala, A., Atamna, A., Glasmachers, T. (2024). Solving a Real-World Optimization Problem Using Proximal Policy Optimization with Curriculum Learning and Reward Engineering. In: Bifet, A., Krilavičius, T., Miliou, I., Nowaczyk, S. (eds) Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track. ECML PKDD 2024. Lecture Notes in Computer Science, vol. 14950. Springer, Cham. https://doi.org/10.1007/978-3-031-70381-2_10


  • DOI: https://doi.org/10.1007/978-3-031-70381-2_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-70380-5

  • Online ISBN: 978-3-031-70381-2

  • eBook Packages: Computer Science (R0)
