Launchpad: Learning to Schedule Using Offline and Online RL Methods

  • Conference paper
  • In: Job Scheduling Strategies for Parallel Processing (JSSPP 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14591)


Abstract

Reinforcement Learning (RL) algorithms have succeeded in several challenging domains. Classic online RL job schedulers can learn efficient scheduling strategies, but they often take tens of thousands of timesteps to explore the environment and adapt a randomly initialized policy. Current RL schedulers overlook the importance of learning from pre-recorded datasets and of improving upon existing customized heuristic policies. Data-driven RL (a.k.a. batch RL) offers the prospect of policy optimization from pre-recorded datasets without online environment interaction. We explore two data-driven RL methods, Behaviour Cloning and Offline RL, which aim to learn policies from pre-recorded data without interacting with the environment. These methods address the cost of data collection and the safety concerns that are particularly pertinent to real-world applications of RL. Although the data-driven RL methods produce good results, we show that their performance is highly dependent on the quality of the pre-recorded datasets used during training. We demonstrate that by effectively incorporating prior expert demonstrations to pre-train the RL scheduling agent, we can short-circuit the random exploration phase and learn a reasonable policy with minimal online training. We use batch RL as a launchpad to learn effective scheduling policies from datasets collected using an oracle or custom heuristic policies. We demonstrate that by combining offline pre-training with minimal online RL training, we can achieve comparable performance while reducing training time by \(\sim 3\times\) compared to the state-of-the-art online RL method. This framework is highly effective for pre-training on prior datasets with batch RL methods and is well suited to continuous improvement using online learning.
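The two-phase recipe the abstract describes, pre-training on logged (state, expert action) pairs via behaviour cloning and then refining with a short online RL phase, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the `PolicyNet` architecture, the gymnasium-style `env` interface, and all hyperparameters are assumptions introduced here for illustration.

```python
# Minimal sketch of the offline-pretrain / online-finetune recipe from the
# abstract. NOT the paper's implementation: the network, the gymnasium-style
# environment interface, and all hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PolicyNet(nn.Module):
    """Small MLP mapping a scheduler state vector to job-selection logits."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


def pretrain_behaviour_cloning(policy, dataset, epochs=10, lr=1e-3):
    """Offline phase: supervised learning on (state, expert_action) batches
    logged from an oracle or heuristic scheduler -- no environment access."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        for states, expert_actions in dataset:
            loss = F.cross_entropy(policy(states), expert_actions)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return policy


def finetune_online(policy, env, steps=5_000, lr=1e-4, gamma=0.99):
    """Online phase: a short REINFORCE refinement. Starting from the cloned
    policy skips the long random-exploration phase of training from scratch."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    state, _ = env.reset()
    log_probs, rewards = [], []
    for _ in range(steps):
        logits = policy(torch.as_tensor(state, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        state, reward, terminated, truncated, _ = env.step(action.item())
        log_probs.append(dist.log_prob(action))
        rewards.append(float(reward))
        if terminated or truncated:
            # Discounted return for each step of the finished episode.
            returns, g = [], 0.0
            for r in reversed(rewards):
                g = r + gamma * g
                returns.append(g)
            returns.reverse()
            loss = -(torch.stack(log_probs) * torch.tensor(returns)).sum()
            opt.zero_grad()
            loss.backward()
            opt.step()
            log_probs, rewards = [], []
            state, _ = env.reset()
    return policy
```

Because the cloned policy already imitates the expert, the online phase only has to correct residual errors rather than explore from scratch, which is the intuition behind the abstract's reported \(\sim 3\times\) reduction in training time.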

Author information

Correspondence to Vanamala Venkataswamy.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Venkataswamy, V., Grigsby, J., Grimshaw, A., Qi, Y. (2025). Launchpad: Learning to Schedule Using Offline and Online RL Methods. In: Klusáček, D., Corbalán, J., Rodrigo, G.P. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2024. Lecture Notes in Computer Science, vol 14591. Springer, Cham. https://doi.org/10.1007/978-3-031-74430-3_4

  • DOI: https://doi.org/10.1007/978-3-031-74430-3_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-74429-7

  • Online ISBN: 978-3-031-74430-3

  • eBook Packages: Computer Science; Computer Science (R0)
