Abstract
Reinforcement Learning (RL) algorithms have succeeded in several challenging domains. Classic online RL job schedulers can learn efficient scheduling strategies, but they often take tens of thousands of timesteps to explore the environment and adapt from a randomly initialized policy. Current RL schedulers also overlook the importance of learning from pre-recorded datasets and of improving upon existing customized heuristic policies. Data-driven RL (a.k.a. batch RL) offers the prospect of policy optimization from pre-recorded datasets without online environment interaction. We explore two data-driven RL methods, behaviour cloning and offline RL, which learn policies from pre-recorded data without interacting with the environment. These methods address the cost and safety challenges of data collection that are particularly pertinent to real-world applications of RL. Although data-driven RL methods produce good results, we show that their performance depends heavily on the quality of the pre-recorded datasets used during training. We demonstrate that by effectively incorporating prior expert demonstrations to pre-train the RL scheduling agent, we can short-circuit the random exploration phase and learn a reasonable policy with minimal online training. We use batch RL as a launchpad to learn effective scheduling policies from datasets collected with an oracle or custom heuristic policies. By combining offline pre-training with minimal online RL training, we achieve comparable performance and reduce training time by roughly 3x compared to the state-of-the-art online RL method. This framework is highly effective for pre-training on prior datasets with batch RL methods and is well suited to continuous improvement through online learning.
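The abstract describes a two-phase recipe: pre-train the scheduling policy offline on recorded (state, action) demonstrations, then fine-tune it briefly online. The sketch below illustrates that recipe in a generic form and is not the paper's implementation: the network PolicyNet, the functions pretrain_behaviour_cloning and finetune_online, the demonstration dataset, and the gymnasium-style scheduling environment are all illustrative assumptions, and the REINFORCE-style update stands in for whichever online RL algorithm is actually used for fine-tuning.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PolicyNet(nn.Module):
    """Small MLP mapping a flattened cluster state to job-slot logits (illustrative)."""

    def __init__(self, state_dim, num_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, state):
        return self.net(state)


def pretrain_behaviour_cloning(policy, dataset, epochs=10, lr=1e-3):
    """Offline phase: supervised learning on (state, action) batches recorded
    from an oracle or heuristic scheduler. `dataset` is assumed to yield
    tensors of shape [B, state_dim] and [B]."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        for states, actions in dataset:
            loss = F.cross_entropy(policy(states), actions)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return policy


def finetune_online(policy, env, steps=5_000, lr=1e-4, gamma=0.99):
    """Online phase: a short REINFORCE-style fine-tune that starts from the
    pre-trained policy instead of a random initialization. Assumes a
    gymnasium-style env (reset -> (obs, info), step -> 5-tuple)."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    state, _ = env.reset()
    log_probs, rewards = [], []
    for _ in range(steps):
        obs = torch.as_tensor(state, dtype=torch.float32)
        dist = torch.distributions.Categorical(logits=policy(obs))
        action = dist.sample()
        state, reward, terminated, truncated, _ = env.step(action.item())
        log_probs.append(dist.log_prob(action))
        rewards.append(reward)
        if terminated or truncated:
            # Compute discounted returns for the finished episode.
            returns, g = [], 0.0
            for r in reversed(rewards):
                g = r + gamma * g
                returns.append(g)
            returns = torch.tensor(list(reversed(returns)))
            loss = -(torch.stack(log_probs) * returns).sum()
            opt.zero_grad()
            loss.backward()
            opt.step()
            log_probs, rewards = [], []
            state, _ = env.reset()
    return policy
```

Under these assumptions, the offline phase supplies the "launchpad" initialization, so the online phase only has to refine an already reasonable policy rather than explore from scratch.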
Cite this paper
Venkataswamy, V., Grigsby, J., Grimshaw, A., Qi, Y. (2025). Launchpad: Learning to Schedule Using Offline and Online RL Methods. In: Klusáček, D., Corbalán, J., Rodrigo, G.P. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2024. Lecture Notes in Computer Science, vol 14591. Springer, Cham. https://doi.org/10.1007/978-3-031-74430-3_4