Abstract
Current batch schedulers use job walltime estimates to optimize performance and predictability when scheduling parallel jobs on computing resources. Since user-provided estimates are inaccurate and often overestimated, system administrators often seek ways to improve them artificially using some form of walltime predictor. In this work, we present our real-life experience with deploying such a predictor using the soft walltime feature available in the PBS Professional resource manager. Our results indicate that the applied solution works properly, significantly increasing the accuracy of user-provided estimates. We share our experience with tuning the scheduler and discuss several problems that occurred along the way. We also compare how the system behavior evolved once soft walltimes were deployed in production. Last but not least, we publish the collected workload traces along with this paper to allow other researchers to further study and extend our work.
Notes
- 1.
In PBS Professional, not every waiting job gets a reservation. Only a predefined number of high-priority jobs (per queue) have guaranteed (latest) start times; these are called top jobs. The remaining jobs can be backfilled around top jobs, provided they do not interfere with the top jobs' reservations.
- 2.
In this context, a job batch is the set of jobs submitted to the system by a given user within a short time frame, e.g., within a few minutes.
- 3.
The difference is caused by the fact that it takes some time before we collect enough data for each user to produce soft walltimes.
- 4.
In our case, those are 2, 4 and 24 h; 2, 4 and 7 days; and 2, 4 or >4 weeks.
- 5.
- 6.
This is also coupled with fair-share-based job ordering, which we use to prioritize less active users over those who utilize the system heavily.
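The job batch notion from note 2 can be illustrated with a minimal sketch. It assumes a gap-based reading of "short time frame": consecutive jobs of the same user belong to one batch as long as the gap between their submissions does not exceed a threshold. The function name `group_into_batches`, the tuple layout of a job, and the 5-minute default are hypothetical choices made for this illustration, not details taken from the paper.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def group_into_batches(jobs, max_gap=timedelta(minutes=5)):
    """Group jobs into per-user batches.

    jobs: iterable of (job_id, user, submit_time) tuples (hypothetical layout).
    max_gap: assumed threshold; a larger gap between two consecutive
             submissions of one user starts a new batch.
    Returns a list of (user, [job_id, ...]) tuples.
    """
    # Collect each user's jobs separately, since a batch never spans users.
    by_user = defaultdict(list)
    for job_id, user, submit_time in jobs:
        by_user[user].append((submit_time, job_id))

    batches = []
    for user, items in by_user.items():
        items.sort()  # order by submission time
        current, last_time = [], None
        for submit_time, job_id in items:
            # Close the running batch when the submission gap is too large.
            if last_time is not None and submit_time - last_time > max_gap:
                batches.append((user, current))
                current = []
            current.append(job_id)
            last_time = submit_time
        if current:
            batches.append((user, current))
    return batches


if __name__ == "__main__":
    t0 = datetime(2022, 8, 1, 10, 0)
    example = [
        ("a1", "alice", t0),
        ("b1", "bob", t0 + timedelta(minutes=1)),
        ("a2", "alice", t0 + timedelta(minutes=2)),
        ("a3", "alice", t0 + timedelta(minutes=30)),
    ]
    for user, job_ids in group_into_batches(example):
        print(user, job_ids)
```

A fixed-window definition (all jobs within N minutes of the first job in the batch) would be an equally plausible reading; the gap-based variant is used here only because it keeps long, steadily submitted job series together.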
Acknowledgments
We kindly acknowledge the support and computational resources supplied by the project “e-Infrastruktura CZ” (e-INFRA LM2018140) provided within the program Projects of Large Research, Development and Innovations Infrastructures.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chlumský, V., Klusáček, D. (2023). Improving Accuracy of Walltime Estimates in PBS Professional Using Soft Walltimes. In: Klusáček, D., Julita, C., Rodrigo, G.P. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2022. Lecture Notes in Computer Science, vol 13592. Springer, Cham. https://doi.org/10.1007/978-3-031-22698-4_10
DOI: https://doi.org/10.1007/978-3-031-22698-4_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22697-7
Online ISBN: 978-3-031-22698-4
eBook Packages: Computer Science, Computer Science (R0)