Improving Accuracy of Walltime Estimates in PBS Professional Using Soft Walltimes

  • Conference paper
Job Scheduling Strategies for Parallel Processing (JSSPP 2022)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13592))

Abstract

Job walltime estimates are used by current batch schedulers to optimize performance and predictability when scheduling parallel jobs on computing resources. Since user-provided estimates are inaccurate and often overestimated, system administrators often seek ways to improve them artificially using some form of walltime predictor. In this work, we present our real-life experience with deploying such a predictor using the soft walltime feature available in the PBS Professional resource manager. Our results indicate that the applied solution works properly and significantly increases the accuracy of user-provided estimates. We share our experience with tuning the scheduler and discuss several problems that occurred along the way. We also compare how the system behavior evolved once soft walltimes were deployed in production. Last but not least, we publish the collected workload traces along with this paper to allow other researchers to further study and extend our work.

Notes

  1. In PBS Professional, not every waiting job gets a reservation. Only a predefined number of high-priority jobs (per queue) have guaranteed (latest) start times; these are called top jobs. The remaining jobs can be backfilled around top jobs provided they do not interfere with the top jobs' reservations.

  2. In this context, a job batch is the set of jobs submitted to the system by a given user within a short time frame, e.g., within a few minutes.

  3. The difference is caused by the fact that it takes some time before we collect enough data for each user to produce soft walltimes.

  4. In our case, those are 2, 4 and 24 h; 2, 4 and 7 days; and 2, 4 or >4 weeks.

  5. Other values such as 10 s [3, 4] or 1 min [18] are also used in the literature. In CERIT-SC, 10 min is the recommended minimal runtime of a regular job. Shorter jobs are discouraged because of the excessive overhead of their (frequent) processing.

  6. This is also coupled with fair-share-based job ordering, which we use to prioritize less active users over those who utilize the system heavily.
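
The notion of a job batch from Note 2 can be illustrated with a minimal sketch. It assumes a gap-based rule: a job belongs to the user's current batch if it was submitted at most `max_gap_s` seconds after that user's previous job. The `Job` record, the function name `group_into_batches`, and the 300 s default are hypothetical choices made for illustration only; the paper does not specify how batches are delimited beyond "a short time frame".

```python
from collections import defaultdict
from typing import NamedTuple


class Job(NamedTuple):
    # Hypothetical minimal job record, not a PBS Professional data structure.
    job_id: str
    user: str
    submit_time: float  # seconds since some common epoch


def group_into_batches(jobs, max_gap_s=300.0):
    """Group jobs into per-user batches using a gap-based rule (an assumption).

    A job joins the user's current batch when it was submitted at most
    max_gap_s seconds after that user's previous job; otherwise it
    starts a new batch.
    """
    # Collect each user's jobs in submission order.
    by_user = defaultdict(list)
    for job in sorted(jobs, key=lambda j: j.submit_time):
        by_user[job.user].append(job)

    batches = []
    for user_jobs in by_user.values():
        current = [user_jobs[0]]
        for job in user_jobs[1:]:
            if job.submit_time - current[-1].submit_time <= max_gap_s:
                current.append(job)
            else:
                batches.append(current)
                current = [job]
        batches.append(current)
    return batches
```

With the default threshold, two jobs submitted by the same user 100 s apart end up in one batch, while a third job submitted 15 minutes later starts a new batch. A fixed time window anchored at the first job of a batch would be an equally plausible reading of the definition.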

References

  1. CERIT Scientific Cloud, August 2022. http://www.cerit-sc.cz

  2. Chiang, S.-H., Arpaci-Dusseau, A., Vernon, M.K.: The impact of more accurate requested runtimes on production job scheduling performance. In: Feitelson, D.G., Rudolph, L., Schwiegelshohn, U. (eds.) JSSPP 2002. LNCS, vol. 2537, pp. 103–127. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-36180-4_7

  3. Feitelson, D.G.: Experimental analysis of the root causes of performance evaluation results: a backfilling case study. IEEE Trans. Parallel Distrib. Syst. 16(2), 175–182 (2005)

  4. Feitelson, D.G., Rudolph, L., Schwiegelshohn, U., Sevcik, K.C., Wong, P.: Theory and practice in parallel job scheduling. In: Feitelson, D.G., Rudolph, L. (eds.) JSSPP 1997. LNCS, vol. 1291, pp. 1–34. Springer, Heidelberg (1997). https://doi.org/10.1007/3-540-63574-2_14

  5. Soft walltime predictor implementation, August 2022. https://github.com/CESNET/softwalltime-predictor/

  6. JSSPP workloads archive, August 2022. https://jsspp.org/workload/

  7. Klusáček, D., Chlumský, V.: Evaluating the impact of soft walltimes on job scheduling performance. In: Desai, N., Klusáček, D., Cirne, W. (eds.) JSSPP 2018. LNCS, vol. 11332, pp. 15–38. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-10632-4_2

  8. Lee, C.B., Schwartzman, Y., Hardy, J., Snavely, A.: Are user runtime estimates inherently inaccurate? In: Feitelson, D.G., Rudolph, L., Schwiegelshohn, U. (eds.) JSSPP 2004. LNCS, vol. 3277, pp. 253–263. Springer, Heidelberg (2004). https://doi.org/10.1007/11407522_14

  9. MetaCentrum, February 2022. http://www.metacentrum.cz/

  10. Mu’alem, A.W., Feitelson, D.G.: Utilization, predictability, workloads, and user runtime estimates in scheduling the IBM SP2 with backfilling. IEEE Trans. Parallel Distrib. Syst. 12(6), 529–543 (2001)

  11. Seneviratne, S., Witharana, S.: A survey on methodologies for runtime prediction on grid environments. In: 7th International Conference on Information and Automation for Sustainability, pp. 1–6. IEEE (2014)

  12. Smith, W., Taylor, V., Foster, I.: Using run-time predictions to estimate queue wait times and improve scheduler performance. In: Feitelson, D.G., Rudolph, L. (eds.) JSSPP 1999. LNCS, vol. 1659, pp. 202–219. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-47954-6_11

  13. Non-destructive walltime, February 2022. https://community.openpbs.org/t/pp-482-non-destructive-walltime/587/4

  14. Soft walltime documentation, February 2022. https://pbspro.atlassian.net/wiki/spaces/PD/pages/42532871/PP-482+Soft+Walltime

  15. Soysal, M., Berghoff, M., Streit, A.: Analysis of job metadata for enhanced wall time prediction. In: Desai, N., Klusáček, D., Cirne, W. (eds.) JSSPP 2018. LNCS, vol. 11332, pp. 1–14. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-10632-4_1

  16. Tsafrir, D.: Using inaccurate estimates accurately. In: Frachtenberg, E., Schwiegelshohn, U. (eds.) JSSPP 2010. LNCS, vol. 6253, pp. 208–221. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-16505-4_12

  17. Tsafrir, D., Etsion, Y., Feitelson, D.G.: Backfilling using system-generated predictions rather than user runtime estimates. IEEE Trans. Parallel Distrib. Syst. 18(6), 789–803 (2007)

  18. Vasupongayya, S., Chiang, S.-H.: On job fairness in non-preemptive parallel job scheduling. In: Zheng, S.Q. (ed.) International Conference on Parallel and Distributed Computing Systems (PDCS 2005), pp. 100–105. IASTED/ACTA Press (2005)

Acknowledgments

We kindly acknowledge the support and computational resources supplied by the project “e-Infrastruktura CZ” (e-INFRA LM2018140) provided within the program Projects of Large Research, Development and Innovations Infrastructures.

Author information

Corresponding author

Correspondence to Dalibor Klusáček.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Chlumský, V., Klusáček, D. (2023). Improving Accuracy of Walltime Estimates in PBS Professional Using Soft Walltimes. In: Klusáček, D., Julita, C., Rodrigo, G.P. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2022. Lecture Notes in Computer Science, vol 13592. Springer, Cham. https://doi.org/10.1007/978-3-031-22698-4_10

  • DOI: https://doi.org/10.1007/978-3-031-22698-4_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-22697-7

  • Online ISBN: 978-3-031-22698-4

  • eBook Packages: Computer Science, Computer Science (R0)
