High Performance Computing Queue Time Prediction Using Clustering and Regression

Hutchison, Scott; Andresen, Daniel; Neilsen, Mitchell; Hsu, William; Parsons, Benjamin

doi:10.1007/978-3-031-30445-3_22

Scott Hutchison (ORCID: orcid.org/0000-0002-6698-3033), Daniel Andresen, Mitchell Neilsen, William Hsu & Benjamin Parsons

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13827)

Included in the following conference series:

International Conference on Parallel Processing and Applied Mathematics

Abstract

High Performance Computing (HPC) users are often given little or no information at job submission time about how long their job will wait in the queue before it begins execution. Foreknowledge of a long queue time can inform an HPC user's decision to migrate jobs to commercial cloud infrastructure and receive results sooner. Various researchers have used different machine learning techniques to build queue time estimators. This research applies the proven technique of K-Means clustering followed by Gradient Boosted Tree regression to over 700,000 jobs actually submitted to an HPC system, predicting a submitted job's queue time from HPC system characteristics and user-provided job requirements. Applied to HPC queue time prediction, this method achieves better than 96% accuracy at classifying whether a job will start before an assigned deadline. Additionally, this research shows that historic HPC CPU allocation data can be used to predict future increases or decreases in job queue time with accuracy exceeding 96%.
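The cluster-then-regress idea described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the two features (requested cores and fraction of CPUs allocated), the synthetic queue times, the 60-minute deadline, and every function name below are assumptions chosen for illustration, and the gradient boosted trees are reduced to depth-1 trees (stumps) to keep the example short. Jobs are grouped with K-Means, one boosted model is fitted per cluster, and each held-out job is scored by the model of its nearest centroid.

```python
import random

def nearest(p, centroids):
    """Index of the centroid closest to point p (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm with random initial centroids; returns the centroids."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster has no members
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def fit_stump(X, residuals):
    """Depth-1 regression tree: the single split that minimises squared error."""
    mean = sum(residuals) / len(residuals)
    best = (float("inf"), 0, float("inf"), mean, mean)  # fallback: no split
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X})[:-1]:
            left = [r for x, r in zip(X, residuals) if x[j] <= t]
            right = [r for x, r in zip(X, residuals) if x[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]

def fit_gbt(X, y, rounds=30, lr=0.3):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)
    pred, stumps = [base] * len(y), []
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        j, t, lm, rm = fit_stump(X, residuals)
        stumps.append((j, t, lm, rm))
        pred = [p + lr * (lm if x[j] <= t else rm) for x, p in zip(X, pred)]
    return base, stumps, lr

def predict_gbt(model, x):
    base, stumps, lr = model
    return base + sum(lr * (lm if x[j] <= t else rm) for j, t, lm, rm in stumps)

def make_jobs(n, rng):
    """Synthetic (hypothetical) jobs: [cores / 128, CPU load] -> queue minutes."""
    X, y = [], []
    for _ in range(n):
        big = rng.random() < 0.5
        cores = rng.uniform(64, 128) if big else rng.uniform(1, 16)
        load = rng.uniform(0.2, 1.0)
        X.append([cores / 128, load])
        y.append(max(0.0, (30 + 120 * load if big else 10 * load) + rng.gauss(0, 2)))
    return X, y

rng = random.Random(42)
X_train, y_train = make_jobs(120, rng)
X_test, y_test = make_jobs(40, rng)

# Step 1: cluster the training jobs. Step 2: fit one boosted model per cluster.
centroids = kmeans(X_train, k=2)
models = []
for c in range(len(centroids)):
    idx = [i for i, x in enumerate(X_train) if nearest(x, centroids) == c]
    models.append(fit_gbt([X_train[i] for i in idx], [y_train[i] for i in idx]))

# Step 3: route each held-out job to its cluster's model and predict.
pred = [predict_gbt(models[nearest(x, centroids)], x) for x in X_test]
baseline = sum(y_train) / len(y_train)  # predict the training mean for every job
mse_model = sum((p - t) ** 2 for p, t in zip(pred, y_test)) / len(y_test)
mse_baseline = sum((baseline - t) ** 2 for t in y_test) / len(y_test)

# Deadline classification: does the job start within the (assumed) deadline?
deadline = 60.0
accuracy = sum((p <= deadline) == (t <= deadline)
               for p, t in zip(pred, y_test)) / len(y_test)
print(f"MSE model={mse_model:.1f} baseline={mse_baseline:.1f} "
      f"deadline accuracy={accuracy:.2f}")
```

Because the data are synthetic, the printed numbers say nothing about the accuracy reported in the abstract; the sketch only shows how the clustering step, the per-cluster regression step, and the deadline classification fit together.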

Author information

Authors and Affiliations

Kansas State University, Manhattan, KS, 66506, USA
Scott Hutchison, Daniel Andresen, Mitchell Neilsen & William Hsu
Engineering Research and Development Center, Vicksburg, MS, 39180, USA
Benjamin Parsons

Corresponding author

Correspondence to Scott Hutchison.

Editor information

Editors and Affiliations

Czestochowa University of Technology, Czestochowa, Poland
Roman Wyrzykowski
University of Tennessee, Knoxville, TN, USA
Jack Dongarra
University of Southern California, Marina del Rey, CA, USA
Ewa Deelman
Czestochowa University of Technology, Czestochowa, Poland
Konrad Karczewski

About this paper

Cite this paper

Hutchison, S., Andresen, D., Neilsen, M., Hsu, W., Parsons, B. (2023). High Performance Computing Queue Time Prediction Using Clustering and Regression. In: Wyrzykowski, R., Dongarra, J., Deelman, E., Karczewski, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2022. Lecture Notes in Computer Science, vol 13827. Springer, Cham. https://doi.org/10.1007/978-3-031-30445-3_22

DOI: https://doi.org/10.1007/978-3-031-30445-3_22
Published: 27 April 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30444-6
Online ISBN: 978-3-031-30445-3
eBook Packages: Computer Science, Computer Science (R0)
