Abstract
Recently, interactive jobs, in addition to traditional batch jobs, have been attracting attention in supercomputer systems. We expect overcommitting scheduling, in which multiple HPC jobs share computational resources, to accommodate such jobs while keeping resource utilization high and response times low. Realizing overcommitting scheduling requires two approaches: 1) understanding the impact on performance when various applications share resources, and 2) predicting that performance before overcommitting. With this knowledge, we can optimize overcommitting scheduling so that it avoids performance degradation of jobs. In this paper, we describe the overall picture of overcommitting scheduling and take the two approaches above. We confirm that overcommitting improves overall system performance, and we build a performance model that can be applied to job scheduling.
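As a loose illustration of the kind of performance prediction the abstract refers to, the sketch below estimates the slowdown of a job when it shares CPU cores with another job, based on each job's CPU utilization measured in isolation. The function name and the simple linear contention model are hypothetical assumptions for illustration only, not the performance model developed in the paper.

```python
def predicted_slowdown(util_a: float, util_b: float) -> float:
    """Hypothetical sketch (not the authors' model): estimate the slowdown
    factor of job A when co-scheduled with job B on the same cores.

    util_a, util_b: average CPU utilization of each job measured alone,
    in [0, 1]. If the combined demand fits within the cores, no contention
    is assumed; otherwise execution time is assumed to stretch in
    proportion to the oversubscription of CPU time.
    """
    demand = util_a + util_b
    if demand <= 1.0:
        return 1.0  # combined demand fits: no predicted degradation
    return demand   # e.g., two fully CPU-bound jobs -> 2x slowdown each


# A CPU-bound batch job sharing cores with a mostly idle interactive job:
assert predicted_slowdown(1.0, 0.1) == 1.1
# Two fully CPU-bound jobs contending for the same cores:
assert predicted_slowdown(1.0, 1.0) == 2.0
```

A scheduler could use such an estimate to decide whether pairing two jobs on the same node keeps the predicted degradation below an acceptable threshold; the paper's actual model is built from measured application behavior rather than this simplification.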
Notes
- 1. Some batch jobs can exhibit fluctuating CPU utilization due to file I/O or network activity. For simplicity, this paper omits discussion of such jobs; however, overcommitting scheduling is expected to work efficiently for them as well.
- 2. If the co-scheduled job is a batch job, it is the same as the execution time.
Acknowledgements
This work was supported by JSPS KAKENHI Grant Number JP19H04121.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Minami, S., Endo, T., Nomura, A. (2021). Measurement and Modeling of Performance of HPC Applications Towards Overcommitting Scheduling Systems. In: Klusáček, D., Cirne, W., Rodrigo, G.P. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2021. Lecture Notes in Computer Science, vol 12985. Springer, Cham. https://doi.org/10.1007/978-3-030-88224-2_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88223-5
Online ISBN: 978-3-030-88224-2
eBook Packages: Computer Science (R0)