Measurement and Modeling of Performance of HPC Applications Towards Overcommitting Scheduling Systems

  • Conference paper
  • In: Job Scheduling Strategies for Parallel Processing (JSSPP 2021)

Abstract

Recently, interactive jobs, in addition to traditional batch jobs, have been attracting attention in supercomputer systems. We expect overcommitting scheduling, in which multiple HPC jobs share computational resources, to accommodate such jobs while keeping resource utilization high and response time low. Realizing overcommitting scheduling requires two approaches: (1) understanding the impact on performance when various applications share resources, and (2) predicting that performance before overcommitting. With this knowledge, we can design overcommitting scheduling that avoids degrading job performance. In this paper, we describe the overall picture of overcommitting scheduling and pursue the two approaches above. We confirm that overcommitting improves overall system performance, and we build a performance model that can be applied to job scheduling.
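The approach summarized above, measuring the interference between co-located applications and predicting it with a model before overcommitting, can be illustrated with a minimal scheduling sketch. The code below is not from the paper; all names (Job, Node, predict_slowdown, MAX_SLOWDOWN) and the toy slowdown formula are illustrative assumptions showing how a predicted-slowdown threshold could gate an overcommitted placement decision.

```python
# Minimal sketch (not from the paper): gate overcommitted placement on a
# predicted slowdown. All names and the slowdown formula are assumptions.
from dataclasses import dataclass
from typing import List, Optional

MAX_SLOWDOWN = 1.2  # assumed tolerance: at most 20% degradation when sharing a node


@dataclass
class Job:
    name: str
    interactive: bool
    cpu_util: float      # average CPU utilization of the job, 0.0-1.0
    mem_bw_share: float  # rough share of memory bandwidth demanded, 0.0-1.0


@dataclass
class Node:
    name: str
    resident: Optional[Job] = None  # batch job already running on the node, if any


def predict_slowdown(a: Job, b: Job) -> float:
    """Toy stand-in for a performance model: estimate the slowdown factor
    (>= 1.0) when jobs a and b share a node, driven here only by combined
    CPU and memory-bandwidth pressure exceeding the node's capacity."""
    cpu_pressure = max(0.0, a.cpu_util + b.cpu_util - 1.0)
    mem_pressure = max(0.0, a.mem_bw_share + b.mem_bw_share - 1.0)
    return 1.0 + cpu_pressure + mem_pressure


def place_overcommitted(job: Job, nodes: List[Node]) -> Optional[Node]:
    """Co-locate the job with a resident batch job only if the predicted
    slowdown stays within tolerance; otherwise fall back to an idle node,
    or return None to keep the job queued."""
    for node in nodes:
        if node.resident and predict_slowdown(job, node.resident) <= MAX_SLOWDOWN:
            return node
    for node in nodes:
        if node.resident is None:
            return node
    return None


# Example: an interactive job with light resource demands is safely
# co-located with a CPU-heavy batch job under the toy model.
nodes = [Node("n0", resident=Job("batch-lu", False, 0.95, 0.6)), Node("n1")]
target = place_overcommitted(Job("notebook", True, 0.15, 0.1), nodes)
```

In this sketch, an interactive job is co-located with a resident batch job only when the model predicts tolerable interference; otherwise it falls back to an idle node or remains queued, mirroring the abstract's goal of raising utilization without degrading job performance.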

Notes

  1. Some batch jobs exhibit fluctuating CPU utilization due to file I/O or network communication. For simplicity, this paper omits discussion of such jobs; however, overcommitting scheduling is also expected to work efficiently for them.

  2. If the opponent job is a batch job, this is the same as the execution time.

Acknowledgements

This work was supported by JSPS KAKENHI Grant Number JP19H04121.

Author information

Corresponding author

Correspondence to Shohei Minami.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Minami, S., Endo, T., Nomura, A. (2021). Measurement and Modeling of Performance of HPC Applications Towards Overcommitting Scheduling Systems. In: Klusáček, D., Cirne, W., Rodrigo, G.P. (eds.) Job Scheduling Strategies for Parallel Processing. JSSPP 2021. Lecture Notes in Computer Science, vol. 12985. Springer, Cham. https://doi.org/10.1007/978-3-030-88224-2_4

  • DOI: https://doi.org/10.1007/978-3-030-88224-2_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-88223-5

  • Online ISBN: 978-3-030-88224-2

  • eBook Packages: Computer Science, Computer Science (R0)
