Tools for Monitoring CPU Usage and Affinity in Multicore Supercomputers

  • Conference paper
Tools and Techniques for High Performance Computing (HUST 2019, SE-HER 2019, WIHPC 2019)

Abstract

Performance gains in HPC nodes have come from widening SIMD units and packing ever more cores into each processor. With multiple processors and so many cores per node, it has become necessary to understand and manage process and thread affinity and pinning. However, existing affinity tools were not designed specifically for HPC users to quickly evaluate process affinity and execution location. To fill this gap, three HPC user-friendly tools, core_usage, show_affinity, and amask, have been designed to eliminate barriers that frustrate users and keep them from evaluating and analyzing affinity for their applications. These tools focus on providing convenient methods, easy-to-understand affinity representations for large process counts, process locality, and run-time core load with socket aggregation. They will help HPC users, developers, and site administrators monitor processor utilization from an affinity perspective.
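As a rough illustration of what an affinity mask is, the following minimal sketch queries the calling process's allowed CPUs on Linux and prints them as one character per logical CPU. It is not the implementation or output format of core_usage, show_affinity, or amask; the helper name `format_mask` and the character layout are assumptions made for this example only.

```python
import os


def format_mask(cpus, ncpus):
    """Render a set of allowed CPU ids as one character per logical CPU.

    Hypothetical layout for illustration: an allowed CPU shows the last
    digit of its id, a disallowed CPU shows '-'.
    """
    return "".join(str(i % 10) if i in cpus else "-" for i in range(ncpus))


if __name__ == "__main__":
    # os.sched_getaffinity exists only on platforms that support it (e.g. Linux).
    if hasattr(os, "sched_getaffinity"):
        allowed = os.sched_getaffinity(0)  # 0 means the calling process
        total = max(os.cpu_count() or 0, max(allowed) + 1)
        print(f"pid {os.getpid()}: {format_mask(allowed, total)}")
    else:
        print("sched_getaffinity is not available on this platform")
```

Running the same script under `taskset` or `numactl` with different CPU lists would change the printed mask, which is one quick way to check where a process is allowed to execute.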

Acknowledgments

We would like to thank all our users who worked with these new tools and provided us with constructive feedback and suggestions to make improvements. We would also like to thank our colleagues in the High-Performance Computing group and Advanced Computing Systems group who provided expertise and insight that significantly assisted this work. Particularly, we would like to show our gratitude to Hang Liu, Albert Lu, John Cazes, Robert McLay, Victor Eijkhout, and Bill Barth who helped us design, test, and debug the early versions of these products. We also appreciate the technical writing assistance from Bob Garza.

All of these tools were developed and tested mainly on TACC’s supercomputer systems, including Stampede, Stampede2, Lonestar5, Wrangler, Maverick2, and Frontera. The computation for all experiments was supported by the National Science Foundation through the Frontera (OAC-1818253), Stampede2 (OAC-1540931), and XSEDE (ACI-1953575) awards.

Author information

Corresponding author

Correspondence to Si Liu.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Huang, L., Milfeld, K., Liu, S. (2020). Tools for Monitoring CPU Usage and Affinity in Multicore Supercomputers. In: Juckeland, G., Chandrasekaran, S. (eds) Tools and Techniques for High Performance Computing. HUST 2019, SE-HER 2019, WIHPC 2019. Communications in Computer and Information Science, vol 1190. Springer, Cham. https://doi.org/10.1007/978-3-030-44728-1_4

  • DOI: https://doi.org/10.1007/978-3-030-44728-1_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-44727-4

  • Online ISBN: 978-3-030-44728-1

  • eBook Packages: Computer Science (R0)
