Automatic Performance Modeling of HPC Applications

  • Conference paper
Software for Exascale Computing - SPPEXA 2013-2015

Abstract

Many existing applications suffer from inherent scalability limitations that will prevent them from running at exascale. Current tuning practices, which rely on diagnostic experiments, have two drawbacks: (i) they detect scalability problems relatively late in the development process, when major effort has already been invested in an inadequate solution, and (ii) they incur the extra cost of potentially numerous full-scale experiments. Analytical performance models, in contrast, allow application developers to address performance issues as early as the design or prototyping phase. Unfortunately, the difficulty of creating such models, combined with the lack of appropriate tool support, still renders performance modeling an esoteric discipline mastered only by a relatively small community of experts. This article summarizes the results of the Catwalk project, which aimed to create tools that automate key activities of the performance modeling process, making this powerful methodology accessible to a wider audience of HPC application developers.


Notes

  1. http://www.scalasca.org/software/extra-p/download.html

  2. http://www.scalasca.org/software/extra-p/documentation.html

Corresponding author

Correspondence to Alexandru Calotoiu.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Wolf, F. et al. (2016). Automatic Performance Modeling of HPC Applications. In: Bungartz, HJ., Neumann, P., Nagel, W. (eds) Software for Exascale Computing - SPPEXA 2013-2015. Lecture Notes in Computational Science and Engineering, vol 113. Springer, Cham. https://doi.org/10.1007/978-3-319-40528-5_20
