Abstract
Many existing applications suffer from inherent scalability limitations that will prevent them from running at exascale. Current tuning practices, which rely on diagnostic experiments, have two drawbacks: (i) they detect scalability problems relatively late in the development process, when major effort has already been invested in an inadequate solution, and (ii) they incur the extra cost of potentially numerous full-scale experiments. Analytical performance models, in contrast, allow application developers to address performance issues as early as the design or prototyping phase. Unfortunately, the difficulty of creating such models, combined with the lack of appropriate tool support, still renders performance modeling an esoteric discipline mastered only by a relatively small community of experts. This article summarizes the results of the Catwalk project, which aimed to create tools that automate key activities of the performance modeling process, making this powerful methodology accessible to a wider audience of HPC application developers.
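To illustrate the kind of automation the abstract describes, the sketch below fits a simple empirical scaling model to runtime measurements taken at a few process counts and selects the best-fitting candidate term. This is a minimal, hypothetical example in the spirit of automated performance modeling, not the actual algorithm or search space used by the Catwalk tools; the candidate terms, data, and least-squares criterion are all assumptions made for illustration.

```python
import math

# Synthetic measurements: (process count p, observed runtime t).
# Illustrative data only, generated from roughly t = 0.5 * p * log2(p).
measurements = [(2, 1.1), (4, 4.2), (8, 12.3), (16, 32.5), (32, 80.9)]

def fit(term):
    """Least-squares fit of t ~ c * term(p); returns (c, residual sum of squares)."""
    xs = [term(p) for p, _ in measurements]
    ts = [t for _, t in measurements]
    c = sum(x * t for x, t in zip(xs, ts)) / sum(x * x for x in xs)
    rss = sum((t - c * x) ** 2 for x, t in zip(xs, ts))
    return c, rss

# A small search space of single-term candidate models of the form
# p^i * log2(p)^j, evaluated exhaustively.
candidates = {
    "p":        lambda p: p,
    "p*log(p)": lambda p: p * math.log2(p),
    "p^2":      lambda p: p ** 2,
}

# Pick the candidate with the smallest residual.
best = min(candidates, key=lambda name: fit(candidates[name])[1])
print("best-fitting model:", best)
```

Run on the synthetic data above, the search correctly recovers the `p*log(p)` trend; in practice, such a model would be fitted per code region from measurements at several small scales and then extrapolated to flag regions whose growth terms dominate at exascale.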
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this paper
Wolf, F. et al. (2016). Automatic Performance Modeling of HPC Applications. In: Bungartz, HJ., Neumann, P., Nagel, W. (eds) Software for Exascale Computing - SPPEXA 2013-2015. Lecture Notes in Computational Science and Engineering, vol 113. Springer, Cham. https://doi.org/10.1007/978-3-319-40528-5_20
Print ISBN: 978-3-319-40526-1
Online ISBN: 978-3-319-40528-5