DOI: 10.1145/2751205.2751216

Exascaling Your Library: Will Your Implementation Meet Your Expectations?

Published: 08 June 2015

Abstract

Many libraries in the HPC field encapsulate sophisticated algorithms with clear theoretical scalability expectations. However, hardware constraints or programming bugs may sometimes render these expectations inaccurate or even plainly wrong. While algorithm engineers have long advocated the systematic combination of analytical performance models with practical measurements, we go one step further and show how this comparison can become part of automated testing procedures. The most important applications of our method include initial validation, regression testing, and benchmarking to compare implementation and platform alternatives. Advancing the concept of performance assertions, we verify asymptotic scaling trends rather than precise analytical expressions, relieving the developer of the burden of specifying and maintaining very fine-grained and potentially non-portable expectations. In this way, scalability validation can be applied continuously throughout the whole development cycle with very little effort. Using MPI as an example, we show how our method can help uncover non-obvious limitations of both libraries and underlying platforms.
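
The idea of asserting an asymptotic trend instead of an exact formula can be illustrated with a minimal sketch. This is not the authors' tool or workflow: the candidate growth classes, the function names, and the single-term least-squares fit t = c * f(p) are all assumptions chosen for illustration, and the process counts are assumed to be distinct values of at least 2.

```python
import math

# Hypothetical candidate growth classes, ordered from slowest to fastest.
# The set is an assumption for this sketch, not taken from the paper.
CANDIDATES = [
    ("1", lambda p: 1.0),
    ("log p", lambda p: math.log2(p)),
    ("sqrt p", lambda p: math.sqrt(p)),
    ("p", lambda p: float(p)),
    ("p^2", lambda p: float(p) ** 2),
]


def residual(procs, times, f):
    """Sum of squared errors of the least-squares fit t = c * f(p)."""
    fx = [f(p) for p in procs]
    c = sum(x * t for x, t in zip(fx, times)) / sum(x * x for x in fx)
    return sum((t - c * x) ** 2 for x, t in zip(fx, times))


def best_growth_class(procs, times):
    """Index into CANDIDATES of the class with the smallest residual."""
    return min(
        range(len(CANDIDATES)),
        key=lambda i: residual(procs, times, CANDIDATES[i][1]),
    )


def meets_expectation(procs, times, expected):
    """True if the measured trend grows no faster than the expected class."""
    names = [name for name, _ in CANDIDATES]
    return best_growth_class(procs, times) <= names.index(expected)


if __name__ == "__main__":
    procs = [2 ** k for k in range(1, 11)]         # 2 .. 1024 processes
    good = [3.0 * math.log2(p) for p in procs]     # synthetic logarithmic runtime
    bad = [0.5 * p for p in procs]                 # synthetic linear runtime
    print(meets_expectation(procs, good, "log p"))
    print(meets_expectation(procs, bad, "log p"))
```

A check of this kind could run as an ordinary regression test: the developer states only the expected growth class (for example "log p" for a tree-based collective), and the test fails when measurements are better explained by a faster-growing class. Real measurements are noisy, so a practical version would need repeated runs and a more robust model-selection criterion than the raw residual used here.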


Published In

ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing
June 2015
446 pages
ISBN: 9781450335591
DOI: 10.1145/2751205

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. high performance computing
  2. parallel programming
  3. performance analysis
  4. software engineering

Qualifiers

  • Research-article

Funding Sources

  • Swiss National Science Foundation
  • German Research Foundation

Conference

ICS'15: 2015 International Conference on Supercomputing
June 8 - 11, 2015
Newport Beach, California, USA

Acceptance Rates

ICS '15 Paper Acceptance Rate 40 of 160 submissions, 25%;
Overall Acceptance Rate 629 of 2,180 submissions, 29%
