research-article

Exascaling Your Library: Will Your Implementation Meet Your Expectations?

Authors:

Sergei Shudler,

Alexandru Calotoiu,

Torsten Hoefler,

Alexandre Strube,

Felix Wolf

ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing

Pages 165 - 175

https://doi.org/10.1145/2751205.2751216

Published: 08 June 2015

Abstract

Many libraries in the HPC field encapsulate sophisticated algorithms with clear theoretical scalability expectations. However, hardware constraints or programming bugs may sometimes render these expectations inaccurate or even plainly wrong. While algorithm engineers have long advocated the systematic combination of analytical performance models with practical measurements, we go one step further and show how this comparison can become part of automated testing procedures. The most important applications of our method include initial validation, regression testing, and benchmarking to compare implementation and platform alternatives. Advancing the concept of performance assertions, we verify asymptotic scaling trends rather than precise analytical expressions, relieving the developer of the burden of having to specify and maintain fine-grained and potentially non-portable expectations. In this way, scalability validation can be applied continuously throughout the whole development cycle with very little effort. Using MPI as an example, we show how our method can help uncover non-obvious limitations of both libraries and underlying platforms.
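To make the idea of checking an asymptotic scaling trend more concrete, the following is a minimal sketch and not the authors' implementation. It assumes runtime measurements at several process counts, fits each of a small, hypothetical set of candidate growth terms by least squares, and compares the best-fitting term with the developer's expectation. The candidate set, the function names (`fit_residual`, `best_trend`, `assert_scaling`), and the selection rule are all assumptions made for illustration; real measurements are noisy and would need a more robust model-selection procedure.

```python
import math

# Hypothetical candidate growth terms in the process count p.
# This set is an assumption for illustration only.
CANDIDATES = {
    "1": lambda p: 1.0,
    "log p": lambda p: math.log2(p),
    "sqrt p": lambda p: math.sqrt(p),
    "p": lambda p: float(p),
    "p log p": lambda p: p * math.log2(p),
    "p^2": lambda p: float(p) ** 2,
}


def fit_residual(xs, ys):
    """Fit ys ~ c0 + c1 * xs by least squares; return the sum of squared residuals."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:
        # Constant term: the best fit is simply the mean of ys.
        return sum((y - my) ** 2 for y in ys)
    c1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    c0 = my - c1 * mx
    return sum((y - (c0 + c1 * x)) ** 2 for x, y in zip(xs, ys))


def best_trend(procs, times):
    """Return the name of the candidate term with the smallest residual."""
    residuals = {
        name: fit_residual([f(p) for p in procs], times)
        for name, f in CANDIDATES.items()
    }
    return min(residuals, key=residuals.get)


def assert_scaling(procs, times, expected):
    """Compare the observed trend with the expected one.

    Returns (matches, observed) so that a test harness can report a mismatch.
    """
    observed = best_trend(procs, times)
    return observed == expected, observed


if __name__ == "__main__":
    procs = [2 ** k for k in range(1, 11)]           # 2 .. 1024 processes
    times = [2.0 + 3.0 * math.log2(p) for p in procs]  # synthetic, noise-free data
    print(assert_scaling(procs, times, "log p"))
```

In a regression-testing setting, a check like `assert_scaling` could run after each benchmark campaign and flag a result whose observed trend differs from the expected one, without requiring the developer to specify exact constants.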

Index Terms

Exascaling Your Library: Will Your Implementation Meet Your Expectations?
1. Software and its engineering
  1. Software creation and management
    1. Software development process management
      1. Software development methods
    2. Software verification and validation
      1. Software defect analysis
        1. Software testing and debugging

Published In

ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing

June 2015

446 pages

ISBN: 9781450335591

DOI: 10.1145/2751205

General Chair: Laxmi N. Bhuyan (University of California, Riverside)

Program Chairs: Fred Chong (University of California, Santa Barbara), Vivek Sarkar (Rice University)

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

Swiss National Science Foundation
German Research Foundation

Conference

ICS'15

Sponsor:

SIGARCH

ICS'15: 2015 International Conference on Supercomputing

June 8 - 11, 2015

Newport Beach, California, USA

Acceptance Rates

ICS '15 Paper Acceptance Rate 40 of 160 submissions, 25%;

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Article Metrics

Total Citations: 20
Total Downloads: 201
Downloads (Last 12 months): 9
Downloads (Last 6 weeks): 0

Reflects downloads up to 16 Feb 2025

Cited By

Hunold S (2023). Verifying Performance Guidelines for MPI Collectives at Scale. Proceedings of the SC '23 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, pages 1264-1268. Online publication date: 12-Nov-2023.
https://dl.acm.org/doi/10.1145/3624062.3625532
Sasidharan A (2022). Performance Models for Heterogeneous Iterative Programs. International Journal of Networking and Computing, 12:1, pages 131-163. Online publication date: 2022.
https://doi.org/10.15803/ijnc.12.1_131
Schmid L, Copik M, Calotoiu A, Werle D, Reiter A, Selzer M, Koziolek A, Hoefler T, Rauchwerger L, Cameron K, Nikolopoulos D, Pnevmatikatos D (2022). Performance-detective. Proceedings of the 36th ACM International Conference on Supercomputing, pages 1-13. Online publication date: 28-Jun-2022.
https://dl.acm.org/doi/10.1145/3524059.3532391
Copik M, Calotoiu A, Grosser T, Wicki N, Wolf F, Hoefler T, Lee J, Petrank E (2021). Extracting clean performance models from tainted programs. Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 403-417. Online publication date: 17-Feb-2021.
https://dl.acm.org/doi/10.1145/3437801.3441613
Sasidharan A (2021). Performance Models for Hybrid Programs Accelerated by GPUs. 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pages 641-651. Online publication date: Jun-2021.
https://doi.org/10.1109/IPDPSW52791.2021.00098
Sun Q, Liu Y, Yang H, Jiang Z, Liu X, Dun M, Luan Z, Qian D (2021). csTuner: Scalable Auto-tuning Framework for Complex Stencil Computation on GPUs. 2021 IEEE International Conference on Cluster Computing (CLUSTER), pages 192-203. Online publication date: Sep-2021.
https://doi.org/10.1109/Cluster48925.2021.00037
Ritter M, Calotoiu A, Rinke S, Reimann T, Hoefler T, Wolf F (2020). Learning Cost-Effective Sampling Strategies for Empirical Performance Modeling. 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 884-895. Online publication date: May-2020.
https://doi.org/10.1109/IPDPS47924.2020.00095
Calotoiu A, Copik M, Hoefler T, Ritter M, Shudler S, Wolf F (2020). ExtraPeak: Advanced Automatic Performance Modeling for HPC Applications. Software for Exascale Computing - SPPEXA 2016-2019, pages 453-482. Online publication date: 31-Jul-2020.
https://doi.org/10.1007/978-3-030-47956-5_15
Shudler S, Berens Y, Calotoiu A, Hoefler T, Strube A, Wolf F (2019). Engineering Algorithms for Scalability through Continuous Validation of Performance Expectations. IEEE Transactions on Parallel and Distributed Systems, pages 1-1. Online publication date: 2019.
https://doi.org/10.1109/TPDS.2019.2896993
Lehr J, Calotoiu A, Bischof C, Wolf F (2019). Automatic Instrumentation Refinement for Empirical Performance Modeling. 2019 IEEE/ACM International Workshop on Programming and Performance Visualization Tools (ProTools), pages 40-47. Online publication date: Nov-2019.
https://doi.org/10.1109/ProTools49597.2019.00011