Toward Performance Models of MPI Implementations for Understanding Application Scaling Issues

Hoefler, Torsten; Gropp, William; Thakur, Rajeev; Träff, Jesper Larsson

doi:10.1007/978-3-642-15646-5_3

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 6305)

Included in the following conference series:

European MPI Users' Group Meeting

Abstract

Designing and tuning parallel applications with MPI, particularly at large scale, requires understanding the performance implications of different choices of algorithms and implementation options. Which algorithm is better depends in part on the performance of the different possible communication approaches, which in turn can depend on both the system hardware and the MPI implementation. In the absence of detailed performance models for different MPI implementations, application developers often must select methods and tune codes without the means to realistically estimate the achievable performance and rationally defend their choices. In this paper, we advocate the construction of more useful performance models that take into account limitations on network-injection rates and effective bisection bandwidth. Since collective communication plays a crucial role in enabling scalability, we also provide analytical models for scalability of collective communication algorithms, such as broadcast, allreduce, and all-to-all. We apply these models to an IBM Blue Gene/P system and compare the analytical performance estimates with experimentally measured values.
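To make the idea of an analytical model for a collective concrete, the following is a minimal sketch that compares two textbook broadcast algorithms. It assumes a simple latency-bandwidth cost model, in which sending m bytes costs alpha + beta * m. This model is an illustration only and is not necessarily the one developed in the paper; in particular it ignores the injection-rate and bisection-bandwidth limits that the abstract argues for. The function names and parameter values are hypothetical.

```python
import math


def bcast_binomial(p, m, alpha, beta):
    """Estimated time of a binomial-tree broadcast of m bytes to p processes.

    Assumed model: ceil(log2 p) rounds, each forwarding the full message.
    """
    if p <= 1:
        return 0.0
    return math.ceil(math.log2(p)) * (alpha + beta * m)


def bcast_scatter_allgather(p, m, alpha, beta):
    """Estimated time of a binomial scatter followed by a ring allgather.

    Assumed model: ceil(log2 p) + (p - 1) message startups, and each phase
    moves (p - 1) / p of the message through every process.
    """
    if p <= 1:
        return 0.0
    startups = math.ceil(math.log2(p)) + (p - 1)
    return startups * alpha + 2.0 * (p - 1) / p * m * beta


if __name__ == "__main__":
    # Hypothetical parameters: 1 microsecond latency, 1 ns per byte.
    alpha, beta = 1e-6, 1e-9
    for m in (8, 10**8):
        t_tree = bcast_binomial(1024, m, alpha, beta)
        t_sag = bcast_scatter_allgather(1024, m, alpha, beta)
        print(f"m={m:>10} bytes  binomial={t_tree:.6f}s  scatter+allgather={t_sag:.6f}s")
```

Under these assumed parameters the binomial tree is cheaper for short messages, where startup latency dominates, while scatter plus allgather is cheaper for long messages, where bandwidth dominates. This is the kind of algorithm choice that an application developer could defend with a model rather than by trial and error.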

Author information

Authors and Affiliations

University of Illinois at Urbana-Champaign, IL, USA
Torsten Hoefler & William Gropp
Argonne National Laboratory, Argonne, IL, USA
Rajeev Thakur
Dept. of Scientific Computing, University of Vienna, Austria
Jesper Larsson Träff

Editor information

Editors and Affiliations

High Performance Computing Center Stuttgart (HLRS), Universität Stuttgart, Nobelstr. 19, 70569, Stuttgart, Germany
Rainer Keller
Parallel Software Technologies Laboratory, Department of Computer Science, University of Houston
Edgar Gabriel
High Performance Computing Center Stuttgart, University of Stuttgart, Nobelstr. 19, 70569, Stuttgart, Germany
Michael Resch
Department of Electrical Engineering and Computer Science, University of Tennessee, 37996-3450, Knoxville, TN, USA
Jack Dongarra

About this paper

Cite this paper

Hoefler, T., Gropp, W., Thakur, R., Träff, J.L. (2010). Toward Performance Models of MPI Implementations for Understanding Application Scaling Issues. In: Keller, R., Gabriel, E., Resch, M., Dongarra, J. (eds) Recent Advances in the Message Passing Interface. EuroMPI 2010. Lecture Notes in Computer Science, vol 6305. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15646-5_3

DOI: https://doi.org/10.1007/978-3-642-15646-5_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15645-8
Online ISBN: 978-3-642-15646-5
eBook Packages: Computer Science, Computer Science (R0)
