Abstract
Petascale machines with close to a million processors will soon be available. Although MPI is the dominant programming model today, some researchers and users wonder (and perhaps even doubt) whether MPI will scale to such large processor counts. In this paper, we examine the issue of how scalable MPI is. We first examine the MPI specification itself, discuss areas with scalability concerns, and show how they can be overcome. We then investigate issues that an MPI implementation must address to be scalable. We ran experiments to measure MPI memory consumption at scale on up to 131,072 processes, or 80% of the IBM Blue Gene/P system at Argonne National Laboratory. Based on the results, we tuned the MPI implementation to reduce its memory footprint. We also discuss issues in application algorithmic scalability to large process counts, as well as features of MPI that enable the use of other techniques to overcome scalability limitations in applications.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Balaji, P. et al. (2009). MPI on a Million Processors. In: Ropo, M., Westerholm, J., Dongarra, J. (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 2009. Lecture Notes in Computer Science, vol 5759. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03770-2_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03769-6
Online ISBN: 978-3-642-03770-2