Abstract
Designing and tuning parallel applications with MPI, particularly at large scale, requires understanding the performance implications of different choices of algorithms and implementation options. Which algorithm is better depends in part on the performance of the different possible communication approaches, which in turn can depend on both the system hardware and the MPI implementation. In the absence of detailed performance models for different MPI implementations, application developers often must select methods and tune codes without the means to realistically estimate the achievable performance and rationally defend their choices. In this paper, we advocate the construction of more useful performance models that take into account limitations on network-injection rates and effective bisection bandwidth. Since collective communication plays a crucial role in enabling scalability, we also provide analytical models for scalability of collective communication algorithms, such as broadcast, allreduce, and all-to-all. We apply these models to an IBM Blue Gene/P system and compare the analytical performance estimates with experimentally measured values.
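The abstract does not reproduce the paper's models. As a rough illustration of the kind of analytical estimate it describes, the sketch below uses the textbook latency-bandwidth (Hockney) cost model: `alpha` is per-message latency, `beta` is per-byte transfer time, and `gamma` is per-byte reduction cost. It compares two broadcast algorithms and adds injection-rate and bisection-bandwidth bounds to an all-to-all estimate. The function names, all parameter names (`inj_bw`, `bisection_bw`), the power-of-two assumption on `p`, and the use of `max()` to combine the bounds are illustrative assumptions, not the paper's formulation. The example values are hypothetical, not Blue Gene/P measurements.

```python
import math

# Hypothetical, illustrative cost models in the Hockney (latency-bandwidth) style:
#   alpha = per-message latency [s], beta = per-byte transfer time [s/B],
#   gamma = per-byte local reduction cost [s/B], m = message size [B], p = processes.
# The formulas below assume p is a power of two.

def bcast_binomial(p, m, alpha, beta):
    # ceil(log2 p) rounds; every round forwards the full m-byte message.
    return math.ceil(math.log2(p)) * (alpha + beta * m)

def bcast_scatter_allgather(p, m, alpha, beta):
    # Binomial scatter of m/p-byte pieces, then a ring allgather of the pieces.
    lg = math.ceil(math.log2(p))
    return (lg + p - 1) * alpha + 2 * (p - 1) / p * m * beta

def allreduce_rabenseifner(p, m, alpha, beta, gamma):
    # Reduce-scatter (recursive halving) followed by allgather (recursive doubling).
    lg = math.ceil(math.log2(p))
    return 2 * lg * alpha + (p - 1) / p * m * (2 * beta + gamma)

def alltoall_pairwise(p, m, alpha, beta, inj_bw, bisection_bw):
    # m = bytes each process sends to each other process.
    # Contention-free estimate: p-1 exchange steps of m bytes each.
    contention_free = (p - 1) * (alpha + beta * m)
    # Injection bound: each process must push (p-1)*m bytes into the network.
    injection_bound = (p - 1) * m / inj_bw
    # Bisection bound: (p/2)*(p/2)*m bytes cross a balanced cut in each direction;
    # bisection_bw is taken as the aggregate per-direction bandwidth across that cut.
    bisection_bound = (p / 2) * (p / 2) * m / bisection_bw
    # Whichever limit is slowest dominates the estimate.
    return max(contention_free, injection_bound, bisection_bound)

if __name__ == "__main__":
    # Hypothetical parameters, not measured values for any real system.
    alpha, beta = 1e-6, 1e-9
    for m in (8, 1_000_000):
        t_tree = bcast_binomial(64, m, alpha, beta)
        t_sag = bcast_scatter_allgather(64, m, alpha, beta)
        print(f"m={m}: binomial={t_tree:.3e}s scatter+allgather={t_sag:.3e}s")
```

Under these assumed parameters, the binomial tree is cheaper for small messages because it pays fewer latency terms. The scatter-allgather variant wins for large messages because it moves roughly 2m bytes per process instead of m·log2 p. This is the kind of algorithm choice the abstract says depends on the communication performance of the system and the MPI implementation.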
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hoefler, T., Gropp, W., Thakur, R., Träff, J.L. (2010). Toward Performance Models of MPI Implementations for Understanding Application Scaling Issues. In: Keller, R., Gabriel, E., Resch, M., Dongarra, J. (eds) Recent Advances in the Message Passing Interface. EuroMPI 2010. Lecture Notes in Computer Science, vol 6305. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15646-5_3
DOI: https://doi.org/10.1007/978-3-642-15646-5_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15645-8
Online ISBN: 978-3-642-15646-5
eBook Packages: Computer Science, Computer Science (R0)