Locality and Topology Aware Intra-node Communication among Multicore CPUs

Ma, Teng; Bosilca, George; Bouteiller, Aurelien; Dongarra, Jack J.

doi:10.1007/978-3-642-15646-5_28


Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 6305)

Included in the following conference series:

European MPI Users' Group Meeting


Abstract

A major trend in HPC is the escalation toward manycore, where systems are composed of shared-memory nodes featuring numerous processing units. Unfortunately, with scale comes complexity, here in the form of non-uniform memory access (NUMA) and deep cache hierarchies. For most HPC applications, harnessing the power of multicores is hindered by the topology-oblivious tuning of the MPI library. In this paper, we propose a framework to tune every type of shared-memory communication according to locality and topology. An implementation inside Open MPI is evaluated experimentally and demonstrates significant speedups compared to vanilla Open MPI and MPICH2.
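The abstract states only that shared-memory communications are tuned according to locality and topology. As a rough illustration of what such tuning can involve, the sketch below classifies a pair of cores by the deepest cache level they share and derives a copy fragment size from that classification. Everything concrete here is a hypothetical assumption made for the example, not the paper's actual policy: the 2-socket, 8-core layout, the cache sizes, the fixed cross-socket fragment size, and the "half of the shared cache" heuristic. A real implementation would discover the topology at runtime, for instance with the hwloc library, instead of using a hard-coded table.

```python
from dataclasses import dataclass
from enum import IntEnum


class Locality(IntEnum):
    """How close two cores are, ordered from closest to farthest."""
    SAME_CORE = 0
    SHARED_L2 = 1
    SHARED_L3 = 2
    CROSS_SOCKET = 3


@dataclass(frozen=True)
class Core:
    socket: int  # physical package (treated as one NUMA domain here)
    l3: int      # id of the L3 cache this core uses
    l2: int      # id of the L2 cache this core uses


# Hypothetical node: 2 sockets x 4 cores, one L3 per socket,
# each L2 shared by a pair of adjacent cores.
TOPOLOGY = {cid: Core(socket=cid // 4, l3=cid // 4, l2=cid // 2) for cid in range(8)}

# Hypothetical sizes, in bytes (assumptions, not measured values).
L2_BYTES = 256 * 1024
L3_BYTES = 8 * 1024 * 1024
CROSS_SOCKET_FRAGMENT = 64 * 1024


def locality(a: int, b: int, topo: dict[int, Core] = TOPOLOGY) -> Locality:
    """Return the deepest level of the memory hierarchy shared by cores a and b."""
    if a == b:
        return Locality.SAME_CORE
    ca, cb = topo[a], topo[b]
    if ca.l2 == cb.l2:
        return Locality.SHARED_L2
    if ca.l3 == cb.l3:
        return Locality.SHARED_L3
    return Locality.CROSS_SOCKET


def fragment_size(a: int, b: int, topo: dict[int, Core] = TOPOLOGY) -> int:
    """Pick a pipeline fragment size for a copy between cores a and b.

    Hypothetical heuristic: use half of the closest shared cache so that two
    in-flight fragments fit in it; use a fixed size when no cache is shared.
    """
    level = locality(a, b, topo)
    if level in (Locality.SAME_CORE, Locality.SHARED_L2):
        return L2_BYTES // 2
    if level == Locality.SHARED_L3:
        return L3_BYTES // 2
    return CROSS_SOCKET_FRAGMENT


if __name__ == "__main__":
    # Compare core 0 against a same-L2 peer, a same-socket peer, and a remote peer.
    for peer in (1, 2, 4):
        print(peer, locality(0, peer).name, fragment_size(0, peer))
```

Halving the shared cache targets a double-buffered pipeline, where one fragment is being written by the sender while the previous one is read by the receiver, so both stay resident in the cache the two processes have in common. Across sockets there is no shared cache to target, so this sketch simply falls back to a fixed default.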

Author information

Innovative Computing Laboratory, University of Tennessee Computer Science Department, 1122 Volunteer Blvd., Knoxville, TN, 37996-3450, USA
Teng Ma, George Bosilca, Aurelien Bouteiller & Jack J. Dongarra

Editor information

High Performance Computing Center Stuttgart (HLRS), Universität Stuttgart, Nobelstr. 19, 70569, Stuttgart, Germany
Rainer Keller
Parallel Software Technologies Laboratory, Department of Computer Science, University of Houston
Edgar Gabriel
High Performance Computing Center Stuttgart, University of Stuttgart, Nobelstr. 19, 70569, Stuttgart, Germany
Michael Resch
Department of Electrical Engineering and Computer Science, University of Tennessee, 37996-3450, Knoxville, TN, USA
Jack Dongarra

About this paper

Cite this paper

Ma, T., Bosilca, G., Bouteiller, A., Dongarra, J.J. (2010). Locality and Topology Aware Intra-node Communication among Multicore CPUs. In: Keller, R., Gabriel, E., Resch, M., Dongarra, J. (eds) Recent Advances in the Message Passing Interface. EuroMPI 2010. Lecture Notes in Computer Science, vol 6305. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15646-5_28

DOI: https://doi.org/10.1007/978-3-642-15646-5_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15645-8
Online ISBN: 978-3-642-15646-5
eBook Packages: Computer Science, Computer Science (R0)
