Scalable Node Allocation for Improved Performance in Regular and Anisotropic 3D Torus Supercomputers

Albing, Carl; Troullier, Norm; Whalen, Stephen; Olson, Ryan; Glenski, Joe; Pritchard, Howard; Mills, Hugo

doi:10.1007/978-3-642-24449-0_9

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 6960)

Included in the following conference series:

European MPI Users' Group Meeting

Abstract

MPI application performance can vary with how the scheduler places ranks, whether across nodes or on cores within the same multi-core chip. By default, MPI applications are at the mercy of the application placement software that decides which nodes are assigned to a job. We describe the general approach of node ordering for allocation in a 3D torus and show how it improved MPI application performance, even in the face of an anisotropic interconnect. We demonstrate quantitatively that our topology-based ordering improves performance for several MPI applications running on a Top10 supercomputer.
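
To make the idea of node ordering for allocation concrete, the following is a minimal sketch and not the ordering evaluated in the paper. It assumes a simple boustrophedon ("snake") walk over the torus coordinates, so that nodes adjacent in the list are one hop apart, and it hands a job the first free nodes in that order. The function names `snake_order`, `torus_hops`, and `allocate` are hypothetical and chosen only for illustration.

```python
def snake_order(dims):
    """List every (x, y, z) coordinate of an nx*ny*nz torus in a snake order.

    Assumption for illustration: consecutive entries differ by exactly one
    hop, so a contiguous slice of the list is a compact set of nodes.
    """
    nx, ny, nz = dims
    order = []
    row = 0  # counts z-rows emitted so far; its parity sets the z direction
    for x in range(nx):
        # Reverse the y sweep on odd x planes so planes join end to end.
        ys = range(ny) if x % 2 == 0 else range(ny - 1, -1, -1)
        for y in ys:
            # Reverse the z sweep on every other row for the same reason.
            zs = range(nz) if row % 2 == 0 else range(nz - 1, -1, -1)
            for z in zs:
                order.append((x, y, z))
            row += 1
    return order


def torus_hops(a, b, dims):
    """Minimum hop count between two nodes, using wraparound links."""
    return sum(min(abs(p - q), n - abs(p - q)) for p, q, n in zip(a, b, dims))


def allocate(order, busy, count):
    """Return the first `count` free nodes in list order (first-fit)."""
    free = [node for node in order if node not in busy]
    if len(free) < count:
        raise ValueError("not enough free nodes")
    return free[:count]


if __name__ == "__main__":
    dims = (4, 4, 4)
    order = snake_order(dims)
    job = allocate(order, busy={order[0], order[1]}, count=8)
    worst = max(torus_hops(a, b, dims) for a in job for b in job)
    print(job)
    print("largest hop distance inside the job:", worst)
```

The sketch treats all three dimensions as equal. An anisotropic interconnect, where link bandwidth differs by dimension, would presumably call for a different traversal, which this example does not attempt to model.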

This material is based upon work supported by the Defense Advanced Research Projects Agency under its Agreement No. HR0011-07-9-0001. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Defense Advanced Research Projects Agency.

Author information

Authors and Affiliations

University of Reading, Reading, Berkshire, UK
Carl Albing & Hugo Mills
Cray Inc., Saint Paul, MN, USA
Carl Albing, Norm Troullier, Stephen Whalen, Ryan Olson, Joe Glenski & Howard Pritchard
University of Minnesota, Minneapolis, MN, USA
Stephen Whalen

Editor information

Editors and Affiliations

Department of Informatics and Telecommunications, University of Athens, 15784, Athens, Greece
Yiannis Cotronis
University of Tennessee, 1122 Volunteer Blvd, 37996-3450, Knoxville, TN, USA
Anthony Danalis & Jack Dongarra
University of Crete, Heraklion, Greece
Dimitrios S. Nikolopoulos

About this paper

Cite this paper

Albing, C. et al. (2011). Scalable Node Allocation for Improved Performance in Regular and Anisotropic 3D Torus Supercomputers. In: Cotronis, Y., Danalis, A., Nikolopoulos, D.S., Dongarra, J. (eds) Recent Advances in the Message Passing Interface. EuroMPI 2011. Lecture Notes in Computer Science, vol 6960. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24449-0_9

DOI: https://doi.org/10.1007/978-3-642-24449-0_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24448-3
Online ISBN: 978-3-642-24449-0
eBook Packages: Computer Science, Computer Science (R0)
