Scalable Node Allocation for Improved Performance in Regular and Anisotropic 3D Torus Supercomputers

  • Conference paper
Recent Advances in the Message Passing Interface (EuroMPI 2011)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 6960)

Abstract

MPI application performance can vary depending on where the scheduler places ranks, whether across nodes or on cores within the same multi-core chip. By default, MPI applications are at the mercy of the application placement software's decision about which nodes to assign to a job. We describe the general approach of node ordering for allocation in a 3D torus and how it improved MPI application performance, even in the face of an anisotropic interconnect. We demonstrate quantitatively that our topology-based ordering improves performance for several MPI applications running on a Top10 supercomputer.
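The abstract does not spell out the ordering itself, so the following is only an illustrative sketch of the general idea of allocating nodes from a linear ordering of a 3D torus. It is not the authors' method. The function names (`torus_distance`, `row_major_order`, `snake_order`, `allocate`) are hypothetical, the boustrophedon ("snake") curve is a stand-in for whatever locality-preserving ordering the paper uses, and modeling anisotropy as a per-dimension hop cost is an assumption.

```python
from itertools import product


def torus_distance(a, b, dims, cost=(1, 1, 1)):
    """Weighted hop count between nodes a and b on a torus with wraparound.

    `cost` is an assumed per-dimension weight used to model an anisotropic
    interconnect (for example, one dimension having slower links).
    """
    total = 0
    for p, q, n, w in zip(a, b, dims, cost):
        d = abs(p - q)
        total += w * min(d, n - d)  # go the short way around the ring
    return total


def row_major_order(dims):
    """Naive ordering: z varies fastest, then y, then x."""
    X, Y, Z = dims
    return list(product(range(X), range(Y), range(Z)))


def snake_order(dims):
    """Boustrophedon ordering: reverse direction on each row and plane so
    that consecutive nodes in the list are always one hop apart."""
    X, Y, Z = dims
    order = []
    row = 0
    for x in range(X):
        ys = range(Y) if x % 2 == 0 else range(Y - 1, -1, -1)
        for y in ys:
            zs = range(Z) if row % 2 == 0 else range(Z - 1, -1, -1)
            order.extend((x, y, z) for z in zs)
            row += 1
    return order


def allocate(order, busy, k):
    """First-fit along the ordering: take the first k nodes not in `busy`."""
    picked = [node for node in order if node not in busy][:k]
    if len(picked) < k:
        raise ValueError("not enough free nodes")
    return picked


if __name__ == "__main__":
    dims = (4, 4, 4)
    for name, order in (("row-major", row_major_order(dims)),
                        ("snake", snake_order(dims))):
        worst = max(torus_distance(a, b, dims)
                    for a, b in zip(order, order[1:]))
        print(name, "largest step between consecutive nodes:", worst)
```

Because a job receives a run of nodes from the list, an ordering whose neighbors in the list are also neighbors in the torus tends to keep a job's ranks physically close. Passing a non-uniform `cost` lets the same comparison be made for an interconnect whose dimensions differ in speed.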

This material is based upon work supported by the Defense Advanced Research Projects Agency under its Agreement No. HR0011-07-9-0001. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Defense Advanced Research Projects Agency.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Albing, C. et al. (2011). Scalable Node Allocation for Improved Performance in Regular and Anisotropic 3D Torus Supercomputers. In: Cotronis, Y., Danalis, A., Nikolopoulos, D.S., Dongarra, J. (eds) Recent Advances in the Message Passing Interface. EuroMPI 2011. Lecture Notes in Computer Science, vol 6960. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24449-0_9

  • DOI: https://doi.org/10.1007/978-3-642-24449-0_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24448-3

  • Online ISBN: 978-3-642-24449-0

  • eBook Packages: Computer Science, Computer Science (R0)
