Maximizing Application Performance in a Multi-core, NUMA-Aware Compute Cluster by Multi-level Tuning

  • Conference paper
Supercomputing (ISC 2013)

Abstract

Achieving good application performance on a modern compute cluster of multi-core, multi-socket, NUMA-aware systems can be challenging. In this paper, we use VASP, a popular ab-initio quantum-mechanical molecular dynamics (MD) simulation package, to investigate the levels of software, hardware, and network tuning that boost performance on a Dell PowerEdge R815 HPC cluster with AMD “Interlagos” and “Abu-Dhabi” processors. We implement code changes that allow VASP to be built with a free software stack supporting the FMA and AVX CPU instructions of the Bulldozer/Piledriver architecture. We analyze the MPI communications by profiling, compare the scalability of different interconnects, and discuss various MPI tuning parameters. We show the effects of advanced features that are crucial to scalability on InfiniBand, including MXM and SRQ, which optimize the use of network resources for MPI communications. We investigate the importance of MPI process placement and introduce a process allocation tool that facilitates affinity grouping on a multi-core architecture.
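To make the idea of affinity grouping concrete, the following is a minimal sketch of how the MPI ranks on one node might be mapped to cores so that consecutive ranks share a NUMA node. It is not the process allocation tool introduced in the paper. The function names, the contiguous core numbering, and the example topology of 8 NUMA nodes with 8 cores each are all assumptions made for illustration.

```python
def numa_core_groups(num_numa_nodes, cores_per_node):
    """Return one list of core IDs per NUMA node.

    Assumption: cores are numbered contiguously within each NUMA node,
    which is not guaranteed on every system.
    """
    return [
        list(range(node * cores_per_node, (node + 1) * cores_per_node))
        for node in range(num_numa_nodes)
    ]


def place_ranks(num_local_ranks, num_numa_nodes, cores_per_node):
    """Map each local MPI rank to a (numa_node, core) pair.

    Consecutive ranks are grouped on the same NUMA node, and the ranks
    within a node are spread over its cores with an even stride.
    """
    total_cores = num_numa_nodes * cores_per_node
    if not 0 < num_local_ranks <= total_cores:
        raise ValueError("rank count must be between 1 and the core count")

    groups = numa_core_groups(num_numa_nodes, cores_per_node)
    # Ranks per NUMA node, rounded up so that every rank gets a slot.
    per_node = -(-num_local_ranks // num_numa_nodes)
    # Distance between the cores used inside one NUMA node.
    stride = max(1, cores_per_node // per_node)

    placement = {}
    for rank in range(num_local_ranks):
        node = rank // per_node
        slot = rank % per_node
        placement[rank] = (node, groups[node][slot * stride])
    return placement


if __name__ == "__main__":
    # Hypothetical node: 8 NUMA nodes x 8 cores, running 16 local ranks.
    for rank, (node, core) in place_ranks(16, 8, 8).items():
        print(f"rank {rank:2d} -> NUMA node {node}, core {core}")
```

With 16 ranks on this hypothetical topology, ranks 0 and 1 land on cores 0 and 4 of NUMA node 0, ranks 2 and 3 on cores 8 and 12 of NUMA node 1, and so on. A mapping like this would typically be handed to the MPI launcher or to a tool such as `numactl`; how that is done depends on the MPI library and is outside the scope of this sketch.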

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Shainer, G. et al. (2013). Maximizing Application Performance in a Multi-core, NUMA-Aware Compute Cluster by Multi-level Tuning. In: Kunkel, J.M., Ludwig, T., Meuer, H.W. (eds) Supercomputing. ISC 2013. Lecture Notes in Computer Science, vol 7905. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38750-0_17

  • DOI: https://doi.org/10.1007/978-3-642-38750-0_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38749-4

  • Online ISBN: 978-3-642-38750-0

  • eBook Packages: Computer Science, Computer Science (R0)
