Maximizing Application Performance in a Multi-core, NUMA-Aware Compute Cluster by Multi-level Tuning

Shainer, Gilad; Lui, Pak; Hilgeman, Martin; Layton, Jeffrey; Stevens, Cydney; Stemple, Walker; Schultz, Scot; Ludden, Guy; Mora, Joshua; Kresse, Georg

doi:10.1007/978-3-642-38750-0_17


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7905)

Included in the following conference series:

International Supercomputing Conference

Abstract

Achieving good application performance on a modern compute cluster of multi-core, multi-socket, NUMA-aware systems can be challenging. In this paper, we use VASP, a popular ab-initio quantum-mechanical MD simulation package, to investigate the levels of software, hardware, and network tuning that boost performance on a Dell PowerEdge R815 HPC cluster with AMD “Interlagos” and “Abu-Dhabi” processors. We implement code changes with the free software stack that supports the FMA and AVX CPU instructions on the Bulldozer/Piledriver architecture. We analyze the MPI communications by profiling, compare the scalability of different interconnects, discuss various MPI tuning parameters, and show the effects of advanced features that are crucial to the scalability of InfiniBand, including MXM and SRQ, which optimize the network resources for MPI communications. We investigate the importance of MPI process placement and introduce a process allocation tool that facilitates affinity grouping on a multi-core architecture.
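
The idea of affinity grouping mentioned in the abstract can be illustrated with a minimal sketch. This is not the process allocation tool described in the paper, whose design is not given here. The function name `place_ranks`, the two policies, and the assumption that core IDs are numbered contiguously within each NUMA node are all hypothetical choices made for illustration.

```python
def place_ranks(n_ranks, numa_nodes, cores_per_node, policy="compact"):
    """Map each local MPI rank to a (numa_node, core_id) pair.

    Hypothetical helper: assumes core IDs are contiguous per NUMA node,
    i.e. node k owns cores k*cores_per_node .. (k+1)*cores_per_node - 1.
    """
    if n_ranks > numa_nodes * cores_per_node:
        raise ValueError("more ranks than available cores")

    placement = []
    for rank in range(n_ranks):
        if policy == "compact":
            # Fill one NUMA node completely before moving to the next.
            node, slot = divmod(rank, cores_per_node)
        elif policy == "scatter":
            # Deal ranks round-robin across NUMA nodes.
            slot, node = divmod(rank, numa_nodes)
        else:
            raise ValueError(f"unknown policy: {policy}")
        placement.append((node, node * cores_per_node + slot))
    return placement


if __name__ == "__main__":
    # Illustrative topology only: 2 NUMA nodes with 4 cores each.
    for policy in ("compact", "scatter"):
        print(policy, place_ranks(4, 2, 4, policy))
```

A compact layout keeps neighbouring ranks on the same NUMA node, while a scatter layout spreads them so that each rank has more of a node's memory bandwidth to itself. Which one suits a given application is an empirical question. On Linux, the resulting core ID could be applied to a process with `os.sched_setaffinity`.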

Author information

Authors and Affiliations

Mellanox Technologies, California, USA
Gilad Shainer & Pak Lui
Dell Inc., Texas, USA
Martin Hilgeman, Jeffrey Layton, Cydney Stevens & Walker Stemple
Advanced Micro Devices, California, USA
Scot Schultz, Guy Ludden & Joshua Mora
University of Vienna, Vienna, Austria
Georg Kresse

Editor information

Editors and Affiliations

University of Hamburg, Department of Informatics, Bundestraße 45a, 20146, Hamburg, Germany
Julian Martin Kunkel
Deutsches Klimarechenzentrum, Bundestraße 45a, 20146, Hamburg, Germany
Thomas Ludwig
University of Mannheim, Germany and Prometeus GmbH, Fliederstraße 2, 74915, Waibstadt, Germany
Hans Werner Meuer


About this paper

Cite this paper

Shainer, G. et al. (2013). Maximizing Application Performance in a Multi-core, NUMA-Aware Compute Cluster by Multi-level Tuning. In: Kunkel, J.M., Ludwig, T., Meuer, H.W. (eds) Supercomputing. ISC 2013. Lecture Notes in Computer Science, vol 7905. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38750-0_17


DOI: https://doi.org/10.1007/978-3-642-38750-0_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38749-4
Online ISBN: 978-3-642-38750-0
eBook Packages: Computer Science, Computer Science (R0)
