DOI: 10.1145/2488551.2488603

Advancing application process affinity experimentation: open MPI's LAMA-based affinity interface

Published: 15 September 2013

Abstract

Application studies have shown that tuning the placement of Message Passing Interface (MPI) processes within a server's non-uniform memory access (NUMA) topology can have a dramatic impact on performance. The performance implications are magnified when running a parallel job across multiple server nodes, especially with large-scale MPI applications. As processor and NUMA topologies continue to grow more complex to meet the demands of ever-increasing processor core counts, best practices regarding process placement also need to evolve.
This paper presents Open MPI's flexible interface for distributing the individual processes of a parallel job across processing resources in a High Performance Computing (HPC) system, paying particular attention to the internal server NUMA topologies. The interface is a realization of the Locality-Aware Mapping Algorithm (LAMA) [8], and provides both simple and complex mechanisms for specifying regular process-to-processor mappings and affinitization. Open MPI's LAMA implementation is intended as a tool for MPI users to experiment with different process placement strategies on both current and emerging HPC platforms.
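
Because the interface is aimed at placement experimentation, it helps to be able to observe where each rank actually lands. The sketch below is illustrative only and is not taken from the paper: a small C MPI program in which every rank reports its host name and the CPU it is currently executing on via the Linux-specific sched_getcpu() call (the file and program names are made up for the example). Launching it with Open MPI's mpirun under different mapping and binding policies, for instance together with the --report-bindings option, makes the effect of a placement strategy directly visible.

/*
 * Minimal placement-verification sketch (illustrative only, not from the paper).
 * Each rank prints its host name and the CPU it is currently running on.
 * Compile:  mpicc -o where_am_i where_am_i.c
 * Run:      mpirun -np 8 --report-bindings ./where_am_i
 */
#define _GNU_SOURCE
#include <mpi.h>
#include <sched.h>   /* sched_getcpu() -- Linux-specific */
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, namelen;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &namelen);

    /* sched_getcpu() reports the CPU this thread is executing on right now;
       with a binding policy in effect it should stay within the bound set. */
    printf("rank %d of %d on %s, cpu %d\n", rank, size, host, sched_getcpu());

    MPI_Finalize();
    return 0;
}

Note that sched_getcpu() only shows the momentary CPU; for the full binding mask of a process, the hwloc library cited as [4] can report the complete set of processing units to which a process is bound.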

References

[1] G. Almási, C. Archer, et al. Implementing MPI on the BlueGene/L supercomputer. In M. Danelutto et al., editors, Euro-Par 2004 Parallel Processing, volume 3149 of Lecture Notes in Computer Science, pages 833--845. Springer Berlin/Heidelberg, 2004.
[2] Argonne National Laboratory. MPICH. http://www.mpich.org/.
[3] Argonne National Laboratory. Using the Hydra process manager. http://wiki.mpich.org/mpich/index.php/Using_the_Hydra_Process_Manager.
[4] F. Broquedis, J. Clet-Ortega, et al. hwloc: A generic framework for managing hardware affinities in HPC applications. In Proceedings of the 18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP2010), pages 180--186, Pisa, Italy, February 2010. IEEE Computer Society Press.
[5] S. Ethier, W. M. Tang, et al. Large-scale gyrokinetic particle simulation of microturbulence in magnetically confined fusion plasmas. IBM Journal of Research and Development, 52:105--115, January 2008.
[6] E. Gabriel, G. E. Fagg, et al. Open MPI: Goals, concept, and design of a next generation MPI implementation. In Proceedings of the 11th European PVM/MPI Users' Group Meeting, pages 97--104, Budapest, Hungary, September 2004.
[7] M. Gilge. IBM System Blue Gene Solution: Blue Gene/Q application development. Technical report, IBM, February 2013.
[8] J. Hursey, J. M. Squyres, et al. Locality-aware parallel process mapping for multi-core HPC systems. In IEEE International Conference on Cluster Computing, Austin, TX, September 2011. (Poster).
[9] E. Jeannot and G. Mercier. Near-optimal placement of MPI processes on hierarchical NUMA architectures. In Proceedings of the 16th International Euro-Par Conference on Parallel Processing, Euro-Par'10, pages 199--210, Berlin, Heidelberg, 2010. Springer-Verlag.
[10] M. Karo, R. Lagerstrom, et al. The application level placement scheduler. In Cray Users Group, 2006.
[11] A. Yoo, M. Jette, et al. SLURM: Simple Linux Utility for Resource Management. In D. Feitelson, L. Rudolph, and U. Schwiegelshohn, editors, Job Scheduling Strategies for Parallel Processing, volume 2862 of Lecture Notes in Computer Science, pages 44--60. Springer Berlin/Heidelberg, 2003.
[12] H. Yu, I.-H. Chung, et al. Topology mapping for Blue Gene/L supercomputer. In Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, SC '06, New York, NY, USA, 2006. ACM.

Published In

EuroMPI '13: Proceedings of the 20th European MPI Users' Group Meeting
September 2013
289 pages
ISBN: 9781450319034
DOI: 10.1145/2488551

Sponsors

  • ARCOS: Computer Architecture and Technology Area, Universidad Carlos III de Madrid

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. MPI
  2. NUMA
  3. locality
  4. process affinity
  5. resource management

Qualifiers

  • Research-article

Conference

EuroMPI '13
Sponsor:
  • ARCOS
EuroMPI '13: 20th European MPI Users' Group Meeting
September 15 - 18, 2013
Madrid, Spain

Acceptance Rates

EuroMPI '13 paper acceptance rate: 22 of 47 submissions, 47%
Overall acceptance rate: 66 of 139 submissions, 47%

Cited By

  • (2024) Exploring Architectural-Aware Affinity Policies in Modern HPC Runtimes. In Practice and Experience in Advanced Research Computing 2024: Human Powered Computing, pages 1-5. DOI: 10.1145/3626203.3670566. Online publication date: 17-Jul-2024.
  • (2023) PAARes: an efficient process allocation based on the available resources of cluster nodes. The Journal of Supercomputing, 79(9):10423-10441. DOI: 10.1007/s11227-023-05085-7. Online publication date: 8-Feb-2023.
  • (2020) Application-Driven Requirements for Node Resource Management in Next-Generation Systems. In 2020 IEEE/ACM International Workshop on Runtime and Operating Systems for Supercomputers (ROSS), pages 1-11. DOI: 10.1109/ROSS51935.2020.00006. Online publication date: Nov-2020.
  • (2018) HPC Process and Optimal Network Device Affinitization. IEEE Transactions on Multi-Scale Computing Systems, 4(4):749-757. DOI: 10.1109/TMSCS.2018.2871444. Online publication date: 1-Oct-2018.
  • (2017) On the Overhead of Topology Discovery for Locality-Aware Scheduling in HPC. In 2017 25th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP), pages 186-190. DOI: 10.1109/PDP.2017.35. Online publication date: 2017.
  • (2017) MPI Process and Network Device Affinitization for Optimal HPC Application Performance. In 2017 IEEE 25th Annual Symposium on High-Performance Interconnects (HOTI), pages 80-86. DOI: 10.1109/HOTI.2017.12. Online publication date: Aug-2017.
  • (2017) APHiD. In Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pages 228-237. DOI: 10.1109/CCGRID.2017.33. Online publication date: 14-May-2017.
  • (2016) Exposing the Locality of Heterogeneous Memory Architectures to HPC Applications. In Proceedings of the Second International Symposium on Memory Systems, pages 30-39. DOI: 10.1145/2989081.2989115. Online publication date: 3-Oct-2016.
  • (2014) Managing the topology of heterogeneous cluster nodes with hardware locality (hwloc). In 2014 International Conference on High Performance Computing & Simulation (HPCS), pages 74-81. DOI: 10.1109/HPCSim.2014.6903671. Online publication date: Jul-2014.
