DOI: 10.1145/3236367.3236377

Using Node Information to Implement MPI Cartesian Topologies

Published: 23 September 2018

ABSTRACT

The MPI API provides support for Cartesian process topologies, including the option to reorder the processes to achieve better communication performance. But MPI implementations rarely provide anything useful for the reorder option, typically ignoring it. One argument made is that modern interconnects are fast enough that applications are less sensitive to the exact mapping of processes onto the system. However, intranode communication performance is much greater than internode communication performance. In this paper, we show a simple approach that uses only information about which MPI processes are on the same node to provide a fast and effective implementation of the MPI Cartesian topology. While not optimal, this approach provides a significant improvement over all tested MPI implementations and may serve as the default implementation of MPI_Cart_create in any MPI library.
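To make the setting concrete, the following minimal C/MPI sketch (not the paper's algorithm) shows how the two ingredients fit together: MPI_Comm_split_type with MPI_COMM_TYPE_SHARED reveals which processes share a node, and the reorder argument of MPI_Cart_create asks the implementation to exploit such information. The 2-D grid, the nodecomm name, and the printed mapping are illustrative assumptions, not part of the paper.

/* Sketch only: gather node-locality information and request a
 * reordered 2-D Cartesian topology.  Whether and how ranks are
 * actually reordered depends entirely on the MPI implementation. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int wrank, wsize;
    MPI_Comm_rank(MPI_COMM_WORLD, &wrank);
    MPI_Comm_size(MPI_COMM_WORLD, &wsize);

    /* Processes that can share memory (in practice, processes on the
       same node) end up in the same nodecomm. */
    MPI_Comm nodecomm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &nodecomm);
    int noderank, nodesize;
    MPI_Comm_rank(nodecomm, &noderank);
    MPI_Comm_size(nodecomm, &nodesize);

    /* Let MPI choose a 2-D decomposition, then create the Cartesian
       communicator with reorder = 1 so the implementation may remap
       ranks using whatever locality information it has. */
    int dims[2] = {0, 0}, periods[2] = {0, 0};
    MPI_Dims_create(wsize, 2, dims);
    MPI_Comm cartcomm;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &cartcomm);

    int crank, coords[2];
    MPI_Comm_rank(cartcomm, &crank);
    MPI_Cart_coords(cartcomm, crank, 2, coords);
    printf("world %d (node rank %d of %d) -> cart (%d,%d)\n",
           wrank, noderank, nodesize, coords[0], coords[1]);

    MPI_Comm_free(&cartcomm);
    MPI_Comm_free(&nodecomm);
    MPI_Finalize();
    return 0;
}

An implementation of MPI_Cart_create would of course gather the node information internally rather than in user code; the point of the sketch is only that the locality data the paper relies on is already available through standard MPI calls.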


Published in

EuroMPI '18: Proceedings of the 25th European MPI Users' Group Meeting
September 2018, 187 pages
ISBN: 9781450364928
DOI: 10.1145/3236367
Copyright © 2018 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 23 September 2018


      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

Overall Acceptance Rate: 66 of 139 submissions, 47%
