DOI: 10.1145/3339186.3339208
research-article

LPMS: A Low-cost Topology-aware Process Mapping Method for Large-scale Parallel Applications on Shared HPC Systems

Published: 05 August 2019

Abstract

Topology-aware process mapping can reduce communication cost by embedding the application's communication topology into the underlying network. Because the problem is NP-hard in general, process mapping methods must balance mapping cost against mapping performance. Moreover, many existing low-cost methods assume that the application owns the high-performance computer exclusively or that the allocated resources form a regular structure. This assumption no longer holds on most supercomputers, where the machine is shared among users and applications, and it hinders the performance of these methods on such shared HPC systems. To address these issues, we propose a label-propagation-based process mapping method, LPMS, that is both low-cost and well suited to shared HPC systems. Both analysis and experiments show that LPMS incurs low algorithmic cost while maintaining performance even on a loaded shared HPC system. Real-world scientific application proxies gain a performance boost of up to 34.79% compared with the default natural process placement on the TianHe-2 HPC system and a fat-tree-based cluster HPC system.
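To make "communication cost" concrete, the sketch below scores two placements of the same application with hop-bytes (traffic volume multiplied by network distance, summed over communicating pairs). This is an illustration only: the abstract does not state which cost metric LPMS uses, so hop-bytes is an assumption, and the function name `hop_bytes`, the ring communication pattern, and the distance matrix are all hypothetical. It is not the LPMS algorithm itself.

```python
def hop_bytes(comm, mapping, dist):
    """Sum of (bytes exchanged) x (network distance) over all communicating pairs.

    comm:    {(src_process, dst_process): bytes}  -- hypothetical traffic matrix
    mapping: {process: core}                      -- the placement being scored
    dist:    dist[a][b] = distance between cores a and b (hypothetical units)
    """
    return sum(volume * dist[mapping[src]][mapping[dst]]
               for (src, dst), volume in comm.items())


# Hypothetical 4-process ring: each neighbouring pair exchanges 100 bytes.
comm = {(0, 1): 100, (1, 2): 100, (2, 3): 100, (3, 0): 100}

# Hypothetical allocation: cores 0-1 share one node, cores 2-3 share another.
# Distance 1 within a node, 4 across nodes (assumed values).
dist = [
    [0, 1, 4, 4],
    [1, 0, 4, 4],
    [4, 4, 0, 1],
    [4, 4, 1, 0],
]

natural = {0: 0, 1: 1, 2: 2, 3: 3}    # rank i placed on core i
scattered = {0: 0, 1: 2, 2: 1, 3: 3}  # ring neighbours split across nodes

print(hop_bytes(comm, natural, dist))    # 1000
print(hop_bytes(comm, scattered, dist))  # 1600
```

Under these assumed distances, the placement that keeps ring neighbours on the same node has the lower score. A mapping method searches for placements with low cost, and on a shared system the distance matrix reflects whatever irregular set of nodes the scheduler allocated.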

Cited By

  • (2024) Network-Centered Resource Management for HPC Networks. 2024 IEEE 10th International Conference on Network Softwarization (NetSoft), 235-238. DOI: 10.1109/NetSoft60951.2024.10588913. Online publication date: 24-Jun-2024
  • (2022) IPMPI: Improved MPI Communication Logger. 2022 IEEE/ACM International Workshop on Exascale MPI (ExaMPI), 31-40. DOI: 10.1109/ExaMPI56604.2022.00009. Online publication date: Nov-2022
  • (2021) Using Monitoring Data to Improve HPC Performance via Network-Data-Driven Allocation. 2021 IEEE High Performance Extreme Computing Conference (HPEC), 1-7. DOI: 10.1109/HPEC49654.2021.9622783. Online publication date: 20-Sep-2021

    Published In

    ICPP Workshops '19: Workshop Proceedings of the 48th International Conference on Parallel Processing
    August 2019
    241 pages
    ISBN: 9781450371964
    DOI: 10.1145/3339186

    In-Cooperation

    • University of Tsukuba

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Communication Optimization
    2. Shared HPC System
    3. Topology-aware Process Mapping

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICPP 2019
    ICPP 2019: Workshops
    August 5 - 8, 2019
    Kyoto, Japan

    Acceptance Rates

    Overall Acceptance Rate 91 of 313 submissions, 29%
