LPMS: A Low-cost Topology-aware Process Mapping Method for Large-scale Parallel Applications on Shared HPC Systems

Authors:

Song Yao

ICPP Workshops '19: Workshop Proceedings of the 48th International Conference on Parallel Processing

Article No.: 27, Pages 1 - 10

https://doi.org/10.1145/3339186.3339208

Published: 05 August 2019

Abstract

Topology-aware process mapping can reduce communication cost by embedding the application's communication topology into the underlying network. Because process mapping is in general an NP-hard problem, mapping methods strive to balance mapping cost against mapping performance. Moreover, many existing low-cost methods assume that the application owns the high-performance computer exclusively or that the allocated resources form a regular structure. This assumption no longer holds on most supercomputers, where the machine is often shared among users and applications, and it hinders the performance of these methods on such shared HPC systems. To address these issues, we propose a label-propagation-based process mapping method, LPMS, that is both low-cost and well suited to shared HPC systems. Both analysis and experiments show that LPMS incurs low algorithmic cost while maintaining performance even on a loaded shared HPC system. Real-world scientific application proxies gain a performance boost of up to 34.79% over the default natural process placement on the TianHe-2 HPC system and on a fat-tree-based HPC cluster.
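To make the notion of communication cost concrete, the following minimal sketch scores a process placement by summing traffic volume times network distance over all communicating pairs. This is a hop-bytes-style metric that is common in the process mapping literature; it is an assumption for illustration and is not taken from the LPMS paper, nor is it the LPMS algorithm itself. The function name `mapping_cost`, the four-process ring, and the two-switch distance matrix are all hypothetical.

```python
from typing import Dict, List, Tuple


def mapping_cost(
    comm: Dict[Tuple[int, int], float],
    placement: List[int],
    hops: List[List[int]],
) -> float:
    """Sum of (traffic volume x network distance) over communicating pairs.

    comm      -- traffic volume between process pairs (hypothetical units)
    placement -- placement[p] is the node that process p runs on
    hops      -- hops[u][v] is the network distance between nodes u and v
    """
    return sum(
        volume * hops[placement[a]][placement[b]]
        for (a, b), volume in comm.items()
    )


# Hypothetical 4-process ring with heavy traffic on pairs (0, 1) and (2, 3).
comm = {(0, 1): 10.0, (1, 2): 1.0, (2, 3): 10.0, (3, 0): 1.0}

# Hypothetical machine: nodes 0 and 1 share a switch, nodes 2 and 3 share
# another; crossing between switches costs 3 hops, staying local costs 1.
hops = [
    [0, 1, 3, 3],
    [1, 0, 3, 3],
    [3, 3, 0, 1],
    [3, 3, 1, 0],
]

natural = [0, 1, 2, 3]    # heavy pairs stay under the same switch
scattered = [0, 2, 1, 3]  # heavy pairs are split across switches

print(mapping_cost(comm, natural, hops))
print(mapping_cost(comm, scattered, hops))
```

Under these assumptions, a placement that keeps heavily communicating processes close together scores lower. On a shared system the allocated nodes may be scattered irregularly across the network, so the distance matrix cannot be assumed to have a regular structure.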

Cited By

Van Poucke D, Tavernier W, Colle D (2024). Network-Centered Resource Management for HPC Networks. 2024 IEEE 10th International Conference on Network Softwarization (NetSoft), 235-238. Online publication date: 24-Jun-2024. https://doi.org/10.1109/NetSoft60951.2024.10588913
Agrawal T, Malakar P (2022). IPMPI: Improved MPI Communication Logger. 2022 IEEE/ACM International Workshop on Exascale MPI (ExaMPI), 31-40. Online publication date: Nov-2022. https://doi.org/10.1109/ExaMPI56604.2022.00009
Zhang Y, Aksar B, Aaziz O, Schwaller B, Brandt J, Leung V, Egele M, Coskun A (2021). Using Monitoring Data to Improve HPC Performance via Network-Data-Driven Allocation. 2021 IEEE High Performance Extreme Computing Conference (HPEC), 1-7. Online publication date: 20-Sep-2021. https://doi.org/10.1109/HPEC49654.2021.9622783

Index Terms

LPMS: A Low-cost Topology-aware Process Mapping Method for Large-scale Parallel Applications on Shared HPC Systems
1. Computer systems organization
  1. Architectures
    1. Distributed architectures
      1. Grid computing

Published In

ICPP Workshops '19: Workshop Proceedings of the 48th International Conference on Parallel Processing

August 2019

241 pages

ISBN:9781450371964

DOI:10.1145/3339186

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

University of Tsukuba

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ICPP 2019

ICPP 2019: Workshops

August 5 - 8, 2019

Kyoto, Japan

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%
