
OHTMA: an optimized heuristic topology-aware mapping algorithm on the Tianhe-3 exascale supercomputer prototype


Abstract

With the rapid growth in application size and the increasing complexity of supercomputer architectures, topology-aware process mapping has become increasingly important: high communication cost is now a dominant constraint on the performance of applications running on supercomputers. To avoid a poor mapping strategy that leads to severe communication performance degradation, we propose an optimized heuristic topology-aware mapping algorithm (OHTMA). The algorithm attempts to minimize the hop-byte metric, which we use to measure mapping quality. OHTMA incorporates a new greedy heuristic method and pair-exchange-based optimization; it reduces the number of long-distance communications and effectively enhances communication locality. Experimental results on the Tianhe-3 exascale supercomputer prototype indicate that OHTMA can significantly reduce communication costs.
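The full text is not available from this page, so the following is only a rough sketch, inferred from the abstract, of how a greedy hop-byte-minimizing placement followed by pair-exchange refinement could be organized. The hop-byte metric weights each process pair's traffic volume by the network distance (hops) between the nodes the two processes are mapped to. All names (hop_bytes, greedy_map, pair_exchange), the data layout (a dictionary of per-pair byte counts, a node-to-node hop table), and the one-process-per-node simplification are illustrative assumptions, not details taken from the OHTMA paper.

    # Illustrative sketch only: greedy hop-byte-minimizing placement with
    # pair-exchange refinement. Names and data structures are assumptions,
    # not the actual OHTMA implementation.

    import itertools


    def hop_bytes(mapping, comm, hops):
        """Hop-byte metric: sum over communicating process pairs of
        (bytes exchanged) x (hops between their assigned nodes)."""
        return sum(nbytes * hops[mapping[p]][mapping[q]]
                   for (p, q), nbytes in comm.items())


    def greedy_map(comm, hops, nodes, n_procs):
        """Greedy placement: seed with the heaviest communicator, then repeatedly
        place the unmapped process with the most traffic toward already-mapped
        processes onto the free node that adds the fewest hop-bytes.
        Assumes one process per node (len(nodes) >= n_procs)."""
        traffic = [0] * n_procs
        for (p, q), nbytes in comm.items():
            traffic[p] += nbytes
            traffic[q] += nbytes

        mapping = {}
        free_nodes = set(nodes)

        # Seed: the process with the largest total traffic goes on any free node.
        seed = max(range(n_procs), key=lambda p: traffic[p])
        mapping[seed] = free_nodes.pop()

        def edges_to_mapped(p):
            """Yield (other, nbytes) pairs linking p to already-mapped processes."""
            for (a, b), nbytes in comm.items():
                if a == p and b in mapping:
                    yield b, nbytes
                elif b == p and a in mapping:
                    yield a, nbytes

        while len(mapping) < n_procs:
            # Unmapped process with the most traffic toward mapped processes.
            cand = max((p for p in range(n_procs) if p not in mapping),
                       key=lambda p: sum(nb for _, nb in edges_to_mapped(p)))
            # Free node that adds the least hop-bytes for this process.
            best = min(free_nodes,
                       key=lambda node: sum(nb * hops[node][mapping[o]]
                                            for o, nb in edges_to_mapped(cand)))
            mapping[cand] = best
            free_nodes.remove(best)
        return mapping


    def pair_exchange(mapping, comm, hops):
        """Refinement: keep swapping the node assignments of process pairs
        while doing so lowers the hop-byte metric."""
        improved = True
        while improved:
            improved = False
            for p, q in itertools.combinations(list(mapping), 2):
                before = hop_bytes(mapping, comm, hops)
                mapping[p], mapping[q] = mapping[q], mapping[p]
                if hop_bytes(mapping, comm, hops) < before:
                    improved = True
                else:
                    mapping[p], mapping[q] = mapping[q], mapping[p]  # revert swap
        return mapping


    # Hypothetical usage: 4 processes on a 1-D chain of 4 nodes.
    # comm = {(0, 1): 100, (1, 2): 80, (2, 3): 60, (0, 3): 10}
    # hops = [[abs(i - j) for j in range(4)] for i in range(4)]
    # mapping = pair_exchange(greedy_map(comm, hops, range(4), 4), comm, hops)

Recomputing the full metric inside the exchange loop keeps the sketch short but costs a full pass over the traffic dictionary per candidate swap; a practical implementation (and presumably OHTMA itself) would evaluate only the cost delta contributed by the two swapped processes.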




Author information

Contributions

Yi-shui LI designed the research. Jie LIU guided the research. Xin-hai CHEN helped perform the experiments. Yi-shui LI drafted the manuscript. Bo YANG, Chun-ye GONG, and Xin-biao GAN helped organize the manuscript. Sheng-guo LI and Han XU helped modify the manuscript. Yi-shui LI revised and finalized the paper.

Corresponding author

Correspondence to Jie Liu.

Additional information

Compliance with ethics guidelines

Yi-shui LI, Xin-hai CHEN, Jie LIU, Bo YANG, Chun-ye GONG, Xin-biao GAN, Sheng-guo LI, and Han XU declare that they have no conflict of interest.

Project supported by the National Key Research and Development Program of China (No. 2017YFB0202104)


Cite this article

Li, Ys., Chen, Xh., Liu, J. et al. OHTMA: an optimized heuristic topology-aware mapping algorithm on the Tianhe-3 exascale supercomputer prototype. Front Inform Technol Electron Eng 21, 939–949 (2020). https://doi.org/10.1631/FITEE.1900075
