
OHTMA: an optimized heuristic topology-aware mapping algorithm on the Tianhe-3 exascale supercomputer prototype


Abstract

With the rapid growth in application size and the increasing complexity of supercomputer architectures, topology-aware process mapping has become increasingly important: high communication cost is now a dominant constraint on the performance of applications running on supercomputers. To avoid a poor mapping strategy that leads to severe communication performance degradation, we propose an optimized heuristic topology-aware mapping algorithm (OHTMA). The algorithm attempts to minimize the hop-byte metric, which we use to measure mapping quality. OHTMA incorporates a new greedy heuristic method and pair-exchange-based optimization; it reduces the number of long-distance communications and effectively enhances communication locality. Experimental results on the Tianhe-3 exascale supercomputer prototype indicate that OHTMA can significantly reduce communication costs.
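The full text is not available from this page, so the following is only a rough sketch, inferred from the abstract, of how a greedy hop-byte-minimizing placement followed by pair-exchange refinement could be organized. The hop-byte metric weights each process pair's traffic volume by the network distance (hops) between the nodes the two processes are mapped to. All names (hop_bytes, greedy_map, pair_exchange), the data layout (a dictionary of per-pair byte counts, a node-to-node hop table), and the one-process-per-node simplification are illustrative assumptions, not details taken from the OHTMA paper.

    # Illustrative sketch only: greedy hop-byte-minimizing placement with
    # pair-exchange refinement. Names and data structures are assumptions,
    # not the actual OHTMA implementation.

    import itertools


    def hop_bytes(mapping, comm, hops):
        """Hop-byte metric: sum over communicating process pairs of
        (bytes exchanged) x (hops between their assigned nodes)."""
        return sum(nbytes * hops[mapping[p]][mapping[q]]
                   for (p, q), nbytes in comm.items())


    def greedy_map(comm, hops, nodes, n_procs):
        """Greedy placement: seed with the heaviest communicator, then repeatedly
        place the unmapped process with the most traffic toward already-mapped
        processes onto the free node that adds the fewest hop-bytes.
        Assumes one process per node (len(nodes) >= n_procs)."""
        traffic = [0] * n_procs
        for (p, q), nbytes in comm.items():
            traffic[p] += nbytes
            traffic[q] += nbytes

        mapping = {}
        free_nodes = set(nodes)

        # Seed: the process with the largest total traffic goes on any free node.
        seed = max(range(n_procs), key=lambda p: traffic[p])
        mapping[seed] = free_nodes.pop()

        def edges_to_mapped(p):
            """Yield (other, nbytes) pairs linking p to already-mapped processes."""
            for (a, b), nbytes in comm.items():
                if a == p and b in mapping:
                    yield b, nbytes
                elif b == p and a in mapping:
                    yield a, nbytes

        while len(mapping) < n_procs:
            # Unmapped process with the most traffic toward mapped processes.
            cand = max((p for p in range(n_procs) if p not in mapping),
                       key=lambda p: sum(nb for _, nb in edges_to_mapped(p)))
            # Free node that adds the least hop-bytes for this process.
            best = min(free_nodes,
                       key=lambda node: sum(nb * hops[node][mapping[o]]
                                            for o, nb in edges_to_mapped(cand)))
            mapping[cand] = best
            free_nodes.remove(best)
        return mapping


    def pair_exchange(mapping, comm, hops):
        """Refinement: keep swapping the node assignments of process pairs
        while doing so lowers the hop-byte metric."""
        improved = True
        while improved:
            improved = False
            for p, q in itertools.combinations(list(mapping), 2):
                before = hop_bytes(mapping, comm, hops)
                mapping[p], mapping[q] = mapping[q], mapping[p]
                if hop_bytes(mapping, comm, hops) < before:
                    improved = True
                else:
                    mapping[p], mapping[q] = mapping[q], mapping[p]  # revert swap
        return mapping


    # Hypothetical usage: 4 processes on a 1-D chain of 4 nodes.
    # comm = {(0, 1): 100, (1, 2): 80, (2, 3): 60, (0, 3): 10}
    # hops = [[abs(i - j) for j in range(4)] for i in range(4)]
    # mapping = pair_exchange(greedy_map(comm, hops, range(4), 4), comm, hops)

Recomputing the full metric inside the exchange loop keeps the sketch short but costs a full pass over the traffic dictionary per candidate swap; a practical implementation (and presumably OHTMA itself) would evaluate only the cost delta contributed by the two swapped processes.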




Author information

Contributions

Yi-shui LI designed the research. Jie LIU guided the research. Xin-hai CHEN helped perform the experiments. Yi-shui LI drafted the manuscript. Bo YANG, Chun-ye GONG, and Xin-biao GAN helped organize the manuscript. Sheng-guo LI and Han XU helped modify the manuscript. Yi-shui LI revised and finalized the paper.

Corresponding author

Correspondence to Jie Liu.

Additional information

Compliance with ethics guidelines

Yi-shui LI, Xin-hai CHEN, Jie LIU, Bo YANG, Chun-ye GONG, Xin-biao GAN, Sheng-guo LI, and Han XU declare that they have no conflict of interest.

Project supported by the National Key Research and Development Program of China (No. 2017YFB0202104)


Cite this article

Li, Ys., Chen, Xh., Liu, J. et al. OHTMA: an optimized heuristic topology-aware mapping algorithm on the Tianhe-3 exascale supercomputer prototype. Front Inform Technol Electron Eng 21, 939–949 (2020). https://doi.org/10.1631/FITEE.1900075
