
Hierarchical task mapping for parallel applications on supercomputers

Published in The Journal of Supercomputing

Abstract

As the scale of supercomputers grows, so does the size of their interconnect networks. Topology-aware task mapping, which maps the processes of a parallel application onto processors so as to reduce communication cost, becomes increasingly important. Previous work mainly focuses on task mapping between compute nodes (i.e., inter-node mapping) while ignoring the mapping within a node (i.e., intra-node mapping). In this paper, we propose a hierarchical task mapping strategy that performs both inter-node and intra-node mapping. We consider supercomputers with the popular fat-tree and torus network topologies, and introduce two mapping algorithms: (1) a generic recursive tree mapping algorithm, which can handle both inter-node and intra-node mapping; (2) a recursive bipartitioning mapping algorithm for the torus topology, which efficiently partitions the compute nodes according to their coordinates. Moreover, a hierarchical task mapping library is developed. Experimental results show that the proposed approach significantly improves communication performance, by up to 77 %, with low runtime overhead.
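The coordinate-based recursive bipartitioning idea for torus networks can be illustrated with a short sketch. All function names below are illustrative assumptions, not the paper's API, and the proportional task split is a placeholder: the actual algorithm would bipartition the task communication graph (e.g., with METIS) to minimize the edge cut between the two halves.

```python
# Hedged sketch of recursive coordinate bipartitioning on a torus.
# Assumption: each node is a coordinate tuple, e.g., (x, y, z) on a 3-D torus.

def bipartition_nodes(nodes):
    """Split nodes into two halves along the coordinate dimension
    with the largest extent (widest spread of coordinate values)."""
    dims = len(nodes[0])
    extents = [max(n[d] for n in nodes) - min(n[d] for n in nodes)
               for d in range(dims)]
    d = extents.index(max(extents))
    ordered = sorted(nodes, key=lambda n: n[d])
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]

def map_tasks(tasks, nodes, mapping):
    """Recursively co-partition tasks and nodes; at the base case,
    assign the remaining tasks to the single remaining node."""
    if len(nodes) == 1:
        for t in tasks:
            mapping[t] = nodes[0]
        return
    left_nodes, right_nodes = bipartition_nodes(nodes)
    # Placeholder split: a real implementation would bipartition the
    # task communication graph here to minimize cut edges.
    k = len(left_nodes) * len(tasks) // len(nodes)
    map_tasks(tasks[:k], left_nodes, mapping)
    map_tasks(tasks[k:], right_nodes, mapping)

# Example: map 8 tasks onto a 2x2x2 block of torus nodes.
nodes = [(x, y, z) for x in range(2) for y in range(2) for z in range(2)]
mapping = {}
map_tasks(list(range(8)), nodes, mapping)
print(mapping)
```

Because the node set is split geometrically rather than by a general graph partitioner, each bipartition step costs only a sort over the node coordinates, which is what makes this approach efficient on large torus allocations.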



Notes

  1. We use “task mapping” and “topology mapping” interchangeably.

  2. Note that other graph partitioning algorithms may have different time complexities.

  3. On Blue Gene/P systems, if the number of allocated nodes is less than 512, the network topology is a mesh.

  4. LibTopoMap failed to derive mapping solutions when the number of processes was larger than 1024.

References

  1. LibTopoMap: A Generic Topology Mapping Library. http://www.unixer.de/research/mpitopo/libtopomap/

  2. MPI: A message-passing interface standard. version 3.0. http://www.mpi-forum.org/

  3. ALCF Intrepid. https://www.alcf.anl.gov/intrepid

  4. METIS: Graph Partitioning Tool. http://glaros.dtc.umn.edu/gkhome/views/metis

  5. NICS Kraken User Guide. https://www.xsede.org/web/guest/nics-kraken

  6. TACC Stampede User Guide. http://www.tacc.utexas.edu/user-services/user-guides/stampede-user-guide

  7. Top 500 Supercomputer Sites. http://www.top500.org/

  8. TOPOMap. http://bluesky.cs.iit.edu/topomap/

  9. Abts D (2011) The Cray XT4 and Seastar 3-D Torus interconnect. Encycl Parallel Comput 4:470–477.

  10. Agarwal T, Sharma A, Laxmikant A, Kale LV (2006) Topology-aware task mapping for reducing communication contention on large parallel machines. In: Proceedings of IEEE international symposium on parallel and distributed processing (IPDPS)

  11. Aleliunas R, Rosenberg AL (1982) On embedding rectangular grids in square grids. IEEE Trans Comput C-31(9):907–913. doi:10.1109/TC.1982.1676109

  12. Arabnia HR, Oliver MA (1989) A transputer network for fast operations on digitised images. Int J Eurogr Assoc (Computer Graphics Forum) 8(1):3–12


  13. Arabnia HR, Smith JW (1993) A reconfigurable interconnection network for imaging operations and its implementation using a multi-stage switching box. In: Proceedings of the 7th annual international high performance computing conference. The 1993 high performance computing: new horizons supercomputing symposium, pp 349–357.

  14. Arimilli B, Arimilli R, Chung V, Clark S, Denzel W, Drerup B, Hoefler T, Joyner J, Lewis J, Li J, Ni N, Rajamony R (2010) The PERCS high-performance interconnect. In: Proceedings of the 18th IEEE symposium on high performance interconnects, pp 75–82

  15. Berman F, Snyder L (1987) On mapping parallel algorithms into parallel architectures. J Parallel Distrib Comput 4(5):439–458


  16. Bhatele A (2010) Automating topology aware mapping for supercomputers. PhD thesis, University of Illinois at Urbana-Champaign, Urbana.

  17. Bhatele A, Kale LV (2008) Application-specific topology-aware mapping for three dimensional topologies. In: Proceedings of IEEE international symposium on parallel and distributed processing (IPDPS), pp 1–8

  18. Bokhari SH (1981) On the mapping problem. IEEE Trans Comput 30(3):207–214


  19. Broquedis F, Clet-Ortega J, Moreaud S, Furmento N, Goglin B, Mercier G, Thibault S, Namyst R (2010) hwloc: a generic framework for managing hardware affinities in HPC applications. In: Proceedings of the 18th Euromicro international conference on parallel, distributed and network-based processing (PDP), pp 180–186. doi:10.1109/PDP.2010.67

  20. Chockalingam T, Arunkumar S (1992) A randomized heuristics for the mapping problem: the genetic approach. Parallel Comput 18(10):1157–1165


  21. Chung IH, Lee CR, Zhou J, Chung YC (2011) Hierarchical mapping for HPC applications. In: Proceedings of IEEE international symposium on parallel and distributed processing workshops and Phd forum (IPDPSW), pp 1815–1823

  22. Cuthill E, McKee J (1969) Reducing the bandwidth of sparse symmetric matrices. In: Proceedings of the 24th National Conference, pp 157–172.

  23. Davis TA, Hu Y (2011) The University of Florida sparse matrix collection. ACM Trans Math Softw 38(1):1–25


  24. Deveci M, Rajamanickam S, Leung VJ, Pedretti K, Olivier SL, Bunde DP, Çatalyürek UV, Devine K (2014) Exploiting geometric partitioning in task mapping for parallel computers. In: Proceedings of the 2014 IEEE 28th international parallel and distributed processing symposium, pp 27–36

  25. Ercal F, Ramanujam J, Sadayappan P (1988) Task allocation onto a hypercube by recursive mincut bipartitioning. In: Proceedings of the third conference on hypercube concurrent computers and applications: architecture, software, computer systems, and general issues, vol 1, no C3P, pp 210–221

  26. Hoefler T, Snir M (2011) Generic topology mapping strategies for large-scale parallel architectures. In: Proceedings of the international conference on supercomputing (ICS), pp 75–84

  27. Jeannot E, Mercier G (2010) Near-optimal placement of MPI processes on hierarchical NUMA architectures. In: Proceedings of the 16th International euro-par conference on parallel processing: part II, pp 199–210

  28. Karypis G, Kumar V (1998) Multilevel k-way partitioning scheme for irregular graphs. J Parallel Distrib Comput 48(1):96–129


  29. Kravtsov AV, Klypin AA, Khokhlov AM (1997) Adaptive refinement tree: a new high-resolution N-body code for cosmological simulations. Astrophys J Suppl Ser 111:73–94


  30. Lee C, Bic L (1989) On the mapping problem using simulated annealing. In: Proceedings of international phoenix conference on computers and communications, pp 40–44. doi:10.1109/PCCC.1989.37357

  31. Leiserson CE (1985) Fat-trees: universal networks for hardware-efficient supercomputing. IEEE Trans Comput 34(10):892–901


  32. Mercier G, Jeannot E (2011) Improving MPI applications performance on multicore clusters with rank reordering. In: Proceedings of the 18th European MPI users’ group conference on recent advances in the message passing interface, pp 39–49

  33. Pellegrini F (1994) Static mapping by dual recursive bipartitioning of process architecture graphs. In: Proceedings of the scalable high-performance computing conference, pp 486–493

  34. Pellegrini F, Roman J (1996) Scotch: a software package for static mapping by dual recursive bipartitioning of process and architecture graphs. High-performance computing and networking. Lecture notes in computer science, vol 1067, pp 493–498

  35. Plewa T, Linde T, Weirs VG (2005) Adaptive mesh refinement: theory and applications. Springer, Berlin


  36. Rashti MJ, Green J, Balaji P, Afsahi A, Gropp W (2011) Multi-core and network aware MPI topology functions. In: Proceedings of the 18th European MPI users’ group conference on recent advances in the message passing interface, pp 50–60

  37. Salman A, Ahmad I, Al-Madani S (2002) Particle swarm optimization for task assignment problem. Microprocess Microsyst 26(8):363–371


  38. Subramoni H, Potluri S, Kandalla K, Barth B, Vienne J, Keasler J, Tomko K, Schulz K, Moody A, Panda D (2012) Design of a scalable InfiniBand topology service to enable network-topology-aware placement of processes. In: Proceedings of international conference on high performance computing, networking, storage and analysis, pp 1–12. doi:10.1109/SC.2012.47

  39. Tang W, Lan Z, Desai N, Buettner D, Yu Y (2011) Reducing fragmentation on torus-connected supercomputers. In: Proceedings of IEEE international symposium on parallel and distributed processing (IPDPS), pp 828–839

  40. Träff JL (2002) Implementing the MPI process topology mechanism. In: Proceedings of ACM/IEEE conference on supercomputing, pp 28:1–28:14

  41. Wu J, Gonzalez RE, Lan Z, Gnedin NY, Kravtsov AV, Rudd DH, Yu Y (2011) Performance emulation of cell-based AMR cosmology simulations. In: Proceedings of IEEE International conference on cluster computing (CLUSTER), pp 8–16

  42. Wu J, Lan Z, Xiong X, Gnedin NY, Kravtsov AV (2012) Hierarchical task mapping of cell-based AMR cosmology simulations. In: Proceedings of international conference on high performance computing, networking, storage and analysis, SC ’12, pp 75:1–75:10

  43. Yu H, Chung IH, Moreira J (2006) Topology mapping for Blue Gene/L supercomputer. In: Proceedings of ACM/IEEE conference on supercomputing, p 52

  44. Yu Y, Rudd DH, Lan Z, Gnedin NY, Kravtsov AV, Wu J (2012) Improving parallel IO performance of cell-based AMR cosmology applications. In: Proceedings of IEEE international symposium on parallel and distributed processing (IPDPS), pp 933–944

  45. Zou H, Yu Y, Tang W, Chen HWM (2014) Flexanalytics: a flexible data analytics framework for big data applications with I/O performance improvement. Big Data Res 1(0): 4–13 (Special Issue on Scalable Computing for Big Data)


Acknowledgments

The work at Illinois Institute of Technology is supported in part by US National Science Foundation grant CNS-1320125. This work is also supported in part by the National Natural Science Foundation of China grant 61402083. The authors acknowledge the Texas Advanced Computing Center (TACC) at The University of Texas at Austin (http://www.tacc.utexas.edu) for providing HPC resources that have contributed to the research results reported within this paper. This material is based upon work supported by the National Science Foundation under Grant numbers 0711134, 0933959, 1041709, and 1041710 and the University of Tennessee through the use of the Kraken computing resource at the National Institute for Computational Sciences (http://www.nics.tennessee.edu). This research used resources of the Argonne Leadership Computing Facility at Argonne National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under contract DE-AC02-06CH11357. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1053575.

Author information


Corresponding author

Correspondence to Jingjin Wu.

Additional information

The majority of this work was done when Jingjin and Xuanxing were Ph.D. students at Illinois Institute of Technology.


About this article


Cite this article

Wu, J., Xiong, X. & Lan, Z. Hierarchical task mapping for parallel applications on supercomputers. J Supercomput 71, 1776–1802 (2015). https://doi.org/10.1007/s11227-014-1324-5
