Efficient and Scalable Routing Algorithms for Collective Communication Operations on 2D All-Port Torus Networks

International Journal of Parallel Programming

Abstract

Collective communication algorithms for 2D torus networks have been investigated extensively in the literature, and two broad approaches are recognized in the field: direct methods and indirect (message-combining) methods. While direct methods minimize the volume of data, indirect methods reduce the number of message start-ups. Consequently, either a suite of algorithms must be employed for efficiency over a wide range of message lengths and communication operations, or the algorithms should be able to adapt themselves to the current case, possibly by switching between direct and indirect routing modes as appropriate. In this paper, we propose adaptive routing algorithms for all-port, wormhole-routed, synchronous 2D torus networks, optimized for the one-to-all broadcast, gossiping, and complete exchange collective communication operations. The proposed algorithms employ completely connected subnetworks in which complete exchange amongst the nodes of the subnetwork can be accomplished in a single step. Combined with suitable 2D plane tiling techniques, the proposed algorithms share the same set of primitive operations and yield superior performance compared to previously proposed methods, either pure or hybridized.
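
The trade-off between data volume and start-ups described in the abstract can be illustrated with a small cost comparison. The sketch below is not the paper's torus algorithm. It assumes the common linear cost model (a start-up time `t_s` plus a per-byte time `t_w`) and textbook-style complete-exchange costs for `p` nodes, where `p` is a power of two. The function names and parameter values are hypothetical and chosen only for illustration.

```python
import math


def direct_cost(p: int, m: float, t_s: float, t_w: float) -> float:
    """Direct mode (assumed model): p - 1 separate messages of m bytes each.

    Every byte is sent once, but each message pays a start-up.
    """
    return (p - 1) * (t_s + t_w * m)


def combining_cost(p: int, m: float, t_s: float, t_w: float) -> float:
    """Indirect mode (assumed model): log2(p) combined messages.

    Each step forwards p / 2 blocks of m bytes, so start-ups are few
    but the same data is retransmitted over several steps.
    """
    steps = math.log2(p)
    return steps * (t_s + t_w * m * p / 2)


def choose_mode(p: int, m: float, t_s: float, t_w: float) -> str:
    """Pick the cheaper mode under the assumed cost model."""
    if direct_cost(p, m, t_s, t_w) <= combining_cost(p, m, t_s, t_w):
        return "direct"
    return "indirect"


if __name__ == "__main__":
    # Hypothetical parameters: 64 nodes, start-up 100 time units, 1 unit per byte.
    for m in (1, 1000):
        print(m, choose_mode(64, m, t_s=100.0, t_w=1.0))
```

With these assumed parameters, short messages favour the indirect mode because start-ups dominate, while long messages favour the direct mode because data volume dominates. This is the kind of switch an adaptive scheme would make.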

Author information

Corresponding author

Correspondence to Kayhan M. İmre.

About this article

Cite this article

İmre, K.M., Baransel, C. & Artuner, H. Efficient and Scalable Routing Algorithms for Collective Communication Operations on 2D All-Port Torus Networks. Int J Parallel Prog 39, 746–782 (2011). https://doi.org/10.1007/s10766-011-0169-2

  • DOI: https://doi.org/10.1007/s10766-011-0169-2
