Hardware supported multicast in fat-tree-based InfiniBand networks

The Journal of Supercomputing

Abstract

Multicast is a commonly used operation in parallel applications, and it can also be used to implement many collective communication operations, so its performance directly affects both. Using the hardware-supported multicast of the InfiniBand Architecture (IBA), this paper proposes a cyclic multicast scheme for fat-tree-based (m-port n-tree) InfiniBand networks. The basic idea of the scheme is to find the union sets of the output ports of the switches on the paths between the source processing node and each destination processing node in a multicast group. From these union sets and the path selection scheme, the forwarding table for a given multicast group can be constructed. We implement the proposed scheme, the OpenSM multicast scheme, and the unicast scheme on an m-port n-tree InfiniBand network simulator, and simulate several one-to-many, many-to-many, many-to-all, and all-to-many multicast cases. The simulation results show that the proposed scheme outperforms the unicast scheme in all simulated cases. In the one-to-many case, the cyclic multicast scheme performs the same as the OpenSM multicast scheme. In the many-to-many and all-to-many cases, it outperforms the OpenSM multicast scheme. In the many-to-all case, it performs slightly better than the OpenSM multicast scheme.
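The union-set idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `build_multicast_table`, the switch labels, and the port numbers are all hypothetical, and the paths are supplied by hand rather than produced by the paper's path selection scheme, which the abstract does not detail. The sketch only shows how per-switch output ports from several source-to-destination paths might be merged into one forwarding entry per switch.

```python
from collections import defaultdict


def build_multicast_table(paths):
    """Merge unicast paths into per-switch sets of output ports.

    paths: dict mapping a destination name to a list of
           (switch_id, output_port) hops from the source.
    Returns a dict mapping switch_id to a sorted list of output ports.
    All names here are illustrative assumptions, not taken from the paper.
    """
    table = defaultdict(set)
    for hops in paths.values():
        for switch_id, out_port in hops:
            # Union: a port shared by several paths is recorded once,
            # so the switch forwards a single copy on that port.
            table[switch_id].add(out_port)
    return {switch_id: sorted(ports) for switch_id, ports in table.items()}


# Hypothetical paths from one source to three destinations in a small fat-tree.
example_paths = {
    "node1": [("leaf0", 2), ("root0", 1), ("leaf1", 0)],
    "node2": [("leaf0", 2), ("root0", 1), ("leaf1", 1)],
    "node3": [("leaf0", 2), ("root0", 3), ("leaf2", 0)],
}

table = build_multicast_table(example_paths)
for switch_id in sorted(table):
    print(switch_id, table[switch_id])
```

In this toy input, all three paths leave `leaf0` on the same port, so that switch gets a single entry, while `root0` must replicate the packet on two ports. This is the kind of saving over sending three separate unicast messages that hardware multicast is meant to provide.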



Author information

Corresponding author

Correspondence to Jiazheng Zhou.

About this article

Cite this article

Zhou, J., Lin, XY. & Chung, YC. Hardware supported multicast in fat-tree-based InfiniBand networks. J Supercomput 40, 333–352 (2007). https://doi.org/10.1007/s11227-006-0019-y

  • DOI: https://doi.org/10.1007/s11227-006-0019-y
