Abstract
Broadcasting and multicasting are common operations in parallel and distributed programs. Some modern Network Interface Cards (NICs) have programmable processors that can be used to support these operations. However, these processors are 5-15 times slower than the host processor. In this paper we propose a design and an implementation of a multi-send primitive that supports efficient broadcast/multicast while requiring minimal assistance from the NIC. Our scheme is designed with the idea that as much processing as possible should be done by the host processor. This gives us more flexibility in, for example, creating multicast trees that are optimal for a particular message size, or choosing a multicast tree dynamically based on the bandwidth and latency requirements of a particular message. We have designed a multi-send primitive and implemented it as an addition to Fast Messages (FM) 2.1 running over a Myrinet network. The proposed scheme does less processing at the NIC. The impact of adding such a NIC-assisted multicast operation to a run-time system is also very small: less than 500 ns for non-multi-send packets. To fully utilize the benefits of this primitive, we propose a method for constructing an optimal multicast tree using the new primitive. We have evaluated this scheme and obtained a speedup factor of up to 1.85 for multicasting 16K messages with 16 nodes.
This research is supported in part by NSF Career award MIP-9502294, NSF Grant CCR-9704512, an Ameritech Faculty Fellowship Award, and grants from the Ohio Board of Regents.
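The abstract does not describe how the optimal multicast tree is built, so the following is only a minimal sketch of the general idea that a multi-send primitive changes the shape of a multicast tree. It assumes a simplified round-based model in which every node that already holds the message can deliver it to up to `fanout` new nodes per round. The function name `build_multisend_schedule` and the `fanout` parameter are hypothetical and are not taken from the paper or from the FM API.

```python
def build_multisend_schedule(nodes, fanout):
    """Return a list of rounds; each round is a list of (sender, destinations).

    Assumed model (not the paper's algorithm): in one round, every node that
    already holds the message issues one multi-send to at most `fanout`
    nodes that have not received it yet.
    """
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    have = [nodes[0]]          # nodes holding the message; the root is first
    pending = list(nodes[1:])  # nodes still waiting for the message
    rounds = []
    while pending:
        this_round = []
        # Iterate over a snapshot so that nodes reached in this round
        # only start forwarding in the next round.
        for sender in list(have):
            if not pending:
                break
            dests, pending = pending[:fanout], pending[fanout:]
            this_round.append((sender, dests))
            have.extend(dests)
        rounds.append(this_round)
    return rounds


if __name__ == "__main__":
    for k in (1, 3, 15):
        schedule = build_multisend_schedule(list(range(16)), k)
        print(f"fanout={k}: {len(schedule)} round(s)")
```

Under this model, 16 nodes need 4 rounds with a fan-out of 1 (a binomial tree), 2 rounds with a fan-out of 3, and 1 round with a fan-out of 15. A real cost model would also have to account for the time the NIC spends injecting each copy, which is why the best fan-out can depend on message size.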
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Buntinas, D., Panda, D.K., Duato, J., Sadayappan, P. (2000). Broadcast/Multicast over Myrinet Using NIC-Assisted Multidestination Messages. In: Falsafi, B., Lauria, M. (eds) Network-Based Parallel Computing. Communication, Architecture, and Applications. CANPC 2000. Lecture Notes in Computer Science, vol 1797. Springer, Berlin, Heidelberg. https://doi.org/10.1007/10720115_9
DOI: https://doi.org/10.1007/10720115_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67879-3
Online ISBN: 978-3-540-44655-2
eBook Packages: Springer Book Archive