Abstract
We describe our experience designing, implementing, and evaluating two generations of a high-performance communication library, Fast Messages (FM) for Myrinet. In FM 1, we designed a simple interface that guaranteed reliable, in-order delivery and provided flow control. This was a significant improvement over previous systems, but it was not sufficient for efficient layering: running MPI atop FM 1 showed that only about 35% of the FM 1 bandwidth could be delivered to higher-level communication APIs. Our second-generation communication layer, FM 2, addresses the problems we identified by providing gather-scatter, interlayer scheduling, and receiver flow control, along with several API features that simplify programming. FM 2 can deliver 55–95% of its bandwidth to higher-level APIs such as MPI. This is especially notable because the absolute bandwidth delivered has also increased more than fourfold, to 90 MB/s. We describe the general issues encountered in matching two communication layers, and our solutions as embodied in FM 2.
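To make the gather-scatter idea concrete, the C sketch below models one plausible source of layering overhead: an upper layer such as MPI must attach its own header to the user's data. It is a minimal sketch under stated assumptions: a toy in-memory "wire" stands in for Myrinet, and all names (`wire_msg`, `msg_begin`, `msg_send_piece`, `msg_end`, `msg_receive`, `mpi_header`, `send_mpi_like`, `recv_mpi_like`) are hypothetical and do not reproduce FM 2's actual interface. The sender gathers the header and the payload as two separate pieces without first copying them into a contiguous staging buffer; the receiver reads the header first and then scatters the payload directly into a destination buffer chosen on the basis of that header.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one network message; real FM packet
   sizes, buffering, and DMA are not modeled. */
#define WIRE_CAPACITY 256

typedef struct {
    unsigned char data[WIRE_CAPACITY];
    size_t len;      /* bytes appended so far */
    size_t expected; /* total size declared by msg_begin */
} wire_msg;

/* Sender: declare the total message size up front. */
int msg_begin(wire_msg *m, size_t total) {
    if (total > WIRE_CAPACITY) return -1;
    m->len = 0;
    m->expected = total;
    return 0;
}

/* Sender: append one piece straight from wherever it lives (gather).
   The memcpy models moving data onto the wire, not a staging copy. */
int msg_send_piece(wire_msg *m, const void *buf, size_t n) {
    if (m->len + n > m->expected) return -1;
    memcpy(m->data + m->len, buf, n);
    m->len += n;
    return 0;
}

/* Sender: the message is complete only if every declared byte was sent. */
int msg_end(const wire_msg *m) {
    return m->len == m->expected ? 0 : -1;
}

/* Receiver: cursor over an incoming message. */
typedef struct {
    const wire_msg *m;
    size_t pos;
} msg_reader;

/* Receiver: pull the next n bytes into a caller-chosen buffer (scatter). */
int msg_receive(msg_reader *r, void *dst, size_t n) {
    if (r->pos + n > r->m->len) return -1;
    memcpy(dst, r->m->data + r->pos, n);
    r->pos += n;
    return 0;
}

/* Hypothetical MPI-style envelope carried ahead of the payload. */
typedef struct { int src; int tag; int count; } mpi_header;

/* Send header and payload as two pieces; no contiguous
   header+payload buffer is ever assembled by the caller. */
int send_mpi_like(wire_msg *m, const mpi_header *h, const int *payload) {
    size_t body = (size_t)h->count * sizeof(int);
    if (msg_begin(m, sizeof *h + body) != 0) return -1;
    if (msg_send_piece(m, h, sizeof *h) != 0) return -1;
    if (msg_send_piece(m, payload, body) != 0) return -1;
    return msg_end(m);
}

/* Read the header first, then deliver the payload directly into the
   user buffer selected from it (max = capacity of user_buf in ints). */
int recv_mpi_like(const wire_msg *m, mpi_header *h, int *user_buf, int max) {
    msg_reader r = { m, 0 };
    if (msg_receive(&r, h, sizeof *h) != 0) return -1;
    if (h->count < 0 || h->count > max) return -1;
    return msg_receive(&r, user_buf, (size_t)h->count * sizeof(int));
}
```

With a single-buffer send interface, `send_mpi_like` would instead have to allocate `sizeof header + body` bytes and copy both parts into it before sending, and the receiver would typically copy again out of a staging area. The piece-wise interface in this sketch avoids both extra copies, which is the kind of saving gather-scatter is meant to provide.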
Cite this article
Lauria, M., Pakin, S. & Chien, A. Efficient layering for high speed communication: the MPI over Fast Messages (FM) experience. Cluster Computing 2, 107–116 (1999). https://doi.org/10.1023/A:1019018423211