Efficient layering for high speed communication: the MPI over Fast Messages (FM) experience

Cluster Computing

Abstract

We describe our experience designing, implementing, and evaluating two generations of a high performance communication library, Fast Messages (FM) for Myrinet. In FM 1, we designed a simple interface and provided guarantees of reliable, in-order delivery and flow control. While this was a significant improvement over previous systems, it was not enough: layering MPI atop FM 1 showed that only about 35% of the FM 1 bandwidth could be delivered to higher level communication APIs. Our second generation communication layer, FM 2, addresses the identified problems by providing gather-scatter, interlayer scheduling, and receiver flow control, as well as some convenient API features that simplify programming. FM 2 can deliver 55–95% of its bandwidth to higher level APIs such as MPI. This is especially impressive because the absolute bandwidths delivered have increased more than fourfold, to 90 MB/s. We describe general issues encountered in matching two communication layers, and our solutions as embodied in FM 2.
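The gather-scatter feature named in the abstract can be illustrated with a small toy model. The sketch below is not the FM API: every function name in it is hypothetical, and the "wire" is just a Python bytes object. It assumes that an upper layer such as MPI must prepend its own header to the user's payload. It contrasts a lower layer that accepts only one contiguous buffer (forcing a staging copy) with one that accepts a list of pieces.

```python
# Toy model of gather-scatter between two communication layers.
# All names are hypothetical; this is not the FM or MPI interface.

def send_contiguous(header: bytes, payload: bytes):
    """Lower layer takes one contiguous buffer, so the upper layer must
    first assemble header + payload in a staging buffer (an extra copy)."""
    staging = bytearray(len(header) + len(payload))
    staging[:len(header)] = header
    staging[len(header):] = payload
    extra_bytes_copied = len(staging)
    wire = bytes(staging)  # stands in for the transfer to the network
    return wire, extra_bytes_copied


def send_gather(pieces):
    """Lower layer takes a list of pieces (gather), so header and payload
    are handed over in place and no staging copy is made."""
    wire = b"".join(pieces)  # stands in for the transfer to the network
    return wire, 0


def receive_scatter(wire: bytes, header_len: int, user_buffer: bytearray):
    """Receiver splits (scatters) the message: the header goes to the upper
    layer and the payload lands directly in the destination user buffer."""
    header = wire[:header_len]
    payload = memoryview(wire)[header_len:]
    user_buffer[:len(payload)] = payload
    return header


header = b"TAG7"
payload = b"x" * 1024

wire_a, extra_a = send_contiguous(header, payload)
wire_b, extra_b = send_gather([header, payload])

dest = bytearray(len(payload))
received_header = receive_scatter(wire_b, len(header), dest)
```

Both send paths produce the same bytes on the wire; the difference is only the staging copy that the contiguous interface forces on the upper layer. This kind of per-message copy is one plausible source of the interlayer overhead the abstract describes, though the sketch makes no claim about where FM 1 actually lost bandwidth.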

About this article

Cite this article

Lauria, M., Pakin, S. & Chien, A. Efficient layering for high speed communication: the MPI over Fast Messages (FM) experience. Cluster Computing 2, 107–116 (1999). https://doi.org/10.1023/A:1019018423211

  • DOI: https://doi.org/10.1023/A:1019018423211
