Efficient layering for high speed communication: the MPI over Fast Messages (FM) experience

Lauria, Mario; Pakin, Scott; Chien, Andrew

doi:10.1023/A:1019018423211

Published: September 1999

Cluster Computing, Volume 2, pages 107–116 (1999)

Mario Lauria¹,
Scott Pakin² &
Andrew Chien³

Abstract

We describe our experience designing, implementing, and evaluating two generations of a high performance communication library, Fast Messages (FM), for Myrinet. In FM 1, we designed a simple interface that guaranteed reliable, in-order delivery and flow control. While this was a significant improvement over previous systems, it was not enough: layering MPI atop FM 1 showed that only about 35% of the FM 1 bandwidth could be delivered to higher level communication APIs. Our second generation communication layer, FM 2, addresses the identified problems by providing gather-scatter, interlayer scheduling, and receiver flow control, as well as some convenient API features that simplify programming. FM 2 can deliver 55–95% of its bandwidth to higher level APIs such as MPI. This is especially impressive because the absolute bandwidths delivered have increased more than fourfold, to 90 MB/s. We describe the general issues encountered in matching two communication layers and our solutions as embodied in FM 2.
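
As a rough illustration of why gather-scatter helps when one communication layer is stacked on another, the sketch below contrasts two ways an upper layer might send a header plus a payload: copying both into a staging buffer first, or handing both pieces to a single gather write. This is a generic sketch built on POSIX `writev` (through Python's `os.writev`) over a local pipe, not the FM 2 interface; the function names and the header format are hypothetical.

```python
import os


def send_copying(fd: int, header: bytes, payload: bytes) -> int:
    """Assemble header and payload into one contiguous buffer, then send it.

    The concatenation copies the entire payload before the send starts,
    which is the kind of per-layer copy a gather interface avoids.
    """
    message = header + payload  # staging copy made by the upper layer
    return os.write(fd, message)


def send_gather(fd: int, header: bytes, payload: bytes) -> int:
    """Pass both pieces to one gather write; no user-level staging buffer."""
    return os.writev(fd, [header, payload])


def demo() -> tuple[int, bytes]:
    """Send a hypothetical header and payload through a pipe with send_gather."""
    header = b"TAG=7;LEN=5;"  # hypothetical header layout, for illustration only
    payload = b"hello"
    read_fd, write_fd = os.pipe()
    try:
        sent = send_gather(write_fd, header, payload)
        received = os.read(read_fd, sent)
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return sent, received
```

Both functions put the same bytes on the wire; the difference is only whether the sending layer makes its own copy of the payload first.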

Author information

Authors and Affiliations

Department of Computer Science and Engineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92093-0114, USA
Mario Lauria
Department of Computer Science, University of Illinois at Urbana-Champaign, 1304 W. Springfield Avenue, Urbana, IL, 61801, USA
Scott Pakin
Department of Computer Science and Engineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92093-0114, USA
Andrew Chien (Science Applications International Corporation Chair Professor)

About this article

Cite this article

Lauria, M., Pakin, S. & Chien, A. Efficient layering for high speed communication: the MPI over Fast Messages (FM) experience. Cluster Computing 2, 107–116 (1999). https://doi.org/10.1023/A:1019018423211

Issue Date: September 1999
DOI: https://doi.org/10.1023/A:1019018423211
