Efficient communication mechanisms for cluster based parallel computing

  • Conference paper
  • Published in: Communication and Architectural Support for Network-Based Parallel Computing (CANPC 1997)

Abstract

The key to crafting an effective scalable parallel computing system lies in minimizing the delays imposed by the system. Of particular importance are communication delays, since parallel algorithms must communicate frequently. The communication delay is a system-imposed latency. The existence of relatively inexpensive high-performance workstations and emerging high-performance interconnect options provides compelling economic motivation to investigate NOW/COW (network/cluster of workstations) architectures. However, these commercial components have been designed for generality, and cluster nodes are connected by longer physical wire paths than those found in special-purpose supercomputer systems. Both effects tend to impose intractable latencies on communication. Even larger system-imposed delays result from the overhead of sending and receiving messages. This overhead can come in several forms, including CPU occupancy by protocol and device code as well as interference with CPU access to various levels of the memory hierarchy. Access contention becomes even more onerous when the nodes in the system are themselves symmetric multiprocessors. Additional delays are incurred if the communication mechanism requires processes to run concurrently in order to communicate with acceptable efficiency. This paper presents the approach taken by the Utah Avalanche project, which spans user-level code, operating system support, and network interface hardware. The result minimizes the constraining effects of latency, overhead, and loosely coupled scheduling that are common characteristics of NOW-based architectures.

This work was supported by SPAWAR contract #N00039-94-C-0018 and ARPA order #B990.

Editor information

Dhabaleswar K. Panda, Craig B. Stunkel

Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Davis, A., Swanson, M., Parker, M. (1997). Efficient communication mechanisms for cluster based parallel computing. In: Panda, D.K., Stunkel, C.B. (eds) Communication and Architectural Support for Network-Based Parallel Computing. CANPC 1997. Lecture Notes in Computer Science, vol 1199. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-62573-9_1

  • DOI: https://doi.org/10.1007/3-540-62573-9_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-62573-5

  • Online ISBN: 978-3-540-68085-7

  • eBook Packages: Springer Book Archive
