Adaptive granularity: Transparent integration of fine- and coarse-grain communication

International Journal of Parallel Programming

Abstract

The granularity of shared data is one of the key factors affecting the performance of distributed shared memory (DSM) machines. Given that programs exhibit quite different sharing patterns, providing only one or two fixed granularities cannot result in an efficient use of resources. On the other hand, supporting arbitrary granularity sizes significantly increases not only hardware complexity but software overhead as well. Furthermore, the efficient use of arbitrary granularities puts the burden on users to provide information about program behavior to compilers and/or runtime systems. These kinds of requirements tend to restrict the programmability of the shared memory model. In this paper, we present a new communication scheme, called Adaptive Granularity (AG). Adaptive Granularity makes it possible to transparently integrate bulk transfer into the shared memory model by supporting variable-size granularity and memory replication. It consists of two protocols: one for small data and another for large data. For small data, the standard hardware DSM protocol is used and the granularity is fixed to the size of a cache line. For large array data, the protocol for bulk data is used instead, and the granularity varies depending on the runtime sharing behavior of the applications. Simulation results show that AG improves performance by up to 43% over the hardware implementation of DSM (e.g., DASH, Alewife). Compared with an equivalent architecture that supports fine-grain memory replication at the fixed granularity of a cache line (e.g., Typhoon), AG reduces execution time by up to 35%.
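
The two-protocol split described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the constants `CACHE_LINE`, `PAGE`, and `BULK_THRESHOLD`, the `BulkBlock` class, and the halve-on-conflict / double-when-quiet policy are all assumptions made for illustration. The abstract only states that small data uses a fixed cache-line granularity and that the bulk granularity varies with runtime sharing behavior.

```python
CACHE_LINE = 64        # bytes; assumed value, not taken from the paper
PAGE = 4096            # bytes; assumed upper bound for a bulk block
BULK_THRESHOLD = 1024  # bytes; hypothetical cutoff between small and large data


def choose_protocol(object_size: int) -> str:
    """Select one of the two protocols based on the size of the shared object."""
    return "bulk" if object_size >= BULK_THRESHOLD else "cache_line"


class BulkBlock:
    """Hypothetical bulk-data block whose granularity adapts at run time."""

    def __init__(self, size: int = PAGE) -> None:
        self.size = size

    def on_conflict(self) -> None:
        # Assumed policy: halve the block when nodes contend for it,
        # never dropping below one cache line.
        self.size = max(CACHE_LINE, self.size // 2)

    def on_quiet(self) -> None:
        # Assumed policy: double the block when contention stops,
        # never growing beyond one page.
        self.size = min(PAGE, self.size * 2)


def granularity(object_size: int, block: BulkBlock) -> int:
    """Return the transfer size in bytes used to service one miss."""
    if choose_protocol(object_size) == "cache_line":
        return CACHE_LINE  # fixed granularity for small data
    return block.size      # variable granularity for large array data
```

Under these assumptions, a small scalar is always transferred one cache line at a time, while a large array starts at a full page and shrinks only when sharing conflicts are observed, so the programmer never specifies a transfer size.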

Additional information

This research was supported in part by NSF under grant CCR-9308981, by ARPA under Rome Laboratories Contract F30602-91-C-0146, and by the USC Zumberge Fund. Computing resources were provided in part by NSF infrastructure grant CDA-9216321.

Cite this article

Park, D., Saavedra, R.H. & Moon, S. Adaptive granularity: Transparent integration of fine- and coarse-grain communication. Int J Parallel Prog 25, 419–446 (1997). https://doi.org/10.1007/BF02699885

  • DOI: https://doi.org/10.1007/BF02699885
