Adaptive granularity: Transparent integration of fine- and coarse-grain communication

Park, Daeyeon; Saavedra, Rafael H.; Moon, Sungdo

doi:10.1007/BF02699885

Published: October 1997

International Journal of Parallel Programming, Volume 25, pages 419–446 (1997)

Daeyeon Park¹,
Rafael H. Saavedra² &
Sungdo Moon²

Abstract

The granularity of shared data is one of the key factors affecting the performance of distributed shared memory (DSM) machines. Given that programs exhibit quite different sharing patterns, providing only one or two fixed granularities cannot result in an efficient use of resources. On the other hand, supporting arbitrary granularity sizes significantly increases not only hardware complexity but software overhead as well. Furthermore, the efficient use of arbitrary granularities puts the burden on users to provide information about program behavior to compilers and/or runtime systems. These kinds of requirements tend to restrict the programmability of the shared memory model. In this paper, we present a new communication scheme, called Adaptive Granularity (AG). Adaptive Granularity makes it possible to transparently integrate bulk transfer into the shared memory model by supporting variable-size granularity and memory replication. It consists of two protocols: one for small data and another for large data. For small data, the standard hardware DSM protocol is used and the granularity is fixed to the size of a cache line. For large array data, the protocol for bulk data is used instead, and the granularity varies depending on the runtime sharing behavior of the applications. Simulation results show that AG improves performance by up to 43% over the hardware implementation of DSM (e.g., DASH, Alewife). Compared with an equivalent architecture that supports fine-grain memory replication at the fixed granularity of a cache line (e.g., Typhoon), AG reduces execution time by up to 35%.
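The two-protocol structure described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how data is classified or how the bulk granularity changes at run time, so everything concrete here is an assumption made for illustration only: the cache-line size, the size threshold, the names `choose_protocol` and `split_on_conflict`, and the halving policy are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Tuple

CACHE_LINE_BYTES = 64         # assumed line size; the abstract gives none
BULK_THRESHOLD_BYTES = 4096   # assumed cutoff between "small" and "large array" data


@dataclass
class Block:
    """A contiguous shared region tracked as one coherence unit."""
    start: int
    size: int


def choose_protocol(object_size: int, is_array: bool) -> str:
    """Select the fine-grain or the bulk protocol for a shared object.

    Small data uses the fixed cache-line protocol; large arrays use the
    bulk protocol, whose granularity may change at run time.
    """
    if is_array and object_size >= BULK_THRESHOLD_BYTES:
        return "bulk"
    return "cache_line"


def split_on_conflict(block: Block) -> Tuple[Block, Block]:
    """Halve a bulk block when two nodes contend for it.

    Hypothetical adaptation policy: shrinking the unit of transfer under
    contention is one way granularity could follow sharing behavior.
    """
    if block.size <= CACHE_LINE_BYTES:
        raise ValueError("block is already at cache-line granularity")
    half = block.size // 2
    return Block(block.start, half), Block(block.start + half, block.size - half)
```

Under these assumptions, a 128-byte scalar structure would be handled by the cache-line protocol, while an 8 KB array would start as a single bulk block and could be split into two 4 KB blocks if it turned out to be contended.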

Author information

Authors and Affiliations

Department of Control and Instrumentation Engineering, Hankuk University of Foreign Studies, Yongin, 449-791, Kyoungkido, Republic of Korea
Daeyeon Park
Department of Computer Science, SAL-300, University of Southern California, 90089-0781, Los Angeles, California
Rafael H. Saavedra & Sungdo Moon

Additional information

This research was supported in part by NSF under grant CCR-9308981, by ARPA under Rome Laboratories Contract F30602-91-C-0146, and by the USC Zumberge Fund. Computing resources were provided in part by NSF infrastructure grant CDA-9216321.

About this article

Cite this article

Park, D., Saavedra, R.H. & Moon, S. Adaptive granularity: Transparent integration of fine- and coarse-grain communication. Int J Parallel Prog 25, 419–446 (1997). https://doi.org/10.1007/BF02699885

Issue Date: October 1997
DOI: https://doi.org/10.1007/BF02699885
