Abstract
Distributed shared memory (DSM) systems provide a simple programming paradigm for networks of workstations, which are gaining popularity because they deliver high computing power at low cost. However, DSM systems often perform poorly because of the large communication delay between nodes, and many memory consistency models have been proposed to mask this network delay. In this paper, we propose an asynchronous protocol for the release consistent memory model, which we call the Asynchronous Release Consistency (ARC) protocol. Unlike other protocols, in which communication adheres to the synchronous request/receive paradigm, the ARC protocol is asynchronous: the necessary pages are broadcast before they are requested. Hence, the network delay can be reduced by proper prefetching of the necessary pages. We also compare the performance of the ARC protocol with that of the lazy release protocol by running standard benchmark programs; the experimental results show that the ARC protocol achieves a performance improvement of up to 29%.
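The contrast between fetching pages on demand and broadcasting them ahead of time can be illustrated with a toy model. The sketch below is not the ARC protocol itself, whose details are not given in the abstract; it only assumes that a writer's modified pages become visible at a release, and it compares a pull policy (a reader stalls on a round trip for each stale page) with a push policy (modified pages are broadcast at release). The names `Node`, `ToyDSM`, `push_on_release`, and `run` are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One workstation holding local copies of shared pages (toy model)."""
    pages: dict = field(default_factory=dict)  # page id -> data
    seen: dict = field(default_factory=dict)   # page id -> version held locally
    dirty: set = field(default_factory=set)    # pages written since last release
    stalls: int = 0                            # reads that waited on a round trip


class ToyDSM:
    """Hypothetical simulator; push_on_release selects the policy."""

    def __init__(self, names, push_on_release):
        self.nodes = {name: Node() for name in names}
        self.latest = {}   # page id -> (version, data) published at release
        self.pushed = 0    # page copies sent without having been requested
        self.push_on_release = push_on_release

    def write(self, who, page, data):
        # Writes stay local until the writer performs a release.
        node = self.nodes[who]
        node.pages[page] = data
        node.dirty.add(page)

    def release(self, who):
        node = self.nodes[who]
        for page in sorted(node.dirty):
            version = self.latest.get(page, (0, None))[0] + 1
            self.latest[page] = (version, node.pages[page])
            node.seen[page] = version
            if self.push_on_release:
                # Assumed push policy: broadcast the page before anyone asks.
                for other in self.nodes.values():
                    if other is not node:
                        other.pages[page] = node.pages[page]
                        other.seen[page] = version
                        self.pushed += 1
        node.dirty.clear()

    def read(self, who, page):
        node = self.nodes[who]
        version, data = self.latest[page]
        if node.seen.get(page, 0) < version:
            # Pull policy: the reader stalls for a request/receive round trip.
            node.stalls += 1
            node.pages[page] = data
            node.seen[page] = version
        return node.pages[page]


def run(push_on_release):
    dsm = ToyDSM(["A", "B", "C"], push_on_release)
    dsm.write("A", 0, "x=1")
    dsm.write("A", 1, "y=2")
    dsm.release("A")
    values = [dsm.read("B", 0), dsm.read("B", 1), dsm.read("C", 0)]
    stalls = sum(node.stalls for node in dsm.nodes.values())
    return values, stalls, dsm.pushed
```

In this toy run both policies return the same values, but the pull policy incurs three read stalls while the push policy incurs none, at the cost of sending four page copies, one of which (page 1 to node C) is never read. That trade-off between hidden latency and extra traffic is the kind of effect a benchmark comparison would have to measure; the numbers here say nothing about the 29% figure reported above.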
Cite this article
Yeo, J., Yeom, H.Y. & Park, T. An Asynchronous Protocol for Release Consistent Distributed Shared Memory Systems. The Journal of Supercomputing 24, 25–41 (2003). https://doi.org/10.1023/A:1020937425960
DOI: https://doi.org/10.1023/A:1020937425960