Relaxed consistency and coherence granularity in DSM systems: a performance evaluation

Authors:
Yuanyuan Zhou

Computer Science Department, Princeton University, Princeton, NJ

Liviu Iftode

Computer Science Department, Princeton University, Princeton, NJ

Jaswinder Pal Singh

Computer Science Department, Princeton University, Princeton, NJ

Kai Li

Computer Science Department, Princeton University, Princeton, NJ

Brian R. Toonen

Computer Sciences Department, University of Wisconsin, Madison, Madison, WI

Ioannis Schoinas

Computer Sciences Department, University of Wisconsin, Madison, Madison, WI

Mark D. Hill

Computer Sciences Department, University of Wisconsin, Madison, Madison, WI

David A. Wood

Computer Sciences Department, University of Wisconsin, Madison, Madison, WI


PPOPP '97: Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming, June 1997, Pages 193–205. https://doi.org/10.1145/263764.263788

Published: 21 June 1997

ABSTRACT

During the past few years, two main approaches have been taken to improve the performance of software shared memory implementations: relaxing consistency models and providing fine-grained access control. Their performance tradeoffs, however, are not well understood. This paper studies these tradeoffs on a platform that provides access control in hardware but runs coherence protocols in software. We compare the performance of three protocols across four coherence granularities, using 12 applications on a 16-node cluster of workstations. Our results show that no single combination of protocol and granularity performs best for all the applications. The combination of a sequentially consistent (SC) protocol and fine granularity works well with 7 of the 12 applications. The combination of a multiple-writer, home-based lazy release consistency (HLRC) protocol and page granularity works well with 8 of the 12 applications. For applications that suffer performance losses in moving to coarser granularity under sequential consistency, the performance can usually be regained quite effectively using relaxed protocols, particularly HLRC. We also find that the HLRC protocol performs substantially better than a single-writer lazy release consistent (SW-LRC) protocol at coarse granularity for many irregular applications. For our applications and platform, when we use the original versions of the applications ported directly from hardware-coherent shared memory, we find that the SC protocol with 256-byte granularity performs best on average. However, when the best versions of the applications are compared, the balance shifts in favor of HLRC at page granularity.
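
The granularity tradeoff described above can be illustrated with a minimal sketch that is not taken from the paper. It assumes a hypothetical access pattern in which two nodes write interleaved 64-byte records, and it counts how many coherence blocks are written by both nodes at each block size. Only the 256-byte and page granularities are named in the abstract; the 64-byte and 1024-byte sizes, the 4096-byte page size, and every function and variable name here are assumptions for illustration.

```python
def block_id(addr: int, granularity: int) -> int:
    """Map a byte address to the index of the coherence block that holds it."""
    return addr // granularity


def falsely_shared_blocks(writes_a, writes_b, granularity):
    """Return the blocks written by both nodes.

    The two address lists are assumed to be disjoint, so any block that
    appears in both sets is falsely shared: the nodes never touch the same
    bytes, yet a protocol tracking whole blocks sees a conflict.
    """
    blocks_a = {block_id(a, granularity) for a in writes_a}
    blocks_b = {block_id(b, granularity) for b in writes_b}
    return blocks_a & blocks_b


# Hypothetical pattern: node A writes even-numbered 64-byte records,
# node B writes odd-numbered ones, over an 8 KB region.
node_a = [i * 128 for i in range(64)]
node_b = [i * 128 + 64 for i in range(64)]

# Assumed granularities, in bytes (4096 stands in for "page granularity").
for g in (64, 256, 1024, 4096):
    shared = falsely_shared_blocks(node_a, node_b, g)
    print(f"granularity {g:4d} B: {len(shared)} falsely shared blocks")
```

In this toy pattern no block is falsely shared at 64 bytes, while every block is at the coarser sizes. Under a single-writer, sequentially consistent protocol each such block would bounce between the two nodes, which is the kind of cost a multiple-writer relaxed protocol such as HLRC is intended to reduce. The sketch does not model protocol overheads or communication costs, so it says nothing about which combination is faster overall.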

Published in
PPOPP '97: Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
June 1997
287 pages
ISBN: 0897919068
DOI: 10.1145/263764
Chairmen: Rob Schreiber (Hewlett-Packard Labs, Palo Alto, CA), Keshav Pingali (Cornell Univ., Ithaca, NY)
Editor: Michael A. Berman
ACM SIGPLAN Notices Volume 32, Issue 7
July 1997
287 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/263767
Copyright © 1997 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Acceptance Rates
PPOPP '97 Paper Acceptance Rate: 26 of 86 submissions, 30%. Overall Acceptance Rate: 230 of 1,014 submissions, 23%.