research-article

Efficient sequential consistency using conditional fences

Authors:
Changhui Lin
CSE Department, University of California, Riverside, CA, USA

Vijay Nagarajan
School of Informatics, University of Edinburgh, Edinburgh, United Kingdom

Rajiv Gupta
CSE Department, University of California, Riverside, CA, USA

PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques, September 2010, Pages 295–306, https://doi.org/10.1145/1854273.1854312

Published: 11 September 2010


ABSTRACT

Among the various memory consistency models, the sequential consistency (SC) model, in which memory operations appear to take place in the order specified by the program, is the most intuitive and makes parallel programs the easiest to reason about. Nevertheless, processor designers often choose to support relaxed memory consistency models, because the weaker ordering constraints of such models allow more instructions to be reordered and so enable higher performance. Programs running on machines that support weaker consistency models can be transformed into ones in which SC is enforced. The compiler does this by computing a minimal set of memory access pairs whose ordering guarantees SC. To ensure that these memory access pairs are not reordered, memory fences are inserted. Unfortunately, inserting such memory fences can significantly slow down the program.
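As an illustration of the fence insertion described above, the following is a minimal C++ sketch of the classic store-buffering pattern, in which each thread writes one variable and then reads the other. It is not taken from the paper: the names `x`, `y`, `run_trial`, and `count_violations` are hypothetical, and the standard `std::atomic_thread_fence` stands in for a compiler-inserted hardware fence. Without the fences, a relaxed model may let both threads read 0, which SC forbids; with a full fence between each store and the following load, that outcome cannot occur.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Shared variables for the store-buffering pattern (hypothetical names).
std::atomic<int> x{0}, y{0};

// Runs one trial and returns true if both threads read 0,
// an outcome that sequential consistency forbids.
bool run_trial() {
    x.store(0, std::memory_order_relaxed);
    y.store(0, std::memory_order_relaxed);
    int r1 = -1, r2 = -1;

    std::thread t1([&] {
        x.store(1, std::memory_order_relaxed);
        // Fence keeps the store to x ordered before the load of y.
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_relaxed);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_relaxed);
        // Matching fence on the other side of the access cycle.
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_relaxed);
    });
    t1.join();
    t2.join();
    return r1 == 0 && r2 == 0;
}

// Counts how many trials show the SC-violating outcome.
int count_violations(int trials) {
    assert(trials >= 0);
    int violations = 0;
    for (int i = 0; i < trials; ++i) {
        if (run_trial()) ++violations;
    }
    return violations;
}
```

Calling `count_violations(n)` should report zero violations for any `n`, because the two sequentially consistent fences are totally ordered and at least one load must observe the other thread's store. Each fence is paid for on every execution, even in runs where the two threads never overlap, which is the cost the abstract refers to.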

We observe that the ordering of the minimal set of memory accesses that the compiler strives to enforce is typically already enforced in the normal course of program execution. A study we conducted on programs with compiler-inserted memory fences shows that only 8% of the executed instances of the memory fences are really necessary to ensure SC. Motivated by this study, we propose the conditional fence mechanism (C-Fence), which uses compiler information to decide dynamically whether there is a need to stall at each fence. Our experiments with SPLASH-2 benchmarks show that, with C-Fences, programs can be transformed to enforce SC while incurring only a 12% slowdown, as opposed to a 43% slowdown using normal fence instructions. Our approach requires very little hardware support (<300 bytes of on-chip storage) and avoids the use of speculation and its associated costs.
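The dynamic stall decision can be pictured with a small software model. The abstract does not spell out the mechanism, so everything in this sketch is an assumption: the terms "associate" and "active table" are borrowed from the paper's author tags, and the rule used here (a fence needs to stall only while its associate fence on another core is still active) is one plausible reading, not the authors' hardware design. `ActiveTable`, `CFence`, and `must_stall` are hypothetical names.

```cpp
#include <cassert>
#include <set>

// Hypothetical model of an active table: the set of fence IDs that are
// currently active, i.e. issued but still waiting for earlier memory
// operations on their own core to complete.
struct ActiveTable {
    std::set<int> active;
    void activate(int fence_id) { active.insert(fence_id); }
    void deactivate(int fence_id) { active.erase(fence_id); }
    bool is_active(int fence_id) const { return active.count(fence_id) != 0; }
};

// A conditional fence tagged by the compiler with the ID of its associate,
// the fence on another thread that guards the same access cycle (assumed).
struct CFence {
    int id;
    int associate_id;
};

// Assumed rule: stall only if the associate fence is active right now.
// Otherwise the ordering is already safe and the fence can be passed.
bool must_stall(const CFence& fence, const ActiveTable& table) {
    return table.is_active(fence.associate_id);
}
```

Under this model, with fences `{1, 2}` and `{2, 1}` as associates, fence 2 must stall only between `activate(1)` and `deactivate(1)`. If most fence instances execute while their associate is inactive, most stalls are skipped, which is consistent with the abstract's observation that only 8% of executed fence instances are necessary.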



Published in
PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques
September 2010
596 pages
ISBN:9781450301787
DOI:10.1145/1854273
General Chair: Valentina Salapura (IBM TJ Watson Research Center)
Program Chairs: Michael Gschwind (IBM Systems & Technology Group), Jens Knoop (Technische Universität Wien)
Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
active table
associates
conditional fences
interprocessor delay
memory consistency
sequential consistency
Acceptance Rates
Overall Acceptance Rate: 121 of 471 submissions, 26%