ABSTRACT
Among the various memory consistency models, sequential consistency (SC), in which memory operations appear to take place in the order specified by the program, is the most intuitive and makes parallel programs the easiest to reason about. Nevertheless, processor designers often choose to support relaxed memory consistency models, because their weaker ordering constraints allow more instructions to be reordered and so enable higher performance. Programs running on machines that support weaker consistency models can be transformed into ones in which SC is enforced. The compiler does this by computing a minimal set of memory access pairs whose ordering guarantees SC, and then inserting memory fences to ensure that these pairs are not reordered. Unfortunately, inserting such memory fences can significantly slow down the program.
We observe that the ordering of the minimal set of memory accesses that the compiler strives to enforce is typically already enforced in the normal course of program execution. A study we conducted on programs with compiler-inserted memory fences shows that only 8% of the executed fence instances are actually necessary to ensure SC. Motivated by this study, we propose the conditional fence mechanism (C-Fence), which uses compiler information to decide dynamically whether there is a need to stall at each fence. Our experiments with SPLASH-2 benchmarks show that, with C-Fences, programs can be transformed to enforce SC while incurring only a 12% slowdown, as opposed to a 43% slowdown with normal fence instructions. Our approach requires very little hardware support (<300 bytes of on-chip storage) and avoids the use of speculation and its associated costs.
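To make the role of a fence between a pair of memory accesses concrete, the following toy model may help. It is a minimal sketch and not the compiler analysis or the C-Fence hardware described in the abstract. It assumes the classic two-thread "store buffering" pattern (thread 0: x = 1; r0 = y and thread 1: y = 1; r1 = x) and a hypothetical relaxed machine that may reorder a thread's two independent accesses unless a fence separates them. All names (THREADS, thread_orders, interleavings, outcomes) are illustrative.

```python
from itertools import permutations

# Each op is (kind, variable, register).
# Thread 0: x = 1; r0 = y      Thread 1: y = 1; r1 = x
THREADS = [
    [("store", "x", None), ("load", "y", "r0")],
    [("store", "y", None), ("load", "x", "r1")],
]


def thread_orders(ops, fenced):
    """Orders in which one thread's accesses may take effect."""
    if fenced:
        # A fence between the two accesses keeps them in program order.
        return [tuple(ops)]
    # Assumed relaxed model: independent accesses may take effect in any order.
    return list(permutations(ops))


def interleavings(a, b):
    """Yield every merge of a and b that preserves each sequence's own order."""
    if not a:
        yield tuple(b)
        return
    if not b:
        yield tuple(a)
        return
    for rest in interleavings(a[1:], b):
        yield (a[0],) + rest
    for rest in interleavings(a, b[1:]):
        yield (b[0],) + rest


def outcomes(fenced):
    """Set of (r0, r1) results reachable over all allowed executions."""
    results = set()
    for order0 in thread_orders(THREADS[0], fenced):
        for order1 in thread_orders(THREADS[1], fenced):
            for schedule in interleavings(order0, order1):
                mem = {"x": 0, "y": 0}
                regs = {}
                for kind, var, reg in schedule:
                    if kind == "store":
                        mem[var] = 1
                    else:
                        regs[reg] = mem[var]
                results.add((regs["r0"], regs["r1"]))
    return results


if __name__ == "__main__":
    print("with fences:   ", sorted(outcomes(fenced=True)))
    print("without fences:", sorted(outcomes(fenced=False)))
```

In this model the result r0 = 0, r1 = 0 is reachable only without fences, which is the kind of SC violation that fence insertion rules out. The model says nothing about how often a fence actually needs to stall at run time, which is the question the conditional fence mechanism addresses.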
Index Terms
- Efficient sequential consistency using conditional fences