DOI: 10.1145/1854273.1854312
research-article

Efficient sequential consistency using conditional fences

Published: 11 September 2010

ABSTRACT

Among the various memory consistency models, the sequential consistency (SC) model, in which memory operations appear to take place in the order specified by the program, is the most intuitive and best enables programmers to reason about their parallel programs. Nevertheless, processor designers often choose to support relaxed memory consistency models because the weaker ordering constraints imposed by such models allow more instructions to be reordered and enable higher performance. Programs running on machines that support weaker consistency models can be transformed into ones in which SC is enforced. The compiler does this by computing a minimal set of memory access pairs whose ordering automatically guarantees SC. To ensure that these memory access pairs are not reordered, memory fences are inserted. Unfortunately, inserting such memory fences can significantly slow down the program.
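
To make the idea of an ordered memory access pair concrete, the following is a minimal sketch of the classic store-buffering litmus test. It is not taken from the paper: the function name `count_non_sc_outcomes` is hypothetical, and C++ `std::atomic_thread_fence` stands in for the hardware fence a compiler would insert. In each thread the (store, load) pair is the kind of pair whose order must be preserved; if both threads read 0, no sequentially consistent interleaving explains the result.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Store-buffering litmus test (illustrative; not the paper's code).
// Returns how many of `iterations` runs ended with r1 == 0 && r2 == 0,
// an outcome that no sequentially consistent interleaving can produce.
int count_non_sc_outcomes(int iterations, bool use_fences) {
    assert(iterations >= 0);
    int non_sc = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        std::thread t1([&] {
            x.store(1, std::memory_order_relaxed);   // first access of the pair
            if (use_fences)                          // fence keeps the pair ordered
                std::atomic_thread_fence(std::memory_order_seq_cst);
            r1 = y.load(std::memory_order_relaxed);  // second access of the pair
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_relaxed);
            if (use_fences)
                std::atomic_thread_fence(std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_relaxed);
        });
        t1.join();
        t2.join();

        if (r1 == 0 && r2 == 0) ++non_sc;  // both loads bypassed both stores
    }
    return non_sc;
}
```

With `use_fences` set to `true`, the C++ memory model guarantees that the function returns 0. With `false`, the non-SC outcome is permitted but may or may not be observed on a given machine.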

We observe that the ordering of the minimal set of memory accesses that the compiler strives to enforce is typically already enforced in the normal course of program execution. A study we conducted on programs with compiler-inserted memory fences shows that only 8% of the executed instances of the memory fences are really necessary to ensure SC. Motivated by this study, we propose the conditional fence mechanism (C-Fence), which uses compiler information to decide dynamically whether there is a need to stall at each fence. Our experiments with SPLASH-2 benchmarks show that, with C-Fences, programs can be transformed to enforce SC while incurring only a 12% slowdown, as opposed to a 43% slowdown with normal fence instructions. Our approach requires very little hardware support (less than 300 bytes of on-chip storage) and avoids the use of speculation and its associated costs.
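
The abstract does not spell out the rule a conditional fence uses to decide whether to stall. The sketch below is a hypothetical, single-threaded toy model of one possible reading, under a loudly stated assumption: the compiler pairs each fence with "associate" fences in other threads, and a fence has to stall only while one of its associates is still in flight. The class `ConditionalFenceModel` and its methods are illustrative names, not an API from the paper.

```cpp
#include <cassert>
#include <map>
#include <set>

// Hypothetical toy model of a conditional fence; all names are illustrative.
// ASSUMPTION (not stated in the abstract): a fence must stall only while one
// of its compiler-identified "associate" fences is currently in flight.
class ConditionalFenceModel {
public:
    // Compiler-provided information: fences a and b guard conflicting accesses.
    void add_associates(int a, int b) {
        assert(a != b);
        associates_[a].insert(b);
        associates_[b].insert(a);
    }

    // A thread reaches fence `id`. Returns true if the fence must stall
    // (behave like a normal fence), false if it may proceed without stalling.
    bool begin_fence(int id) {
        bool must_stall = false;
        for (int other : associates_[id]) {
            if (active_.count(other) != 0) {
                must_stall = true;
                break;
            }
        }
        active_.insert(id);  // this fence is now in flight
        return must_stall;
    }

    // The memory operations preceding fence `id` have completed.
    void end_fence(int id) { active_.erase(id); }

private:
    std::map<int, std::set<int>> associates_;  // fence id -> associate ids
    std::set<int> active_;                     // fences currently in flight
};
```

In this model, a fence that begins while none of its associates is active returns `false` from `begin_fence` and skips the stall, which is the common case the study above suggests.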

          Published in

          PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques
          September 2010
          596 pages
          ISBN:9781450301787
          DOI:10.1145/1854273

          Copyright © 2010 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Qualifiers

          • research-article

          Acceptance Rates

          Overall Acceptance Rate: 121 of 471 submissions, 26%
