DOI: 10.1145/2694344.2694388
research-article

Asymmetric Memory Fences: Optimizing Both Performance and Implementability

Published: 14 March 2015

Abstract

There have been several recent efforts to improve the performance of fences. The most aggressive designs allow post-fence accesses to retire and complete before the fence completes. Unfortunately, such designs present implementation difficulties due to their reliance on global state and structures.
This paper's goal is to optimize both the performance and the implementability of fences. We start off with a design like the most aggressive ones but without the global state. We call it Weak Fence or wF. Since the concurrent execution of multiple wFs can deadlock, we combine wFs with a conventional fence (i.e., Strong Fence or sF) for the less performance-critical thread(s). We call the result an Asymmetric fence group. We also propose a taxonomy of Asymmetric fence groups under TSO. Compared to past aggressive fences, Asymmetric fence groups both are substantially easier to implement and have higher average performance. The two main designs presented (WS+ and W+) speed up workloads under TSO by an average of 13% and 21%, respectively, over conventional fences.
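As background for what a conventional fence guarantees under TSO, the following is a minimal software sketch of the classic store-buffering (Dekker-style) pattern. It is not the paper's wF or sF hardware design. The function name `count_both_zero` and the variables `x`, `y`, `r0`, and `r1` are hypothetical names chosen for illustration. The sketch assumes a C++11 (or later) compiler, where `std::atomic_thread_fence(std::memory_order_seq_cst)` is typically lowered to a full fence instruction on TSO machines such as x86.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs the store-buffering litmus test `iterations` times and returns how
// many runs observed r0 == 0 && r1 == 0. Under TSO without fences, each
// thread's store can sit in its store buffer while the following load
// completes, so both loads may read 0. The sequentially consistent fence
// between the store and the load in each thread forbids that outcome.
int count_both_zero(int iterations) {
    assert(iterations >= 0);
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r0 = -1, r1 = -1;  // written by one thread each, read after join

        std::thread t0([&] {
            x.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);  // conventional fence
            r0 = y.load(std::memory_order_relaxed);
        });
        std::thread t1([&] {
            y.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);  // conventional fence
            r1 = x.load(std::memory_order_relaxed);
        });

        t0.join();
        t1.join();
        if (r0 == 0 && r1 == 0) {
            ++both_zero;
        }
    }
    return both_zero;
}
```

Calling `count_both_zero(1000)` should return 0 on a conforming implementation, because the two fences guarantee that at least one thread observes the other's store. The cost of stalling each load behind such a fence is the overhead that the fence designs described in the abstract aim to reduce.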

    Published In

    ASPLOS '15: Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems
    March 2015
    720 pages
    ISBN:9781450328357
    DOI:10.1145/2694344

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. fences
    2. parallel programming
    3. sequential consistency
    4. shared-memory machines
    5. synchronization

    Acceptance Rates

    ASPLOS '15 Paper Acceptance Rate 48 of 287 submissions, 17%;
    Overall Acceptance Rate 535 of 2,713 submissions, 20%

    Cited By

    • (2022) Total Store Order and the x86 Memory Model. A Primer on Memory Consistency and Cache Coherence, pp. 39-53. DOI: 10.1007/978-3-031-01764-3_4. Online publication date: 28-Mar-2022
    • (2021) Execution Dependence Extension (EDE): ISA Support for Eliminating Fences. 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA), pp. 456-469. DOI: 10.1109/ISCA52012.2021.00043. Online publication date: Jun-2021
    • (2020) A Primer on Memory Consistency and Cache Coherence, Second Edition. Synthesis Lectures on Computer Architecture 15:1, pp. 1-294. DOI: 10.2200/S00962ED2V01Y201910CAC049. Online publication date: 4-Feb-2020
    • (2018) Constructing a weak memory model. Proceedings of the 45th Annual International Symposium on Computer Architecture, pp. 124-137. DOI: 10.1109/ISCA.2018.00021. Online publication date: 2-Jun-2018
    • (2017) Fence-Free Synchronization with Dynamically Serialized Synchronization Variables. IEEE Transactions on Parallel and Distributed Systems 28:12, pp. 3486-3500. DOI: 10.1109/TPDS.2016.2633353. Online publication date: 1-Dec-2017
