DOI: 10.1145/1989493.1989503
research-article

Location-based memory fences

Published: 04 June 2011

Abstract

Traditional memory fences are program-counter (PC) based. That is, a memory fence enforces a serialization point in the program instruction stream: it ensures that all memory references before the fence in program order have taken effect before execution continues to the instructions after the fence. Such PC-based memory fences always cause the processor to stall, even when the synchronization is unnecessary during a particular execution. We propose the concept of location-based memory fences, which aim to reduce the cost of synchronization due to the latency of memory fence execution in parallel algorithms.
Unlike a PC-based memory fence, a location-based memory fence serializes the instruction stream of the executing thread T1 only when a different thread T2 attempts to read the memory location that is guarded by the location-based memory fence. In this work, we describe a hardware mechanism for location-based memory fences, prove its correctness, and evaluate its potential performance benefit. Our experimental results are based on a software simulation of the proposed location-based memory fence, and are thus expected to incur higher overhead than the proposed hardware mechanism would. Nevertheless, our software experiments show that applications can benefit from using location-based memory fences, although they do not scale as well in some cases because of the software overhead. These results suggest that hardware support for location-based memory fences is worth considering.
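
As an illustration of the PC-based behavior the abstract describes, the following minimal sketch shows a Dekker-style handshake in C++ in which each thread stores to its own flag, executes a full fence, and then loads the other thread's flag. This is not code from the paper: the names `DekkerFlags`, `announce_then_check`, and `count_mutual_exclusion_violations` are hypothetical, and `std::atomic_thread_fence` stands in for whatever hardware fence a given platform provides. The fence runs every time control reaches it, whether or not the other thread is contending, which is the cost a location-based fence aims to avoid.

```cpp
#include <atomic>
#include <thread>

// Hypothetical illustration only; none of these names come from the paper.
struct DekkerFlags {
    std::atomic<int> flag1{0};
    std::atomic<int> flag2{0};
};

// One side of the handshake: announce intent, fence, then inspect the
// other thread's flag. Returns true if the other flag still read as 0.
inline bool announce_then_check(std::atomic<int>& mine,
                                std::atomic<int>& theirs) {
    mine.store(1, std::memory_order_relaxed);
    // PC-based fence: executed unconditionally at this point in the
    // instruction stream, even if no other thread ever reads `mine`.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return theirs.load(std::memory_order_relaxed) == 0;
}

// Runs the two-thread handshake `trials` times and counts the trials in
// which BOTH threads read 0, which the fences are meant to rule out.
int count_mutual_exclusion_violations(int trials) {
    int violations = 0;
    for (int i = 0; i < trials; ++i) {
        DekkerFlags f;
        bool t1_saw_zero = false;
        bool t2_saw_zero = false;
        std::thread t1([&] {
            t1_saw_zero = announce_then_check(f.flag1, f.flag2);
        });
        std::thread t2([&] {
            t2_saw_zero = announce_then_check(f.flag2, f.flag1);
        });
        t1.join();
        t2.join();
        if (t1_saw_zero && t2_saw_zero) {
            ++violations;
        }
    }
    return violations;
}
```

With the sequentially consistent fence in place, the C++ memory model forbids the outcome in which both threads read 0, so `count_mutual_exclusion_violations` should return 0 for any number of trials. Removing the fence makes that outcome permissible, and on hardware with store buffers it can then occur.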


Published In

SPAA '11: Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
June 2011
404 pages
ISBN:9781450307437
DOI:10.1145/1989493
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

In-Cooperation

  • EATCS: European Association for Theoretical Computer Science

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. asymmetric synchronization
  2. biased locks
  3. location-based memory fences
  4. memory fences
  5. the dekker duality
  6. the dekker protocol

Qualifiers

  • Research-article

Conference

SPAA '11

Acceptance Rates

Overall Acceptance Rate 447 of 1,461 submissions, 31%


Cited By

  • (2020) Efficiently Supporting Dynamic Task Parallelism on Heterogeneous Cache-Coherent Systems. 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), pages 173-186. DOI: 10.1109/ISCA45697.2020.00025. Online publication date: May-2020
  • (2017) Fence-Free Synchronization with Dynamically Serialized Synchronization Variables. IEEE Transactions on Parallel and Distributed Systems 28:12, pages 3486-3500. DOI: 10.1109/TPDS.2016.2633353. Online publication date: 1-Dec-2017
  • (2016) Fast and Robust Memory Reclamation for Concurrent Data Structures. Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures, pages 349-359. DOI: 10.1145/2935764.2935790. Online publication date: 11-Jul-2016
  • (2015) Asymmetric Memory Fences. ACM SIGARCH Computer Architecture News 43:1, pages 531-543. DOI: 10.1145/2786763.2694388. Online publication date: 14-Mar-2015
  • (2015) Asymmetric Memory Fences. ACM SIGPLAN Notices 50:4, pages 531-543. DOI: 10.1145/2775054.2694388. Online publication date: 14-Mar-2015
  • (2015) Asymmetric Memory Fences. Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 531-543. DOI: 10.1145/2694344.2694388. Online publication date: 14-Mar-2015
  • (2014) Fence scoping. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 105-116. DOI: 10.1109/SC.2014.14. Online publication date: 16-Nov-2014
  • (2013) Fast RMWs for TSO. ACM SIGPLAN Notices 48:6, pages 61-72. DOI: 10.1145/2499370.2462196. Online publication date: 16-Jun-2013
  • (2013) Fast RMWs for TSO. Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 61-72. DOI: 10.1145/2491956.2462196. Online publication date: 16-Jun-2013
  • (2013) Address-aware fences. Proceedings of the 27th international ACM conference on International conference on supercomputing, pages 313-324. DOI: 10.1145/2464996.2465015. Online publication date: 10-Jun-2013
