research-article

Location-based memory fences

Authors:

Edya Ladan-Mozes,

I-Ting Angelina Lee,

Dmitry Vyukov

SPAA '11: Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures

Pages 75 - 84

https://doi.org/10.1145/1989493.1989503

Published: 04 June 2011

Abstract

Traditional memory fences are program-counter (PC) based. That is, a memory fence enforces a serialization point in the program instruction stream: it ensures that all memory references that precede the fence in program order have taken effect before execution continues to the instructions after the fence. Such PC-based memory fences always cause the processor to stall, even when the synchronization is unnecessary during a particular execution. We propose the concept of location-based memory fences, which aim to reduce the synchronization cost that memory fence latency imposes on parallel algorithms.
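To make the PC-based behavior concrete, the following is a minimal C++ sketch of a Dekker-style handshake, not code from the paper. The names `flag1`, `flag2`, `r1`, `r2`, and `count_both_zero` are hypothetical. Each thread announces itself with a store, executes a full fence, and then reads the other thread's flag. The fence runs on every call, whether or not the other thread is actually contending, which is the cost the abstract refers to.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical Dekker-style "intent" flags, one per thread.
std::atomic<int> flag1{0}, flag2{0};
// What each thread observed; read only after both threads are joined.
int r1 = 0, r2 = 0;

void thread1() {
    flag1.store(1, std::memory_order_relaxed);
    // PC-based fence: a fixed point in the instruction stream, paid on every execution.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    r1 = flag2.load(std::memory_order_relaxed);
}

void thread2() {
    flag2.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    r2 = flag1.load(std::memory_order_relaxed);
}

// Runs the handshake `trials` times and counts runs in which both threads
// missed each other's store (r1 == 0 and r2 == 0).
int count_both_zero(int trials) {
    assert(trials >= 0);
    int both_zero = 0;
    for (int i = 0; i < trials; ++i) {
        flag1.store(0);
        flag2.store(0);
        r1 = r2 = 0;
        std::thread a(thread1), b(thread2);
        a.join();
        b.join();
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

With both fences in place, the C++ memory model forbids the outcome in which both threads read 0, so `count_both_zero` should return 0 for any number of trials. Removing the fences makes that outcome permissible, which is why such protocols pay for the fence even in runs with no contention.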

Unlike a PC-based memory fence, a location-based memory fence serializes the instruction stream of the executing thread T₁ only when a different thread T₂ attempts to read the memory location that is guarded by the location-based memory fence. In this work, we describe a hardware mechanism for location-based memory fences, prove its correctness, and evaluate its potential performance benefit. Our experimental results are based on a software simulation of the proposed location-based memory fence, and are thus expected to incur higher overhead than the proposed hardware mechanism would. Nevertheless, our software experiments show that applications can benefit from using location-based memory fences, although in some cases they do not scale as well because of the software overhead. These results suggest that hardware support for location-based memory fences is worth considering.
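The intended semantics can be illustrated with a small sequential toy model in C++. This is an assumption-laden sketch for intuition only: it is neither the hardware mechanism nor the software simulation evaluated in the paper. `ToyMachine` and all of its members are hypothetical names, and the model ignores the fact that a real store buffer also drains on its own over time.

```cpp
#include <cassert>
#include <deque>
#include <unordered_map>
#include <utility>

using Addr = int;

// Toy model: one store buffer for T1, one shared memory, one guarded location.
struct ToyMachine {
    std::unordered_map<Addr, int> memory;              // globally visible values
    std::deque<std::pair<Addr, int>> t1_store_buffer;  // T1's pending stores, in program order
    bool guard_active = false;                         // is a location-based fence armed?
    Addr guarded = 0;                                  // the location it guards
    int serializations = 0;                            // times T1 was forced to drain

    // T1: ordinary store; buffered, so not yet visible to T2.
    void t1_store(Addr a, int v) { t1_store_buffer.emplace_back(a, v); }

    // T1: store guarded by a location-based fence. No drain happens here.
    void t1_guarded_store(Addr a, int v) {
        t1_store(a, v);
        guarded = a;
        guard_active = true;
    }

    // Make all of T1's pending stores visible, in order.
    void drain_t1() {
        for (const auto& entry : t1_store_buffer) memory[entry.first] = entry.second;
        t1_store_buffer.clear();
        guard_active = false;
        ++serializations;
    }

    // T2: load. Only a read of the guarded location forces T1 to serialize.
    int t2_load(Addr a) {
        if (guard_active && a == guarded) drain_t1();
        auto it = memory.find(a);
        return it == memory.end() ? 0 : it->second;
    }
};
```

In this model, `t1_guarded_store(1, 42)` followed by `t2_load(2)` leaves `serializations` at 0, while a subsequent `t2_load(1)` returns 42 and raises `serializations` to 1. T1 pays for serialization only in executions where T2 actually reads the guarded location.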

Published In

SPAA '11: Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures

June 2011

404 pages

ISBN:9781450307437

DOI:10.1145/1989493

Co-chairs:
Friedhelm Meyer auf der Heide, University of Paderborn, Germany
Rajmohan Rajaraman, Northeastern University, USA

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Sponsors

In-Cooperation

EATCS: European Association for Theoretical Computer Science

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SPAA '11

SPAA '11: 23rd ACM Symposium on Parallelism in Algorithms and Architectures

June 4 - 6, 2011

San Jose, California, USA

Acceptance Rates

Overall Acceptance Rate 447 of 1,461 submissions, 31%
