Article

Lightweight lock-free synchronization methods for multithreading

Authors: Arun Kejariwal, Utpal Banerjee, Alexandru Nicolau, Constantine D. Polychronopoulos

ICS '06: Proceedings of the 20th annual international conference on Supercomputing

Pages 361-371

https://doi.org/10.1145/1183401.1183452

Published: 28 June 2006

Abstract

The emergence of chip multiprocessors has created a need to exploit thread-level parallelism (TLP) beyond DOALL-type parallelism. This calls for efficient thread synchronization techniques that can exploit TLP in general parallel programs with dependences. Several such techniques have been proposed in the past; however, their large run-time overhead limits the exploitation of fine-grain TLP. Furthermore, the existing approaches can potentially result in (i) deadlocks between threads and (ii) non-deterministic run-time execution behavior, as these techniques are oblivious of the underlying memory model. In this paper, we propose lightweight lock-free thread synchronization methods to exploit TLP in general parallel programs with dependences. Each synchronization method intrinsically guarantees the following in a multithreaded program: (a) sequential consistency, (b) atomicity of writes to the shared synchronization construct, and (c) absence of deadlocks. This considerably reduces programming effort, thereby easing the development of software for multithreaded systems. For each method, we formally prove that no deadlock can occur between threads. This relieves the programmer of the cumbersome and time-consuming process of detecting and eliminating deadlocks. Experiments show that our synchronization methods incur an average overhead of only 7.16%. Further, we achieve speedups of up to 3.39x on kernels extracted from the industry-standard SPEC OMPM2001 benchmarks on a dedicated 4-way Intel® Xeon® 2.78 GHz multiprocessor.
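To make the setting concrete, here is a minimal sketch of generic lock-free post/wait synchronization for a DOACROSS loop with a loop-carried dependence of distance 1, written with C++11 `std::atomic`. It is an illustration under stated assumptions, not the authors' method: the loop body and the names `independent_work` and `doacross_prefix` are hypothetical. A single shared counter records how many iterations have completed their dependent part. Iteration i waits until the counter reaches i, performs the dependent update, and then stores i+1. Because only iteration i ever advances the counter from i to i+1, a plain atomic store suffices; no lock and no read-modify-write is required. Sequentially consistent atomics are used to mirror guarantee (a); acquire/release ordering alone would already make `s[i-1]` visible to iteration i. Deadlock freedom in this sketch follows from the schedule: each thread visits its iterations in increasing order, so the owner of the smallest unfinished iteration has already finished all of its earlier iterations, and all predecessors of that iteration are done, so it can always proceed.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the independent (parallelizable) part of iteration i.
static long independent_work(std::size_t i) {
    long v = 0;
    for (std::size_t k = 0; k <= i % 64; ++k) v += static_cast<long>(k);
    return v;
}

// Runs  s[i] = s[i-1] + independent_work(i)  (with s[-1] taken as 0)
// as a DOACROSS loop with dependence distance 1 on num_threads threads.
std::vector<long> doacross_prefix(std::size_t n, unsigned num_threads) {
    assert(num_threads > 0);
    std::vector<long> s(n, 0);
    // Count of iterations whose dependent part has completed.
    // Only iteration i advances it from i to i+1, so a plain store is enough.
    std::atomic<std::size_t> done{0};

    auto worker = [&](unsigned tid) {
        // Cyclic schedule: each thread visits its iterations in increasing order.
        for (std::size_t i = tid; i < n; i += num_threads) {
            long local = independent_work(i);                 // overlaps with other iterations
            while (done.load(std::memory_order_seq_cst) < i)  // wait: iteration i-1 finished?
                std::this_thread::yield();
            s[i] = (i == 0 ? 0 : s[i - 1]) + local;           // dependent (serialized) part
            done.store(i + 1, std::memory_order_seq_cst);     // post: release iteration i+1
        }
    };

    std::vector<std::thread> pool;
    for (unsigned t = 0; t < num_threads; ++t) pool.emplace_back(worker, t);
    for (auto& th : pool) th.join();
    return s;
}
```

For any thread count of at least 1, `doacross_prefix(n, t)` should return the same vector as the serial loop. The wait loop calls `std::this_thread::yield()` so that, when threads outnumber cores, the spinning thread does not starve the thread that owns iteration i-1.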



Information

Published In

ICS '06: Proceedings of the 20th annual international conference on Supercomputing

June 2006

385 pages

ISBN:1595932828

DOI:10.1145/1183401

General Chairs: Greg Egan (Monash University), Yoichi Muraoka (Waseda University)

Copyright © 2006 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

ICS06

Sponsor:

ICS06: International Conference on Supercomputing 2006

June 28 - July 1, 2006

Cairns, Queensland, Australia

Acceptance Rates

ICS '06 paper acceptance rate: 37 of 141 submissions (26%)

Overall acceptance rate: 629 of 2,180 submissions (29%)

Bibliometrics & Citations

Article Metrics

Total citations: 10
Total downloads: 851
Downloads (last 12 months): 8
Downloads (last 6 weeks): 0

Reflects downloads up to 05 Mar 2025

Citations

Cited By

Lin H, Bae H, Midkiff S, Eigenmann R, Kim S (2013). A Study of the Usefulness of Producer/Consumer Synchronization. Languages and Compilers for Parallel Computing, pp. 141-155. Online publication date: 2013.
https://doi.org/10.1007/978-3-642-36036-7_10
Midkiff S (2012). Automatic Parallelization: An Overview of Fundamental Compiler Techniques. Synthesis Lectures on Computer Architecture 7(1), pp. 1-169. Online publication date: 28-Jan-2012.
https://doi.org/10.2200/S00340ED1V01Y201201CAC019
Ributzka J, Hayashi Y, Manzano J, Gao G (2011). The elephant and the mice. In Lowenthal D, de Supinski B, McKee S (eds.), Proceedings of the International Conference on Supercomputing, pp. 338-347. Online publication date: 31-May-2011.
https://dl.acm.org/doi/10.1145/1995896.1995948
Nicolau A, Li G, Kejariwal A (2009). Techniques for efficient placement of synchronization primitives. ACM SIGPLAN Notices 44(4), pp. 199-208. Online publication date: 14-Feb-2009.
https://dl.acm.org/doi/10.1145/1594835.1504207
Nicolau A, Li G, Veidenbaum A, Kejariwal A (2009). Synchronization optimizations for efficient execution on multi-cores. In Gschwind M, Nicolau A, Salapura V, Moreira J (eds.), Proceedings of the 23rd International Conference on Supercomputing, pp. 169-180. Online publication date: 8-Jun-2009.
https://dl.acm.org/doi/10.1145/1542275.1542303
Nicolau A, Li G, Kejariwal A (2009). Techniques for efficient placement of synchronization primitives. In Reed D, Sarkar V (eds.), Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 199-208. Online publication date: 14-Feb-2009.
https://dl.acm.org/doi/10.1145/1504176.1504207
Wu G, Wang M, Dou Y, Xia F (2009). Exploiting Fine-Grained Pipeline Parallelism for Wavefront Computations on Multicore Platforms. Proceedings of the 2009 International Conference on Parallel Processing Workshops, pp. 402-408. Online publication date: 22-Sep-2009.
https://dl.acm.org/doi/10.1109/ICPPW.2009.15
McCool M (2008). Scalable Programming Models for Massively Multicore Processors. Proceedings of the IEEE 96(5), pp. 816-831. Online publication date: May-2008.
https://doi.org/10.1109/JPROC.2008.917731
Zhu W, Sreedhar V, Hu Z, Gao G (2007). Synchronization state buffer. ACM SIGARCH Computer Architecture News 35(2), pp. 35-45. Online publication date: 9-Jun-2007.
https://dl.acm.org/doi/10.1145/1273440.1250668
Zhu W, Sreedhar V, Hu Z, Gao G (2007). Synchronization state buffer. In Tullsen D, Calder B (eds.), Proceedings of the 34th Annual International Symposium on Computer Architecture, pp. 35-45. Online publication date: 9-Jun-2007.
https://dl.acm.org/doi/10.1145/1250662.1250668
