DOI: 10.1145/1183401.1183452

Lightweight lock-free synchronization methods for multithreading

Published: 28 June 2006

Abstract

The emergence of chip multiprocessors has created a need to exploit thread-level parallelism (TLP) beyond the DOALL type. This calls for efficient thread synchronization techniques that can exploit TLP in general parallel programs with dependences. Several thread synchronization techniques have been proposed in the past, but their large run-time overhead limits the exploitation of fine-grain TLP. Furthermore, because these techniques are oblivious of the underlying memory model, they can potentially result in (i) deadlocks between threads and (ii) non-deterministic run-time execution behavior. In this paper, we propose lightweight lock-free thread synchronization methods to exploit TLP in general parallel programs with dependences. Each synchronization method intrinsically guarantees the following in a multithreaded program: (a) sequential consistency, (b) atomicity of writes to the shared synchronization construct, and (c) absence of deadlocks. This reduces the programming effort considerably, thereby easing the development of software for multithreaded systems. For each method we formally prove that a deadlock cannot occur between threads. This relieves the programmer of the cumbersome and time-consuming process of detecting and eliminating deadlocks. Experiments show that our synchronization methods incur a minimal overhead of 7.16% on average. Further, we achieve speedups of up to 3.39x on kernels extracted from the industry-standard SPEC OMPM 2001 benchmarks on a dedicated 4-way Intel® Xeon® 2.78 GHz multiprocessor.
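
The abstract does not describe the proposed constructs themselves, so the following is only a minimal C++ sketch of the general idea it names: lock-free post/wait synchronization for a loop with a cross-iteration dependence. The function names (`doacross_prefix`, `independent_work`), the single shared counter, and the cyclic assignment of iterations to threads are all assumptions made for illustration and are not taken from the paper. The counter uses the default sequentially consistent `std::atomic` operations, and each iteration waits only on a lower-numbered iteration, so no cycle of waits can form.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the part of an iteration that has no
// dependence on other iterations and can therefore run in parallel.
inline long long independent_work(std::size_t i) {
    return static_cast<long long>(i) * 2;
}

// Illustrative sketch (not the paper's method): computes
//   a[i] = a[i-1] + independent_work(i)   for i = 1..n
// with iterations assigned cyclically to num_threads threads.
std::vector<long long> doacross_prefix(std::size_t n, unsigned num_threads) {
    assert(num_threads > 0);
    std::vector<long long> a(n + 1, 0);

    // Shared synchronization variable: highest iteration completed so far.
    // Loads and stores use the default memory_order_seq_cst.
    std::atomic<std::size_t> last_done{0};

    auto worker = [&](unsigned tid) {
        for (std::size_t i = tid + 1; i <= n; i += num_threads) {
            long long w = independent_work(i);  // overlaps with other threads

            // "wait": spin without taking a lock until iteration i-1 is done.
            while (last_done.load() < i - 1) {
                std::this_thread::yield();
            }

            a[i] = a[i - 1] + w;  // dependent part, now safe to execute

            // "post": a single atomic write publishes completion of i.
            last_done.store(i);
        }
    };

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        threads.emplace_back(worker, t);
    }
    for (auto& t : threads) {
        t.join();
    }
    return a;
}
```

Calling `doacross_prefix(1000, 4)` returns the same array a sequential loop would produce. Only the thread that owns the next iteration ever writes the counter, which is why a plain atomic store suffices here instead of a compare-and-swap.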



Published In

ICS '06: Proceedings of the 20th annual international conference on Supercomputing
June 2006
385 pages
ISBN:1595932828
DOI:10.1145/1183401
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Conference

ICS06: International Conference on Supercomputing 2006
June 28 - July 1, 2006
Cairns, Queensland, Australia

Acceptance Rates

ICS '06 Paper Acceptance Rate 37 of 141 submissions, 26%;
Overall Acceptance Rate 629 of 2,180 submissions, 29%

Cited By

  • (2013) A Study of the Usefulness of Producer/Consumer Synchronization. Languages and Compilers for Parallel Computing, pp. 141-155. DOI: 10.1007/978-3-642-36036-7_10. Online publication date: 2013
  • (2012) Automatic Parallelization: An Overview of Fundamental Compiler Techniques. Synthesis Lectures on Computer Architecture, 7:1, pp. 1-169. DOI: 10.2200/S00340ED1V01Y201201CAC019. Online publication date: 28-Jan-2012
  • (2011) The elephant and the mice. Proceedings of the international conference on Supercomputing, pp. 338-347. DOI: 10.1145/1995896.1995948. Online publication date: 31-May-2011
  • (2009) Techniques for efficient placement of synchronization primitives. ACM SIGPLAN Notices, 44:4, pp. 199-208. DOI: 10.1145/1594835.1504207. Online publication date: 14-Feb-2009
  • (2009) Synchronization optimizations for efficient execution on multi-cores. Proceedings of the 23rd international conference on Supercomputing, pp. 169-180. DOI: 10.1145/1542275.1542303. Online publication date: 8-Jun-2009
  • (2009) Techniques for efficient placement of synchronization primitives. Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming, pp. 199-208. DOI: 10.1145/1504176.1504207. Online publication date: 14-Feb-2009
  • (2009) Exploiting Fine-Grained Pipeline Parallelism for Wavefront Computations on Multicore Platforms. Proceedings of the 2009 International Conference on Parallel Processing Workshops, pp. 402-408. DOI: 10.1109/ICPPW.2009.15. Online publication date: 22-Sep-2009
  • (2008) Scalable Programming Models for Massively Multicore Processors. Proceedings of the IEEE, 96:5, pp. 816-831. DOI: 10.1109/JPROC.2008.917731. Online publication date: May-2008
  • (2007) Synchronization state buffer. ACM SIGARCH Computer Architecture News, 35:2, pp. 35-45. DOI: 10.1145/1273440.1250668. Online publication date: 9-Jun-2007
  • (2007) Synchronization state buffer. Proceedings of the 34th annual international symposium on Computer architecture, pp. 35-45. DOI: 10.1145/1250662.1250668. Online publication date: 9-Jun-2007
