Concurrent Cilk: Lazy Promotion from Tasks to Threads in C/C++

Zakian, Christopher S.; Zakian, Timothy A. K.; Kulkarni, Abhishek; Chamith, Buddhika; Newton, Ryan R.

doi:10.1007/978-3-319-29778-1_5

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9519)

Included in the following conference series:

Languages and Compilers for Parallel Computing

Abstract

Library and language support for scheduling non-blocking tasks has greatly improved, as have lightweight (user) threading packages. However, there is a significant gap between the two developments. In previous work—and in today’s software packages—lightweight thread creation incurs much larger overheads than tasking libraries, even on tasks that end up never blocking. This limitation can be removed. To that end, we describe an extension to the Intel Cilk Plus runtime system, Concurrent Cilk, where tasks are lazily promoted to threads. Concurrent Cilk removes the overhead of thread creation on threads which end up calling no blocking operations, and is the first system to do so for C/C++ with legacy support (standard calling conventions and stack representations). We demonstrate that Concurrent Cilk adds negligible overhead to existing Cilk programs, while its promoted threads remain more efficient than OS threads in terms of context-switch overhead and blocking communication. Further, it enables development of blocking data structures that create non-fork-join dependence graphs—which can expose more parallelism, and better supports data-driven computations waiting on results from remote devices.
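
To make the phrase "blocking data structures that create non-fork-join dependence graphs" concrete, the following is a minimal sketch of a single-assignment cell whose reader blocks until a writer supplies a value. The names `IVar`, `put`, `get`, and `run_producer_consumer` are hypothetical and chosen for illustration; the sketch uses standard C++17 OS threads, not the Concurrent Cilk runtime, so it shows only the dependence pattern and none of the lazy promotion described above.

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>

// Hypothetical single-assignment cell (an assumption for illustration;
// not an API of Cilk Plus or Concurrent Cilk).
template <typename T>
class IVar {
public:
    // Store the value and wake every blocked reader. Intended to be
    // called once; this sketch does not check for a second put.
    void put(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            value_ = std::move(value);
        }
        ready_.notify_all();
    }

    // Block the calling thread until a value has been stored.
    T get() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return value_.has_value(); });
        return *value_;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::optional<T> value_;
};

// Two sibling computations linked by a data dependence that is not a
// fork-join edge: the consumer waits on the producer's result.
int run_producer_consumer() {
    IVar<int> cell;
    int result = 0;
    std::thread consumer([&] { result = cell.get() + 1; });
    std::thread producer([&] { cell.put(41); });
    producer.join();
    consumer.join();
    return result;
}
```

Here every computation pays for an OS thread up front, whether or not it ever blocks. The abstract's claim is that Concurrent Cilk avoids that cost for tasks that never call a blocking operation such as `get`.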

Notes

1. This handle is similar to a [parallel] one-shot continuation. Continuations are well-studied control constructs [9, 17] and are known to be sufficient to build cooperative threading (coroutines) [9] as well as blocking data structures that enable, for example, stream processing with back-pressure.
2. Cilk is a work-first system, which means that the thread that executes spawn f begins executing f immediately; it is the continuation of spawn that is exposed for stealing.
3. Ray tracing follows an imaginary line from each pixel in the image into the scene to see what objects are encountered, rather than starting with the objects and drawing (rasterizing) them onto the screen.
4. In other words, manually converting the application to continuation-passing style (CPS).
5. Here, and in the rest of this paper, we omit the prefix which is found in most of the symbols in Cilk Plus and in our fork, Concurrent Cilk (https://github.com/iu-parfunc/concurrent_cilk).
6. A Cilk worker represents thread-local state that sits on top of an OS-level thread.
7. The original proof of Cilk’s space and time bounds relies on the critical path of the computation always remaining accessible in this way. Non-uniform probabilities in work-stealing are a concern to some authors of Cilk.
8. However, the specific, narrow case of linear, synchronous dataflow graphs is addressed by recent work on extending Cilk with pipeline parallelism via a new looping construct [10].
9. Using all four cores of an Intel Westmere processor (i5-2400 at 3.10 GHz), 4 GB memory, Linux 2.6.32, GHC 7.4.2 and Go 1.0.3.
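
As an illustration of the manual conversion to continuation-passing style mentioned in note 4, the sketch below writes the same small computation in direct style and in CPS. All function names are hypothetical and are not taken from the paper or from Cilk Plus; the continuations here are invoked synchronously, whereas a real non-blocking application would typically register them as callbacks to run later.

```cpp
#include <functional>

// Direct style: each function returns its result to the caller.
int add_direct(int a, int b) { return a + b; }

int sum_squared_direct(int a, int b) {
    int s = add_direct(a, b);
    return s * s;
}

// CPS: instead of returning, each function passes its result to an
// explicit continuation k that represents "the rest of the program".
void add_cps(int a, int b, const std::function<void(int)>& k) {
    k(a + b);
}

void square_cps(int x, const std::function<void(int)>& k) {
    k(x * x);
}

// The body after the call to add becomes a lambda. This restructuring
// is what has to be done by hand throughout the application.
void sum_squared_cps(int a, int b, const std::function<void(int)>& k) {
    add_cps(a, b, [&k](int s) { square_cps(s, k); });
}
```

A caller obtains the result by supplying the final continuation, for example `sum_squared_cps(2, 3, [&](int v) { out = v; });`. Even in this tiny case the control flow is turned inside out, which is why a runtime that lets ordinary code block is attractive.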

References

1. Intel Cilk Plus. http://software.intel.com/en-us/articles/intel-cilk-plus/
2. Intel Cilk Plus Application Binary Interface Specification. https://www.cilkplus.org/sites/default/files/open_specifications/CilkPlusABI_1.1.pdf
3. Agrawal, K., Leiserson, C., Sukha, J.: Executing task graphs using work-stealing. In: IPDPS, pp. 1–12, April 2010
4. Blumofe, R.D., Joerg, C.F., Kuszmaul, B.C., Leiserson, C.E., Randall, K.H., Zhou, Y.: Cilk: an efficient multithreaded runtime system. SIGPLAN Not. 30, 207–216 (1995)
5. Fluet, M., Rainey, M., Reppy, J., Shaw, A., Xiao, Y.: Manticore: a heterogeneous parallel language. In: 2007 Workshop on Declarative Aspects of Multicore Programming, DAMP 2007, pp. 37–44. ACM, New York (2007)
6. Frigo, M., Leiserson, C.E., Randall, K.H.: The implementation of the Cilk-5 multithreaded language. SIGPLAN Not. 33(5), 212–223 (1998)
7. Goldstein, S.C., Schauser, K.E., Culler, D.E.: Lazy threads: implementing a fast parallel call. J. Parallel Distrib. Comput. 37(1), 5–20 (1996)
8. Google. The Go Programming Language. https://golang.org
9. Haynes, C.T., Friedman, D.P., Wand, M.: Obtaining coroutines with continuations. Comput. Lang. 11(3–4), 143–153 (1986)
10. Lee, I.-T.A., Leiserson, C.E., Schardl, T.B., Sukha, J., Zhang, Z.: On-the-fly pipeline parallelism. In: Proceedings of the 25th ACM Symposium on Parallelism in Algorithms and Architectures, pp. 140–151. ACM (2013)
11. Leijen, D., Schulte, W., Burckhardt, S.: The design of a task parallel library. SIGPLAN Not. 44, 227–242 (2009)
12. Marlow, S., Jones, S.P., Thaller, W.: Extending the Haskell foreign function interface with concurrency. In: Proceedings of the ACM SIGPLAN Workshop on Haskell, pp. 22–32. ACM (2004)
13. Marlow, S., Peyton Jones, S., Singh, S.: Runtime support for multicore Haskell. In: International Conference on Functional Programming, ICFP 2009, pp. 65–78. ACM, New York (2009)
14. Michael, M.M., Scott, M.L.: Simple, fast, and practical non-blocking and blocking concurrent queue algorithms. In: Proceedings of the Fifteenth Annual ACM Symposium on Principles of Distributed Computing, PODC 1996, pp. 267–275. ACM, New York (1996)
15. Reinders, J.: Intel Threading Building Blocks: Outfitting C++ for Multi-core Processor Parallelism. O’Reilly Media, Sebastopol (2007)
16. Reppy, J.H.: Concurrent ML: design, application and semantics. In: Lauer, P.E. (ed.) Functional Programming, Concurrency, Simulation and Automated Reasoning. LNCS, vol. 693, pp. 165–198. Springer, Heidelberg (1993)
17. Rompf, T., Maier, I., Odersky, M.: Implementing first-class polymorphic delimited continuations by a type-directed selective CPS-transform. SIGPLAN Not. 44, 317–328 (2009)
18. Sivaramakrishnan, K., Ziarek, L., Prasad, R., Jagannathan, S.: Lightweight asynchrony using parasitic threads. In: Workshop on Declarative Aspects of Multicore Programming, DAMP 2010, pp. 63–72. ACM, New York (2010)
19. von Behren, R., Condit, J., Zhou, F., Necula, G.C., Brewer, E.: Capriccio: scalable threads for internet services. SIGOPS Oper. Syst. Rev. 37(5), 268–281 (2003)
20. Wheeler, K.B., Murphy, R.C., Thain, D.: Qthreads: an API for programming with millions of lightweight threads. In: IPDPS, pp. 1–8. IEEE (2008)

Acknowledgements

This material is based in part upon work supported by the Department of Energy under Award Number DE-SC0008809, and by the National Science Foundation under Grant No. 1337242.

Author information

Authors and Affiliations

Indiana University Bloomington, Bloomington, USA
Christopher S. Zakian, Timothy A. K. Zakian, Abhishek Kulkarni, Buddhika Chamith & Ryan R. Newton

Corresponding author

Correspondence to Christopher S. Zakian.

Editor information

Editors and Affiliations

North Carolina State University, Raleigh, North Carolina, USA
Xipeng Shen
North Carolina State University, Raleigh, North Carolina, USA
Frank Mueller
North Carolina State University, Raleigh, North Carolina, USA
James Tuck

About this paper

Cite this paper

Zakian, C.S., Zakian, T.A.K., Kulkarni, A., Chamith, B., Newton, R.R. (2016). Concurrent Cilk: Lazy Promotion from Tasks to Threads in C/C++. In: Shen, X., Mueller, F., Tuck, J. (eds) Languages and Compilers for Parallel Computing. LCPC 2015. Lecture Notes in Computer Science, vol 9519. Springer, Cham. https://doi.org/10.1007/978-3-319-29778-1_5

DOI: https://doi.org/10.1007/978-3-319-29778-1_5
Published: 20 February 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-29777-4
Online ISBN: 978-3-319-29778-1
eBook Packages: Computer Science, Computer Science (R0)
