Abstract
The number of cores in future multi-core systems is expected to increase 100-fold over the next decade. The fine-grained synchronization methods found in wait-free algorithm designs make them desirable for these future systems. Unfortunately, such designs are often inhibited by the limitations of portable atomic hardware primitives. Typically, these primitives can operate on only a single address at a time, while concurrent algorithms often need to operate on multiple addresses. To support such algorithms, we present a practical wait-free multi-word compare-and-swap (MCAS). The wait-free property ensures that each thread completes its operation in a finite number of steps, even if it is continuously interrupted. Our approach uses a progress assurance scheme that allows a blocked thread to announce that it is unable to make progress. This differs from traditional lock-free helping techniques, where a thread will only help complete an operation that is in conflict with its own. Our design is practical in that it is built from only portable atomic operations, it is efficient in its utilization of memory (i.e., requiring only a single bit to be reserved from each word, not requiring use of explicit memory barriers, and requiring only four words per address in the operation), and it has a wait-free progress guarantee. When tested in a high-contention scenario with 64 threads executing updates on a single multi-word object, our wait-free design performs on average 77.1% more operations than other practical approaches. Over all tested scenarios, our design performs on average 8.3% more operations.
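The semantics that a multi-word compare-and-swap has to provide can be stated as a small sequential model. The sketch below is only a specification aid: the function name `mcas`, the list standing in for memory, and the global lock are all assumptions made for illustration. It is not the wait-free algorithm described in this article, which is built from portable atomic primitives rather than locks.

```python
import threading

# Hypothetical reference model: one global lock stands in for atomicity.
# A lock-based model is blocking, so it has none of the article's
# wait-free progress guarantees; it only pins down the expected result.
_lock = threading.Lock()


def mcas(memory, updates):
    """Apply `updates` to `memory` only if every expected value matches.

    memory  -- list acting as shared words (illustrative stand-in for addresses)
    updates -- iterable of (index, expected, new) triples
    Returns True if all words matched and were replaced, otherwise False.
    """
    updates = list(updates)
    with _lock:
        # Compare phase: every word must currently hold its expected value.
        if any(memory[i] != expected for i, expected, _ in updates):
            return False
        # Swap phase: all words change together, or none do.
        for i, _, new in updates:
            memory[i] = new
        return True
```

A call such as `mcas(mem, [(0, 1, 10), (2, 3, 30)])` either updates both words or leaves `mem` untouched, which is the all-or-nothing behavior a single-address primitive cannot offer by itself.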
Notes
Strong scaling is the scenario in which the total problem size stays fixed while the number of processing elements is increased. The challenge is how to synchronize the work of the processing elements in a correct and efficient manner without “wasting” too many cycles on parallelism overhead. In weak scaling, the problem size assigned to each processing element remains constant while the total problem size may increase. In this case, the main challenge is how to add new processing elements to the existing system.
An operation with an infinite consensus number in the wait-free/lock-free hierarchy.
An object that allows an interrupting thread to help an interrupted thread to complete successfully [10].
Load-link, Validate, Store Conditional; used to ensure the value at an address has not been unknowingly modified.
See Sect. 4.3 for more details.
This requires a sequentially consistent memory model.
An object is considered thread-local if only one thread holds a reference to that object.
See Sect. 4.3 for details.
Incrementing by 16 ensures that the two least significant bits are always 0.
An MCAS read function is designed to return the logical value of a descriptor object that may be at an address.
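The notes on the reserved bit and on the MCAS read function can be illustrated together. This is a minimal sketch under stated assumptions: the reserved bit is taken to be the least significant bit, and a descriptor's logical value is assumed to be its new value once the owning operation has succeeded and its old value otherwise. The names `Descriptor`, `mark`, `is_descriptor`, and `mcas_read` are hypothetical; the article's own rule is the one given in its algorithm sections.

```python
from dataclasses import dataclass

DESCRIPTOR_BIT = 0x1  # assumption: the single reserved bit is the LSB


@dataclass
class Descriptor:
    old_value: int
    new_value: int
    succeeded: bool  # assumed outcome flag of the owning operation


# Hypothetical lookup table from tagged words to descriptors. A native
# implementation would keep a pointer in the word itself instead.
_descriptors = {}


def mark(descriptor_id, descriptor):
    """Register a descriptor and return the tagged word that refers to it."""
    word = (descriptor_id << 1) | DESCRIPTOR_BIT
    _descriptors[word] = descriptor
    return word


def is_descriptor(word):
    """True when the reserved bit is set, i.e. the word is not a plain value."""
    return (word & DESCRIPTOR_BIT) != 0


def mcas_read(memory, index):
    """Return the logical value at `index`, looking through any descriptor."""
    word = memory[index]
    if not is_descriptor(word):
        return word  # plain value: reserved bit is clear
    d = _descriptors[word]
    return d.new_value if d.succeeded else d.old_value
```

In this model plain values must keep the reserved bit clear, so only even integers are stored directly; that restriction is the cost of reserving one bit per word.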
References
Shalf, J., Dosanjh, S., Morrison, J.: In: Proceedings of the 9th International Conference on High Performance Computing for Computational Science, pp. 1–25. Springer-Verlag, Berlin, Heidelberg, VECPAR’10 (2011). http://dl.acm.org/citation.cfm?id=1964238.1964240
Herlihy, M.: A methodology for implementing highly concurrent data objects. ACM Trans. Prog. Lang. Syst. 15(5), 745 (1993). doi:10.1145/161468.161469
Feldman, S., Dechev, D., LaBorde, P.: In: International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation, pp. 155–163 (2013)
Timnat, S., Braginsky, A., Kogan, A., Petrank, E.: Wait-free linked-lists. SIGPLAN Not. 47(8), 309 (2012). doi:10.1145/2370036.2145869
Meawad, F., Schoeberl, M., Iyer, K., Vitek, J.: In: Proceedings of the 9th International Workshop on Java Technologies for Real-Time and Embedded Systems, pp. 1–10. ACM, New York, NY, USA, JTRES ’11 (2011). doi:10.1145/2043910.2043912
Harris, T.L., Fraser, K., Pratt, I.A.: In: Proceedings of the 16th International Conference on Distributed Computing, pp. 265–279. Springer-Verlag, London, UK, DISC ’02 (2002). http://dl.acm.org/citation.cfm?id=645959.676137
Purcell, C., Harris, T.: In: Proceedings of the 19th International Conference on Distributed Computing, pp. 108–121. Springer-Verlag, Berlin, Heidelberg, DISC’05 (2005). doi:10.1007/11561927_10
Liu, Y., Spear, M.: A lock-free, array-based priority queue. SIGPLAN Not. 47(8), 323 (2012). doi:10.1145/2370036.2145876
Saha, B., Adl-Tabatabai, A.R., Hudson, R.L., Minh, C.C., Hertzberg, B.: In: Proceedings of the Eleventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 187–197. ACM, New York, NY, USA, PPoPP ’06 (2006). doi:10.1145/1122971.1123001
Barnes, G.: In: Proceedings of the Fifth Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 261–270. ACM, New York, NY, USA, SPAA ’93 (1993). doi:10.1145/165231.165265
Fraser, K., Harris, T.: ACM Trans. Comput. Syst. 25(2) (2007). doi:10.1145/1233307.1233309, www.cl.cam.ac.uk/tlh20/casn-clean.tar.gz
Israeli, A., Rappoport, L.: In: Proceedings of the Thirteenth Annual ACM Symposium on Principles of Distributed Computing, pp. 151–160. ACM, New York, NY, PODC ’94 (1994). doi:10.1145/197917.198079
Anderson, J.H., Ramamurthy, S., Jeffay, K.: Real-time computing with lock-free shared objects. ACM Trans. Comput. Syst. 15(2), 134 (1997). doi:10.1145/253145.253159
Moir, M.: In: Proceedings of the 11th International Workshop on Distributed Algorithms, pp. 305–319. Springer-Verlag, London, UK, WDAG ’97 (1997). http://dl.acm.org/citation.cfm?id=645954.675655
Attiya, H., Hillel, E.: Built-in coloring for highly-concurrent doubly-linked lists. Theor. Comput. Sci. 412(12–14), 1243 (2011). doi:10.1016/j.tcs.2010.12.049
Sundell, H.: Int. J. Parallel Prog. 39, 694 (2011). doi:10.1007/s10766-011-0167-4, http://www.adm.hb.se/hsu/CASNSource.zip
Kogan, A., Petrank, E.: A methodology for creating fast wait-free data structures. SIGPLAN Not. 47(8), 141 (2012). doi:10.1145/2370036.2145835
Detlefs, D.L., Martin, P.A., Moir, M., Steele, G.L. Jr.: In: Proceedings of the Twentieth Annual ACM Symposium on Principles of Distributed Computing, pp. 190–199. ACM, New York, NY, USA, PODC ’01 (2001). doi:10.1145/383962.384016
Herlihy, M.: The Art of Multiprocessor Programming. Elsevier, Amsterdam (2008)
Michael, M.M.: Performance of memory reclamation for lockless synchronization. IEEE Trans. Parallel Distrib. Syst. 15(6), 491 (2004). doi:10.1109/TPDS.2004.8
Amdahl, G.M.: In: Proceedings of the April 18–20, 1967, Spring Joint Computer Conference, pp. 483–485. ACM, New York, NY, AFIPS ’67 (Spring) (1967). doi:10.1145/1465482.1465560
Cite this article
Feldman, S., LaBorde, P. & Dechev, D. A Wait-Free Multi-Word Compare-and-Swap Operation. Int J Parallel Prog 43, 572–596 (2015). https://doi.org/10.1007/s10766-014-0308-7
DOI: https://doi.org/10.1007/s10766-014-0308-7