A Wait-Free Multi-Word Compare-and-Swap Operation

International Journal of Parallel Programming

Abstract

The number of cores in future multi-core systems is expected to increase 100-fold over the next decade. The fine-grained synchronization methods found in wait-free algorithm designs make them desirable for these future systems. Unfortunately, such designs are often inhibited by the limitations of portable atomic hardware primitives. Typically, these primitives can operate on only a single address at a time, while concurrent algorithms often need to operate on multiple addresses. To support such algorithms, we present a practical wait-free multi-word compare-and-swap (MCAS) operation. The wait-free property ensures that each thread completes its operation in a finite number of steps, even if it is continuously interrupted. Our approach uses a progress assurance scheme that allows a blocked thread to announce that it is unable to make progress. This differs from traditional lock-free helping techniques, in which a thread helps complete only an operation that conflicts with its own. Our design is practical in that it is built from only portable atomic operations, it is efficient in its use of memory (i.e., it requires only a single bit to be reserved from each word, does not require explicit memory barriers, and requires only four words per address in the operation), and it has a wait-free progress guarantee. When tested in a high-contention scenario with 64 threads executing updates on a single multi-word object, our wait-free design performs on average 77.1% more operations than other practical approaches. Over all tested scenarios, our design performs on average 8.3% more operations.
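
To make the target semantics concrete, the following C++ sketch shows what a multi-word compare-and-swap is expected to do: either every address holds its expected value and all of them are updated together, or nothing changes. It is a reference specification only. The names `McasEntry` and `mcas_reference` are hypothetical, and the global lock used here is blocking, so this is deliberately not the wait-free algorithm the paper presents.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

// One entry of a multi-word operation: an address, the value expected there,
// and the value to install. (Hypothetical layout, for illustration only.)
struct McasEntry {
    std::atomic<std::uintptr_t>* address;
    std::uintptr_t expected;
    std::uintptr_t desired;
};

// Reference semantics only: a single global lock makes the compare phase and
// the swap phase appear as one atomic step. This is blocking, NOT wait-free.
inline bool mcas_reference(const std::vector<McasEntry>& entries) {
    static std::mutex guard;
    std::lock_guard<std::mutex> lock(guard);

    // Compare phase: a single mismatch fails the whole operation.
    for (const McasEntry& e : entries) {
        assert(e.address != nullptr);
        if (e.address->load() != e.expected) {
            return false;
        }
    }
    // Swap phase: every word is updated, since all comparisons succeeded.
    for (const McasEntry& e : entries) {
        e.address->store(e.desired);
    }
    return true;
}
```

A single-word hardware CAS can express only one such entry per call. A practical MCAS has to produce the same all-or-nothing outcome across several words without relying on a lock.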

Notes

  1. Strong scaling is the scenario in which the total problem size stays fixed while the number of processing elements is increased. The challenge is to synchronize the work of the processing elements correctly and efficiently without “wasting” too many cycles on parallelism overhead. In weak scaling, the problem size assigned to each processing element remains constant while the total problem size may increase. In this case, the main challenge is how to add new processing elements to the existing system.

  2. An operation with an infinite consensus number in the wait-free/lock-free hierarchy.

  3. An object that allows an interrupting thread to help an interrupted thread to complete successfully [10].

  4. Load-link, Validate, Store Conditional; used to ensure the value at an address has not been unknowingly modified.

  5. See Sect. 4.3 for more details.

  6. This requires a sequentially consistent memory model.

  7. An object is considered thread-local if only one thread holds a reference to that object.

  8. See Sect. 4.3 for details.

  9. Incrementing by 16 ensures that the two least significant bits are always 0.

  10. An MCAS read function is designed to return the logical value of a descriptor object that may be at an address.
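
Notes 9 and 10 can be illustrated with a small C++ sketch of bit marking. It assumes that the reserved bit is the least significant bit of each word, that descriptors are aligned so their addresses leave that bit clear, and that plain values also keep it clear (a counter that starts at 0 and is incremented by 16 does). The names `Descriptor`, `mark`, `unmark`, `is_descriptor`, and `mcas_read` are hypothetical, and the read is simplified: a real implementation must also cope with descriptors that are still being completed or reclaimed by other threads.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical descriptor: records the value a word logically holds while a
// multi-word operation that references it is still installed at the address.
struct Descriptor {
    std::uintptr_t logical_value;
};

// Assumption: the lowest bit of every word is the single reserved bit.
constexpr std::uintptr_t kDescriptorBit = 0x1;

// Tag a descriptor pointer so that readers can tell it apart from a value.
inline std::uintptr_t mark(const Descriptor* d) {
    auto raw = reinterpret_cast<std::uintptr_t>(d);
    assert((raw & kDescriptorBit) == 0);  // relies on pointer alignment
    return raw | kDescriptorBit;
}

inline bool is_descriptor(std::uintptr_t word) {
    return (word & kDescriptorBit) != 0;
}

inline const Descriptor* unmark(std::uintptr_t word) {
    return reinterpret_cast<const Descriptor*>(word & ~kDescriptorBit);
}

// Simplified read: returns the logical value rather than the raw word, so
// callers never observe a marked descriptor pointer.
inline std::uintptr_t mcas_read(const std::atomic<std::uintptr_t>& word) {
    std::uintptr_t raw = word.load();
    return is_descriptor(raw) ? unmark(raw)->logical_value : raw;
}
```

Because the tag lives in the word itself, one atomic load is enough to decide whether the address currently holds a plain value or a pending operation.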

References

  1. Shalf, J., Dosanjh, S., Morrison, J.: In: Proceedings of the 9th International Conference on High Performance Computing for Computational Science, pp. 1–25. Springer-Verlag, Berlin, Heidelberg, VECPAR’10 (2011). http://dl.acm.org/citation.cfm?id=1964238.1964240

  2. Herlihy, M.: A methodology for implementing highly concurrent data objects. ACM Trans. Prog. Lang. Syst. 15(5), 745 (1993). doi:10.1145/161468.161469

  3. Feldman, S., Dechev, D., LaBorde, P.: In: International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation, pp. 155–163 (2013)

  4. Timnat, S., Braginsky, A., Kogan, A., Petrank, E.: Wait-free linked-lists. SIGPLAN Not. 47(8), 309 (2012). doi:10.1145/2370036.2145869

  5. Meawad, F., Schoeberl, M., Iyer, K., Vitek, J.: In: Proceedings of the 9th International Workshop on Java Technologies for Real-Time and Embedded Systems, pp. 1–10. ACM, New York, NY, USA, JTRES ’11 (2011). doi:10.1145/2043910.2043912

  6. Harris, T.L., Fraser, K., Pratt, I.A.: In: Proceedings of the 16th International Conference on Distributed Computing, pp. 265–279. Springer-Verlag, London, UK, DISC ’02 (2002). http://dl.acm.org/citation.cfm?id=645959.676137

  7. Purcell, C., Harris, T.: In: Proceedings of the 19th International Conference on Distributed Computing, pp. 108–121. Springer-Verlag, Berlin, Heidelberg, DISC’05 (2005). doi:10.1007/11561927_10

  8. Liu, Y., Spear, M.: A lock-free, array-based priority queue. SIGPLAN Not. 47(8), 323 (2012). doi:10.1145/2370036.2145876

  9. Saha, B., Adl-Tabatabai, A.R., Hudson, R.L., Minh, C.C., Hertzberg, B.: In: Proceedings of the Eleventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 187–197. ACM, New York, NY, USA, PPoPP ’06 (2006). doi:10.1145/1122971.1123001

  10. Barnes, G.: In: Proceedings of the Fifth Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 261–270. ACM, New York, NY, USA, SPAA ’93 (1993). doi:10.1145/165231.165265

  11. Fraser, K., Harris, T.: ACM Trans. Comput. Syst. 25(2) (2007). doi:10.1145/1233307.1233309, www.cl.cam.ac.uk/tlh20/casn-clean.tar.gz

  12. Israeli, A., Rappoport, L.: In: Proceedings of the Thirteenth Annual ACM Symposium on Principles of Distributed Computing, pp. 151–160. ACM, New York, NY, PODC ’94 (1994). doi:10.1145/197917.198079

  13. Anderson, J.H., Ramamurthy, S., Jeffay, K.: Real-time computing with lock-free shared objects. ACM Trans. Comput. Syst. 15(2), 134 (1997). doi:10.1145/253145.253159

  14. Moir, M.: In: Proceedings of the 11th International Workshop on Distributed Algorithms, pp. 305–319. Springer-Verlag, London, UK, WDAG ’97 (1997). http://dl.acm.org/citation.cfm?id=645954.675655

  15. Attiya, H., Hillel, E.: Built-in coloring for highly-concurrent doubly-linked lists. Theor. Comput. Sci. 412(12–14), 1243 (2011). doi:10.1016/j.tcs.2010.12.049

  16. Sundell, H.: Int. J. Parallel Prog. 39, 694 (2011). doi:10.1007/s10766-011-0167-4, http://www.adm.hb.se/hsu/CASNSource.zip

  17. Kogan, A., Petrank, E.: A methodology for creating fast wait-free data structures. SIGPLAN Not. 47(8), 141 (2012). doi:10.1145/2370036.2145835

  18. Detlefs, D.L., Martin, P.A., Moir, M., Steele, G.L. Jr.: In: Proceedings of the Twentieth Annual ACM Symposium on Principles of Distributed Computing, pp. 190–199. ACM, New York, NY, USA, PODC ’01 (2001). doi:10.1145/383962.384016

  19. Herlihy, M.: The Art of Multiprocessor Programming. Elsevier, Amsterdam (2008)

  20. Michael, M.M.: Performance of memory reclamation for lockless synchronization. IEEE Trans. Parallel Distrib. Syst. 15(6), 491 (2004). doi:10.1109/TPDS.2004.8

  21. Amdahl, G.M.: In: Proceedings of the April 18–20, 1967, Spring Joint Computer Conference, pp. 483–485. ACM, New York, NY, AFIPS ’67 (Spring) (1967). doi:10.1145/1465482.1465560

Author information

Corresponding author

Correspondence to Steven Feldman.

About this article

Cite this article

Feldman, S., LaBorde, P. & Dechev, D. A Wait-Free Multi-Word Compare-and-Swap Operation. Int J Parallel Prog 43, 572–596 (2015). https://doi.org/10.1007/s10766-014-0308-7

  • DOI: https://doi.org/10.1007/s10766-014-0308-7
