Recoverable mutual exclusion

Distributed Computing

Abstract

Mutex locks have traditionally been the most common mechanism for protecting shared data structures in concurrent programs. However, the robustness of such locks against process failures has not been studied thoroughly. The vast majority of mutex algorithms are designed around the assumption that processes are reliable, meaning that a process may not fail while executing the lock acquisition and release code, or while inside the critical section. If such a failure does occur, then the liveness properties of a conventional mutex lock may cease to hold until the application or operating system intervenes by cleaning up the internal structure of the lock. For example, a process that is attempting to acquire an otherwise starvation-free mutex may be blocked forever waiting for a failed process to release the critical section. Adding to the difficulty, if the failed process recovers and attempts to acquire the same mutex again without appropriate cleanup, then the mutex may become corrupted to the point where it loses safety, notably the mutual exclusion property. We address this challenge by formalizing the problem of recoverable mutual exclusion, and proposing several solutions that vary both in their assumptions regarding hardware support for synchronization, and in their efficiency. Compared to known solutions, our algorithms are more robust as they do not restrict where or when a process may crash, and provide stricter guarantees in terms of efficiency, which we define in terms of remote memory references.
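
The blocking scenario described in the abstract can be illustrated with a minimal sketch. The code below is not one of the paper's algorithms: `TasLock` is a hypothetical, conventional test-and-set lock with no recovery support, the crash is simulated by simply never calling `release()`, and a bounded spin count (`max_spins`) stands in for "waiting forever" so that the example terminates.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical conventional test-and-set lock with no recovery support.
class TasLock {
    std::atomic<bool> held{false};
public:
    // Spin for at most max_spins attempts; returns true if the lock was acquired.
    bool try_acquire(int max_spins) {
        for (int i = 0; i < max_spins; ++i) {
            // exchange returns the previous value; false means the lock was free.
            if (!held.exchange(true, std::memory_order_acquire)) return true;
        }
        return false;
    }
    void release() { held.store(false, std::memory_order_release); }
};

// Simulates process A crashing inside the critical section, then process B
// trying to acquire the same lock. Returns true if B is left blocked.
bool waiter_blocked_after_crash(int max_spins) {
    TasLock lock;
    bool a_entered = lock.try_acquire(1);          // process A enters the critical section
    // Simulated crash of A: release() is never called.
    bool b_entered = lock.try_acquire(max_spins);  // process B spins without success
    return a_entered && !b_entered;
}
```

Calling `waiter_blocked_after_crash(1000)` returns `true` under these assumptions: no number of retries helps the waiter until something resets the lock's internal state, which is the kind of cleanup a recoverable mutex is meant to make unnecessary.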

Notes

  1. The term bounded in reference to a piece of code means that there exists a function f of the number of processes N such that the code performs at most f(N) shared memory operations in all executions of the algorithm instantiated for N processes.

  2. As explained later on in the model near the discussion of First-Come-First-Served fairness, we assume that the doorway is well-defined and bounded only in a subset of execution histories that are relevant to our weaker notion of FCFS.

  3. In a practical implementation, the code of and can be packaged in a single procedure for simplicity.

  4. The term cleanup-concurrent defined in the conference version of this paper [20] is analogous to 1-failure-concurrent in this model.

  5. The Bounded Recovery property defined in the conference version of this paper [20] is analogous to 1-BR in this model.

  6. Despite the prevalence of cache-coherent architectures, the DSM model remains important in practice because of its inherent scalability. Intel’s Single-chip Cloud Computer, for example, sacrifices cache-coherence “to simplify the design, reduce power consumption and to encourage the exploration of datacenter distributed memory software models” [26].

  7. The RMR complexity of is unbounded if F does not exist for a given history H.

  8. The “\(\wedge \)” operator at line 94 should be interpreted like `&&` in C++, meaning that the right operand is evaluated only if the left operand is true.
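
The short-circuit behavior referred to in note 8 can be checked with a small C++ sketch. The names `EvalCounter`, `right_operand_evaluations`, and `is_nonzero` are hypothetical and introduced only for illustration; the paper's pseudocode at line 94 is not reproduced here.

```cpp
#include <cassert>

// Hypothetical counter, introduced only to observe operand evaluation.
struct EvalCounter {
    int calls = 0;
    bool touch() { ++calls; return true; }
};

// Returns how many times the right operand of && was evaluated
// for the given value of the left operand.
int right_operand_evaluations(bool left) {
    EvalCounter counter;
    bool result = left && counter.touch();  // touch() runs only if left is true
    (void)result;
    return counter.calls;
}

// Typical use: the left operand guards a right operand that would be
// unsafe to evaluate on its own (here, dereferencing a null pointer).
bool is_nonzero(const int* p) {
    return p != nullptr && *p != 0;
}
```

With a false left operand the right operand is skipped entirely, so `is_nonzero(nullptr)` returns `false` without dereferencing the pointer.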

References

  1. Afek, Y., Greenberg, D.S., Merritt, M., Taubenfeld, G.: Computing with faulty shared objects. J. ACM 42(6), 1231–1274 (1995)

  2. Anderson, J., Kim, Y.-J.: A new fast-path mechanism for mutual exclusion. Distrib. Comput. 14(1), 17–29 (2001)

  3. Anderson, J., Kim, Y.-J.: An improved lower bound for the time complexity of mutual exclusion. Distrib. Comput. 15(4), 221–253 (2002)

  4. Anderson, J., Kim, Y.-J., Herman, T.: Shared-memory mutual exclusion: major research trends since 1986. Distrib. Comput. 16(2–3), 75–110 (2003)

  5. Anderson, T.: The performance of spin lock alternatives for shared-memory multiprocessors. IEEE Trans. Parallel Distrib. Syst. 1(1), 6–16 (1990)

  6. Attiya, H., Hendler, D., Woelfel, P.: Tight RMR lower bounds for mutual exclusion and other problems. In: Proceedings of the 40th ACM symposium on theory of computing (STOC), pp. 217–226 (2008)

  7. Bender, M.A., Gilbert, S.: Mutual exclusion with \(O(\log ^{2}\log n)\) amortized work. In: Proceedings of the 52nd symposium on foundations of computer science (FOCS), pp. 728–737 (2011)

  8. Bohannon, P., Lieuwen, D.F., Silberschatz, A.: Recovering scalable spin locks. In: Proceedings of the 8th IEEE symposium on parallel and distributed processing (SPDP), pp. 314–322 (1996)

  9. Bohannon, P., Lieuwen, D.F., Silberschatz, A., Sudarshan, S., Gava, J.: Recoverable user-level mutual exclusion. In: Proceedings of the 7th IEEE symposium on parallel and distributed processing (SPDP), pp. 293–301 (1995)

  10. Burns, J.E., Lynch, N.A.: Bounds on shared memory for mutual exclusion. Inf. Comput. 107(2), 171–184 (1993)

  11. Cypher, R.: The communication requirements of mutual exclusion. In: Proceedings of the 7th ACM symposium on parallel algorithms and architectures (SPAA), pp. 147–156 (1995)

  12. Dijkstra, E.W.: Solution of a problem in concurrent programming control. Commun. ACM 8(9), 569 (1965)

  13. Dijkstra, E.W.: Self-stabilizing systems in spite of distributed control. Commun. ACM 17(11), 643–644 (1974)

  14. Fan, R., Lynch, N.: An \(\Omega (n \log n)\) lower bound on the cost of mutual exclusion. In: Proceedings of the 25th ACM symposium on principles of distributed computing (PODC), pp. 275–284 (2006)

  15. Giakkoupis, G., Woelfel, P.: Randomized mutual exclusion with constant amortized RMR complexity on the DSM. In: Proceedings of the 55th symposium on foundations of computer science (FOCS), pp. 504–513 (2014)

  16. Gibbons, P.B.: How emerging memory technologies will have you rethinking algorithm design. In: Proceedings of the 35th ACM symposium on principles of distributed computing (PODC), p. 303 (2016)

  17. Golab, W., Hadzilacos, V., Hendler, D., Woelfel, P.: RMR-efficient implementations of comparison primitives using read and write operations. Distrib. Comput. 25(2), 109–162 (2012)

  18. Golab, W., Hendler, D.: Recoverable mutual exclusion in sub-logarithmic time. In: Proceedings of the 36th annual ACM symposium on principles of distributed computing (PODC), pp. 211–220 (2017)

  19. Golab, W., Hendler, D.: Recoverable mutual exclusion under system-wide failures. In: Proceedings of the 37th annual ACM symposium on principles of distributed computing (PODC), pp. 17–26 (2018)

  20. Golab, W., Ramaraju, A.: Recoverable mutual exclusion. In: Proceedings of the 35th ACM symposium on principles of distributed computing (PODC), pp. 65–74 (2016)

  21. Graunke, G., Thakkar, S.: Synchronization algorithms for shared-memory multiprocessors. IEEE Comput. 23(6), 60–69 (1990)

  22. Gray, J., Reuter, A.: Transaction processing: concepts and techniques. Morgan Kaufmann, Burlington (1993)

  23. Hendler, D., Woelfel, P.: Randomized mutual exclusion with sub-logarithmic RMR-complexity. Distrib. Comput. 24(1), 3–19 (2011)

  24. Herlihy, M.: Wait-free synchronization. ACM Trans. Program. Lang. Syst. 13(1), 124–149 (1991)

  25. Hoepman, J.-H., Papatriantafilou, M., Tsigas, P.: Self-stabilization of wait-free shared memory objects. In: Proceedings of the 9th international workshop on distributed algorithms (WDAG), pp. 273–287 (1995)

  26. Intel Corporation. Single-chip cloud computer. http://www.intel.com/content/dam/www/public/us/en/documents/technology-briefs/intel-labs-single-chip-cloud-overview-paper.pdf. Accessed 31 Oct 2019

  27. Jayanti, P.: F-arrays: implementation and applications. In: Proceedings of the 21st annual ACM symposium on principles of distributed computing (PODC), pp. 270–279 (2002)

  28. Jayanti, P., Chandra, T., Toueg, S.: Fault-tolerant wait-free shared objects. J. ACM 45(3), 451–500 (1998)

  29. Jayanti, P., Joshi, A.: Recoverable FCFS mutual exclusion with wait-free recovery. In: Proceedings of the 31st international symposium on distributed computing (DISC), pp. 30:1–30:15 (2017)

  30. Jayanti, P., Jayanti, S., Joshi, A.: A recoverable Mutex algorithm with sub-logarithmic RMR on both CC and DSM. In: Proceedings of the 38th annual ACM symposium on principles of distributed computing (PODC), pp. 177–186 (2019)

  31. Johnen, C., Higham, L.: Fault-tolerant implementations of regular registers by safe registers with applications to networks. In: Proceedings of the 10th international conference on distributed computing and networking (ICDCN), pp. 337–348 (2009)

  32. Kim, Y.-J., Anderson, J.H.: A space- and time-efficient local-spin spin lock. Inf. Process. Lett. 84(1), 47–55 (2002)

  33. Kessels, J.: Arbitration without common modifiable variables. Acta Informatica 17, 135–141 (1982)

  34. Lamport, L.: A new solution of Dijkstra’s concurrent programming problem. Commun. ACM 17(8), 453–455 (1974)

  35. Lamport, L.: The mutual exclusion problem: part I—a theory of interprocess communication. J. ACM 33(2), 313–326 (1986)

  36. Lamport, L.: The mutual exclusion problem: part II—statement and solutions. J. ACM 33(2), 327–348 (1986)

  37. Lamport, L.: A fast mutual exclusion algorithm. ACM Trans. Comput. Syst. 5(1), 1–11 (1987)

  38. Magnusson, P., Landin, A., Hagersten, E.: Queue locks on cache coherent multiprocessors. In: Proceedings of the 8th international parallel processing symposium (IPPS), pp. 165–171 (1994)

  39. Mellor-Crummey, J., Scott, M.: Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Trans. Comput. Syst. 9(1), 21–65 (1991)

  40. Michael, M., Kim, Y.: Fault tolerant mutual exclusion locks for shared memory systems. US Patent (2009)

  41. Mittal, S., Vetter, J.S.: A survey of software techniques for using non-volatile memories for storage and main memory systems. IEEE Trans. Parallel Distrib. Syst. 27(5), 1537–1550 (2016)

  42. Mogul, J.C., Argollo, E., Shah, M.A., Faraboschi, P.: Operating system support for NVM + DRAM hybrid main memory. In: Proceedings of the 12th workshop on hot topics in operating systems (HotOS) (2009)

  43. Moscibroda, T., Oshman, R.: Resilience of mutual exclusion algorithms to transient memory faults. In: Proceedings of the 30th ACM symposium on principles of distributed computing (PODC), pp. 69–78 (2011)

  44. Narayanan, D., Hodson, O.: Whole-system persistence. In: Proceedings of the 17th international conference on architectural support for programming languages and operating systems (ASPLOS), pp. 401–410 (2012)

  45. Ramaraju, A.: RGLock: Recoverable mutual exclusion for non-volatile main memory systems. Master’s thesis, University of Waterloo (2015). https://uwspace.uwaterloo.ca/handle/10012/9473. Accessed 31 Oct 2019

  46. Raynal, M.: Algorithms for Mutual Exclusion. MIT Press, Cambridge (1986)

  47. Scott, M., Scherer, W.: Scalable queue-based spin locks with timeout. In: Proceedings of the 8th ACM SIGPLAN symposium on principles and practices of parallel programming (PPoPP), pp. 44–52 (2001)

  48. Taubenfeld, G.: Synchronization Algorithms and Concurrent Programming. Prentice Hall, Upper Saddle River (2006)

  49. Yang, J.-H., Anderson, J.: A fast, scalable mutual exclusion algorithm. Distrib. Comput. 9(1), 51–60 (1995)

Acknowledgements

Sincere thanks to Peter Buhr, Patrick Lam, and the anonymous referees of PODC’16 and Distributed Computing for detailed feedback and helpful suggestions on earlier drafts of this work. We are grateful also to Vassos Hadzilacos, Danny Hendler, Prasad Jayanti, Gadi Taubenfeld, and Sam Toueg for stimulating technical discussions.

Author information

Correspondence to Wojciech Golab.

Additional information

This research is supported in part by the Natural Sciences and Engineering Research Council (NSERC) of Canada, Discovery Grants Program; the Ontario Early Researcher Awards Program; and the Google Faculty Research Awards Program.

Cite this article

Golab, W., Ramaraju, A. Recoverable mutual exclusion. Distrib. Comput. 32, 535–564 (2019). https://doi.org/10.1007/s00446-019-00364-0
