ABSTRACT
Mutex locks have traditionally been the most common mechanism for protecting shared data structures in parallel programs. However, the robustness of such locks against process failures has not been studied thoroughly. Most (user-level) mutex algorithms are designed around the assumption that processes are reliable, meaning that a process may not fail while executing the lock acquisition and release code, or while inside the critical section.
If such a failure does occur, then the liveness properties of a conventional mutex lock may cease to hold until the application or operating system intervenes by cleaning up the internal structure of the lock. For example, a process that is attempting to acquire an otherwise starvation-free mutex may be blocked forever waiting for a failed process to release the critical section. Adding to the difficulty, if the failed process recovers and attempts to acquire the same mutex again without appropriate cleanup, then the mutex may become corrupted to the point where it loses safety, notably the mutual exclusion property. We address this challenge by formalizing the problem of recoverable mutual exclusion, and proposing several solutions that vary both in their assumptions regarding hardware support for synchronization, and in their time complexity. Compared to known solutions, our algorithms are more robust as they do not restrict where or when a process may crash, and provide stricter guarantees in terms of time complexity, which we define in terms of remote memory references.
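The failure scenario described above can be illustrated with a minimal, single-threaded sketch. It is not one of the algorithms proposed in this work: the `OwnerLock` class, its `recover` method, and the process IDs are hypothetical names introduced only for illustration, and the sketch assumes the lock word lives in memory that keeps its value across a process crash. It shows a crashed lock holder blocking another process, and a recovery step that lets the restarted process detect it still owns the lock.

```python
from typing import Optional


class OwnerLock:
    """Toy lock whose whole state is one shared word naming the owner.

    Assumption (not from the source): the word survives a process crash,
    for example because it is stored in non-volatile memory.
    """

    def __init__(self) -> None:
        self.owner: Optional[int] = None  # None means the lock is free

    def try_acquire(self, pid: int) -> bool:
        # Models a single atomic compare-and-swap(owner, None, pid).
        if self.owner is None:
            self.owner = pid
            return True
        return False

    def release(self, pid: int) -> None:
        # Only the current owner may free the lock.
        if self.owner == pid:
            self.owner = None

    def recover(self, pid: int) -> bool:
        """Run by a restarted process before it uses the lock again.

        Returns True if pid crashed while holding the lock, so the caller
        knows it is still the owner and must clean up and release.
        """
        return self.owner == pid


lock = OwnerLock()
entered = lock.try_acquire(1)          # process 1 enters the critical section
# Process 1 crashes here and never calls release().
blocked = not lock.try_acquire(2)      # process 2 cannot get in, however often it retries
resumed = lock.recover(1)              # restarted process 1 finds it still owns the lock
lock.release(1)                        # it finishes its cleanup and releases
acquired_after = lock.try_acquire(2)   # process 2 can now make progress
```

Without the `recover` step, a restarted process 1 that simply called `try_acquire` again would fail against its own stale ownership, and the lock would stay held forever. A real recoverable mutex must also tolerate crashes in the middle of the acquisition and release code, which this one-word sketch sidesteps.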
Index Terms
- Recoverable Mutual Exclusion: [Extended Abstract]