DOI: 10.1145/1993806.1993817
research-article

Resilience of mutual exclusion algorithms to transient memory faults

Published: 06 June 2011

ABSTRACT

We study the behavior of mutual exclusion algorithms in the presence of unreliable shared memory subject to transient memory faults. It is well-known that classical 2-process mutual exclusion algorithms, such as Dekker and Peterson's algorithms, are not fault-tolerant; in this paper we ask what degree of fault tolerance can be achieved using the same restricted resources as Dekker and Peterson's algorithms, namely, three binary read/write registers.
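To illustrate why a classical algorithm built on three binary registers is not fault-tolerant, here is a minimal sketch that exhaustively explores the interleavings of Peterson's 2-process algorithm. It is not taken from the paper. It assumes that a transient fault is a single bit flip of one register between atomic steps, and the names `step`, `flips`, and `mutex_can_fail` are hypothetical helpers introduced only for this illustration.

```python
from collections import deque

# State: (pcs, flag, turn), using three binary registers flag[0], flag[1], turn.
# Program counter per process: 0 = set flag, 1 = set turn, 2 = read other flag,
# 3 = read turn, 4 = in the critical section.

def step(state, i):
    """Return the state after process i takes one atomic read or write."""
    pcs, flag, turn = list(state[0]), list(state[1]), state[2]
    j = 1 - i
    if pcs[i] == 0:            # flag[i] := 1
        flag[i] = 1
        pcs[i] = 1
    elif pcs[i] == 1:          # turn := j (give priority to the other process)
        turn = j
        pcs[i] = 2
    elif pcs[i] == 2:          # read flag[j]; enter if the other is not competing
        pcs[i] = 4 if flag[j] == 0 else 3
    elif pcs[i] == 3:          # read turn; enter if it is our turn, else retry
        pcs[i] = 4 if turn == i else 2
    else:                      # leave the critical section: flag[i] := 0
        flag[i] = 0
        pcs[i] = 0
    return (tuple(pcs), tuple(flag), turn)

def flips(state):
    """Assumed fault model: yield every state reachable by flipping one register."""
    pcs, flag, turn = state
    yield (pcs, (1 - flag[0], flag[1]), turn)
    yield (pcs, (flag[0], 1 - flag[1]), turn)
    yield (pcs, flag, 1 - turn)

def mutex_can_fail(max_faults):
    """Breadth-first search for a reachable state with both processes in the CS."""
    start = (((0, 0), (0, 0), 0), 0)   # (state, faults used so far)
    seen = {start}
    queue = deque([start])
    while queue:
        state, used = queue.popleft()
        if state[0] == (4, 4):
            return True
        successors = [(step(state, i), used) for i in (0, 1)]
        if used < max_faults:
            successors += [(s, used + 1) for s in flips(state)]
        for nxt in successors:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

Under these assumptions, `mutex_can_fail(0)` finds no violation, while `mutex_can_fail(1)` finds one: for example, flipping `flag[0]` back to 0 while process 0 is in the critical section lets process 1 enter as well.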

We show that if one memory fault can occur, it is not possible to guarantee both mutual exclusion and deadlock-freedom using three binary registers; this holds in general when fewer than 2f+1 binary registers are used and f may be faulty. Hence we focus on algorithms that guarantee (a) mutual exclusion and starvation-freedom in fault-free executions, and (b) only mutual exclusion in faulty executions. We show that using only three binary registers it is possible to design a 2-process mutual exclusion algorithm which tolerates a single memory fault in this manner. Further, by replacing one read/write register with a test&set register, we can guarantee mutual exclusion in executions where one variable experiences unboundedly many faults.

In the more general setting where up to f registers may be faulty, we show that it is not possible to guarantee mutual exclusion using 2f + 1 binary read/write registers if each faulty register can exhibit unboundedly many faults. On the positive side, we show that an n-variable single-fault tolerant algorithm satisfying certain conditions can be transformed into an ((n-1)f + 1)-variable f-fault tolerant algorithm with the same progress guarantee as the original. In combination with our three-variable algorithm, this implies that there is a (2f+1)-variable mutual exclusion algorithm tolerating a single fault in up to f variables without violating mutual exclusion.
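The variable count in the last sentence follows by substituting the three-variable algorithm (n = 3) into the transformation bound stated above; the worked step below only restates that arithmetic and adds nothing beyond the abstract:

```latex
\[
  (n-1)f + 1 \,\Big|_{n=3} \;=\; (3-1)f + 1 \;=\; 2f + 1 .
\]
```

For instance, with f = 2 the transformed algorithm would use 2 · 2 + 1 = 5 variables.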


Published in

PODC '11: Proceedings of the 30th annual ACM SIGACT-SIGOPS symposium on Principles of distributed computing
June 2011
406 pages
ISBN: 9781450307192
DOI: 10.1145/1993806

Copyright © 2011 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall acceptance rate: 740 of 2,477 submissions, 30%
