Research
Design of loosely coupled processes capable of time-bounded cooperative recovery: the PTC/SL scheme

https://doi.org/10.1016/0140-3664(93)90047-V

Abstract

The design of loosely coupled distributed computer systems (DCS) that must tolerate propagated errors caused by software and/or hardware is a technological challenge that has been inadequately dealt with. In this paper, we adopt the view that a truly loosely coupled DCS consists of loosely coupled interacting processes distributed among multiple physical sites, where each process is designed in the ‘partitioned design’ mode, i.e. designed with its own interface specification only, rather than with full knowledge of the interfaces between other processes (or sites). It follows that fault tolerance capabilities must be designed into the loosely coupled processes of such systems without violating the partitioned design policy. The programmer-transparent coordination (PTC) scheme is one such approach and has been evolving since 1978. The basic PTC scheme, called the PTC/OR (PTC with obedient receiver) scheme, facilitates various forms of cooperative backward recovery in systems of loosely coupled processes, but it has one drawback: the difficulty of bounding worst-case recovery time. After discussing several fundamentally different solution approaches and their limitations, we present a promising approach called the PTC/SL (PTC with session leaders) scheme, which superimposes additional rules for structuring process interactions onto those of the PTC/OR scheme. Under the PTC/SL scheme, various flexible forms of process interaction are still allowed, while ensuring bounded recovery time becomes a simple task.

