Automatic reconfiguration in the presence of failures

Flaviu Cristian

Author(s): Flaviu Cristian
DOI: 10.1049/sej.1993.0009

Journal: Software Engineering Journal

Author(s): Flaviu Cristian ¹
Affiliations: 1: Computer Science and Engineering Department, University of California at San Diego, La Jolla, USA
Source: Volume 8, Issue 2, March 1993, pp. 53-60
DOI: 10.1049/sej.1993.0009, Print ISSN 0268-6961, Online ISSN 2053-910X

We describe a new kind of distributed system service, the availability management service, responsible for ensuring that the critical services of a distributed system remain continuously available to users despite arbitrary numbers of concurrent node removals and node restarts caused by failures, maintenance, and growth. We stress the main ideas behind this new service, and outline a simple design that depends on the existence of synchronous membership and atomic broadcast group communication services. Extensions of this initial design to deal with asynchronous group communication services are also briefly discussed.
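As a rough illustration of why agreed membership views help here, and not the design given in the paper, the sketch below assumes every surviving node is handed the same membership view by a synchronous membership service. Each node can then compute the same service placement locally, without a further round of negotiation. The function name `assign_services`, the replica-count table, and the round-robin placement rule are all hypothetical choices made for this example.

```python
from typing import Dict, FrozenSet, List


def assign_services(view: FrozenSet[str], services: Dict[str, int]) -> Dict[str, List[str]]:
    """Deterministically map each service to the nodes that should run it.

    view:     nodes in the current membership view (assumed identical at
              every member, as a synchronous membership service would provide).
    services: hypothetical table of service name -> desired replica count.
    """
    nodes = sorted(view)  # same order at every member, since the view is agreed
    assignment: Dict[str, List[str]] = {}
    for i, name in enumerate(sorted(services)):
        # Never ask for more replicas than there are nodes in the view.
        wanted = min(services[name], len(nodes))
        # Illustrative rule: spread services round-robin over the sorted nodes.
        assignment[name] = [nodes[(i + k) % len(nodes)] for k in range(wanted)]
    return assignment


# Hypothetical scenario: node "b" is removed from the view after a failure.
services = {"db": 2, "naming": 1}
before = assign_services(frozenset({"a", "b", "c"}), services)
after = assign_services(frozenset({"a", "c"}), services)
print(before)
print(after)
```

Because the function depends only on the agreed view and the service table, any two members that install the same view arrive at the same placement. A real availability manager would also have to start and stop servers and transfer state, which this sketch leaves out.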
