
Automatic reconfiguration in the presence of failures

Software Engineering Journal


We describe a new kind of distributed system service, the availability management service, responsible for ensuring that the critical services of a distributed system remain continuously available to users despite arbitrary numbers of concurrent node removals and node restarts caused by failures, maintenance, and growth. We stress the main ideas behind this new service, and outline a simple design that depends on the existence of synchronous membership and atomic broadcast group communication services. Extensions of this initial design to deal with asynchronous group communication services are also briefly discussed.

http://iet.metastore.ingenta.com/content/journals/10.1049/sej.1993.0009