Enhancing Replica Management Services to Cope with Group Failures

Ezhilchelvan, Paul D.; Shrivastava, Santosh K.

doi:10.1007/3-540-46475-1_4

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1752)

Abstract

In a distributed system, replication of components, such as objects, is a well-known way of achieving availability. For increased availability, crashed and disconnected components must be replaced by new components on available spare nodes. This replacement results in the membership of the replicated group ‘walking’ over a number of machines during system operation. In this context, we address the problem of reconfiguring a group after the group as an entity has failed. Such a failure is termed a group failure; examples are the crash of every component in the group and the partitioning of the group into minority islands. The solution assumes crash-proof storage, eventual recovery of crashed nodes, and eventual healing of partitions. It guarantees that (i) the number of groups reconfigured after a group failure is never more than one, and (ii) the reconfigured group contains a majority of the components which were members of the group just before the group failure occurred, so that the loss of state information due to a group failure is minimal. Though the protocol is subject to blocking, it remains efficient in terms of communication rounds and use of stable store, during both normal operations and reconfiguration after a group failure.
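
Guarantee (ii) can be illustrated with a minimal sketch. It is not the authors' protocol: it assumes the last membership before the failure is already known (in the chapter this has to be determined from stable storage), and the names `is_majority`, `last_view`, `island_1` and `island_2` are hypothetical. It only shows the majority test and why it also implies guarantee (i): two disjoint sets of nodes cannot both contain a strict majority of the same last membership.

```python
from typing import FrozenSet, Iterable


def is_majority(candidates: Iterable[str], last_view: FrozenSet[str]) -> bool:
    """Return True if candidates contain a strict majority of last_view.

    last_view is assumed to be the group membership just before the
    group failure; nodes outside it (e.g. spares) do not count.
    """
    survivors = set(candidates) & last_view
    return 2 * len(survivors) > len(last_view)


# Hypothetical scenario: a five-member group splits into two islands.
last_view = frozenset({"a", "b", "c", "d", "e"})
island_1 = {"a", "b", "c", "x"}  # "x" is a spare node, not a former member
island_2 = {"d", "e"}

can_reconfigure_1 = is_majority(island_1, last_view)
can_reconfigure_2 = is_majority(island_2, last_view)
```

Here only `island_1` passes the test, so at most one island may reconfigure the group. An island that fails the test has to wait for recoveries or for partitions to heal, which is the sense in which such a scheme is subject to blocking.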

Author information

Authors and Affiliations

Department of Computing Science, University of Newcastle upon Tyne, Newcastle upon Tyne, NE1 7RU, UK
Paul D. Ezhilchelvan & Santosh K. Shrivastava

Editor information

Editors and Affiliations

Grenoble Laboratoire SIRAC, INRIA Rhône-Alpes, Université Joseph Fourier, 655 avenue de l’Europe, 38330, Montbonnot Saint-Martin, France
Sacha Krakowiak
Department of Computing Science, University of Newcastle upon Tyne, Newcastle upon Tyne, NE1 7RU, UK
Santosh Shrivastava

About this chapter

Cite this chapter

Ezhilchelvan, P.D., Shrivastava, S.K. (2000). Enhancing Replica Management Services to Cope with Group Failures. In: Krakowiak, S., Shrivastava, S. (eds) Advances in Distributed Systems. Lecture Notes in Computer Science, vol 1752. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46475-1_4

DOI: https://doi.org/10.1007/3-540-46475-1_4
Published: 28 March 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67196-1
Online ISBN: 978-3-540-46475-4
eBook Packages: Springer Book Archive
