
On-line self-checking of replication consistency for autonomic computing

Cluster Computing

Abstract

In this paper we are concerned with the live verification of the consistency of a replicated system, an issue the research community has not addressed so far. We consider how to enable the system to detect, automatically and while in production, whether the invariants defining the correctness of object replication are violated. This capability could greatly improve the dependability of distributed applications and is necessary for constructing self-managing and self-healing replicated systems. We focus on systems that enforce strongly consistent replication: all replicas of each object must be kept “continuously” in sync. This replication strategy is appropriate for application domains where correctness guarantees in spite of failures matter more than performance and scalability. We present the design and implementation of a replicated web service capable of self-checking whether all replicas are indeed kept in sync. The check occurs on-line, transparently to clients. We also discuss the performance cost of self-checking in our prototype.
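The abstract does not spell out how the self-check works. As a rough illustration only, and not the authors' design, one common way to test whether strongly consistent replicas are still in sync is to have each replica hash a canonical serialization of its state after applying the same totally ordered sequence of requests, then compare the digests. This minimal sketch assumes a toy key-value state, a total order of requests delivered by some group communication layer (not shown), and hypothetical names (`Replica`, `apply`, `digest`, `check_consistency`) that do not come from the paper.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical replica of a key-value object (illustrative, not the paper's API)."""
    replica_id: str
    state: dict = field(default_factory=dict)
    applied: int = 0  # sequence number: how many ordered requests have been applied

    def apply(self, request: tuple) -> None:
        # Assumption: every replica receives the same requests in the same total order.
        op, key, value = request
        if op == "put":
            self.state[key] = value
        elif op == "delete":
            self.state.pop(key, None)
        self.applied += 1

    def digest(self) -> str:
        # Canonical serialization (sorted keys) so that equal states give equal digests.
        blob = json.dumps(self.state, sort_keys=True).encode("utf-8")
        return hashlib.sha256(blob).hexdigest()


def check_consistency(replicas: list) -> list:
    """Return the ids of replicas whose digest differs from the majority digest.

    Only replicas that have applied the same number of requests are compared;
    replicas that are merely lagging behind are skipped, not reported.
    """
    if not replicas:
        return []
    seq = max(r.applied for r in replicas)
    comparable = [r for r in replicas if r.applied == seq]
    counts = {}
    for r in comparable:
        d = r.digest()
        counts[d] = counts.get(d, 0) + 1
    majority = max(counts, key=counts.get)  # ties are broken arbitrarily
    return [r.replica_id for r in comparable if r.digest() != majority]


if __name__ == "__main__":
    log = [("put", "a", 1), ("put", "b", 2), ("delete", "a", None)]
    replicas = [Replica("r1"), Replica("r2"), Replica("r3")]
    for req in log:
        for r in replicas:
            r.apply(req)
    print(check_consistency(replicas))  # []
    replicas[2].state["b"] = 99  # simulate a silent divergence on r3
    print(check_consistency(replicas))  # ['r3']
```

Majority voting lets the checker name the divergent replica when at least three comparable replicas exist; with only two replicas that disagree, it can detect the inconsistency but not localize it. Lagging replicas are skipped because comparing states taken at different sequence numbers would raise false alarms.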



Author information


Correspondence to Alberto Bartoli.

Additional information

Alberto Bartoli is Associate Professor of Computer Engineering at the University of Trieste, Italy. He received a degree in Electrical Engineering in 1989 and a doctorate in Computer Engineering in 1994, both from the University of Pisa, Italy. His research interests are in the area of reliability and fault tolerance in distributed systems.

Giovanni Masarin received a degree in Electronic Engineering from the University of Trieste, Italy, in 2004. He currently works in product development at RadioTrevisan, a company specializing in lawful interception equipment.


About this article

Cite this article

Bartoli, A., Masarin, G. On-line self-checking of replication consistency for autonomic computing. Cluster Comput 9, 449–463 (2006). https://doi.org/10.1007/s10586-006-0012-5


  • DOI: https://doi.org/10.1007/s10586-006-0012-5
