A Survey of Distributed Database Checkpointing

Distributed and Parallel Databases

Abstract

Checkpointing is a vital technique for reducing a database's recovery time after a failure. For distributed databases, checkpointing also provides an efficient way to perform global reconstruction. In this paper, we survey and classify previous approaches to checkpointing a distributed database. Because global reconstruction is rarely needed in most distributed databases, we recommend, for an integrated distributed database system, a less restrictive and less resource-consuming checkpointing approach over a transaction-consistent one. For a federated or multidatabase system, any type of globally consistent checkpoint is difficult to achieve without violating local autonomy.

About this article

Cite this article

Lin, JL., Dunham, M.H. A Survey of Distributed Database Checkpointing. Distributed and Parallel Databases 5, 289–319 (1997). https://doi.org/10.1023/A:1008689312900

  • DOI: https://doi.org/10.1023/A:1008689312900
