Analysis of a Software System with Rejuvenation, Restoration and Checkpointing

  • Conference paper
Service Availability (ISAS 2008)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 5017))

Abstract

In this paper we consider an operational software system with two failure modes and develop a stochastic model to quantify the steady-state system availability. Three kinds of preventive/corrective maintenance policies (rejuvenation, restoration, and checkpointing) are incorporated in our unified availability model. We propose a dynamic programming algorithm that determines the joint optimal maintenance schedule maximizing the steady-state system availability, calculating the optimal aperiodic checkpoint sequence and the preventive rejuvenation time simultaneously. In numerical examples, the sensitivity to the model parameters that characterize the failure modes is examined, and the effects of the preventive/corrective maintenance policies are studied in detail.
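As a rough illustration of what "steady-state availability under preventive rejuvenation" means, the following sketch uses a textbook age-based renewal-reward model, not the unified model of the paper: it has a single failure mode and no checkpointing, and the optimum is found by a plain grid search rather than dynamic programming. The Weibull failure law and all parameter names (shape, scale, repair_time, rejuvenation_time) are assumptions introduced only for this example.

```python
import math


def survival(t, shape, scale):
    """Probability that no failure has occurred by time t (Weibull, assumed)."""
    return math.exp(-((t / scale) ** shape))


def mean_uptime(t0, shape, scale, steps=2000):
    """Expected operating time per cycle: integral of the survival
    function over [0, t0], computed with the trapezoidal rule."""
    h = t0 / steps
    total = 0.5 * (survival(0.0, shape, scale) + survival(t0, shape, scale))
    for i in range(1, steps):
        total += survival(i * h, shape, scale)
    return total * h


def availability(t0, shape, scale, repair_time, rejuvenation_time):
    """Steady-state availability when the system is rejuvenated at age t0
    unless it fails first: mean uptime / mean cycle length."""
    up = mean_uptime(t0, shape, scale)
    p_fail = 1.0 - survival(t0, shape, scale)
    # A cycle ends either with a failure (corrective repair) or with a
    # scheduled rejuvenation (shorter, preventive downtime).
    down = repair_time * p_fail + rejuvenation_time * (1.0 - p_fail)
    return up / (up + down)


def best_rejuvenation_time(candidates, **params):
    """Grid search over candidate rejuvenation times (illustrative only)."""
    return max(candidates, key=lambda t0: availability(t0, **params))


if __name__ == "__main__":
    params = dict(shape=2.0, scale=100.0, repair_time=10.0, rejuvenation_time=1.0)
    t_star = best_rejuvenation_time(range(1, 301), **params)
    print(t_star, availability(t_star, **params))
```

With an increasing failure rate (shape greater than 1) and rejuvenation cheaper than repair, the grid search settles on a finite rejuvenation time; with a constant failure rate (shape equal to 1) availability keeps growing with t0, so preventive rejuvenation brings no benefit in this simplified setting.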

Editor information

Takashi Nanya, Fumihiro Maruyama, András Pataricza, Miroslaw Malek

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Okamura, H., Dohi, T. (2008). Analysis of a Software System with Rejuvenation, Restoration and Checkpointing. In: Nanya, T., Maruyama, F., Pataricza, A., Malek, M. (eds) Service Availability. ISAS 2008. Lecture Notes in Computer Science, vol 5017. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68129-8_10

  • DOI: https://doi.org/10.1007/978-3-540-68129-8_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-68128-1

  • Online ISBN: 978-3-540-68129-8

  • eBook Packages: Computer Science, Computer Science (R0)
