Abstract
In this paper we consider an operational software system with two failure modes and develop a stochastic model to quantify its steady-state system availability. Three kinds of preventive/corrective maintenance policies, namely rejuvenation, restoration and checkpointing, are incorporated into our unified availability model. We propose a dynamic programming algorithm that determines the joint optimal maintenance schedule maximizing the steady-state system availability, computing the optimal aperiodic checkpoint sequence and the optimal preventive rejuvenation time simultaneously. In numerical examples, we examine the sensitivity of the model parameters that characterize the failure modes and study the effects of the preventive/corrective maintenance policies in detail.
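To make the notion of an availability-maximizing rejuvenation time concrete, the sketch below uses a much simpler model than the one described above. It is not the paper's model. It assumes a single failure mode, no checkpointing and no restoration, and it replaces the dynamic programming algorithm with a plain grid search. Each cycle runs until either a failure (followed by corrective repair with mean time `mu_c`) or a preventive rejuvenation at time `t0` (mean time `mu_p`). By the renewal-reward argument, the steady-state availability is the expected up time per cycle divided by the expected cycle length. The Weibull failure-time distribution, the function names and all parameter values are hypothetical choices made for illustration.

```python
import math

def weibull_survival(t, shape, scale):
    """Survival function S(t) = exp(-(t/scale)^shape) of the assumed Weibull failure time."""
    return math.exp(-((t / scale) ** shape))

def availability(t0, shape, scale, mu_c, mu_p, steps=1000):
    """Steady-state availability of a hypothetical renewal model: the system runs until
    failure or time t0, whichever comes first. A failure costs mean repair time mu_c and
    a preventive rejuvenation at t0 costs mean time mu_p."""
    # Expected up time per cycle E[min(X, t0)] = integral_0^t0 S(t) dt (trapezoidal rule).
    h = t0 / steps
    values = [weibull_survival(i * h, shape, scale) for i in range(steps + 1)]
    up = h * (sum(values) - (values[0] + values[-1]) / 2.0)
    p_fail = 1.0 - weibull_survival(t0, shape, scale)  # P(failure before t0)
    down = mu_c * p_fail + mu_p * (1.0 - p_fail)        # expected down time per cycle
    return up / (up + down)

def optimal_rejuvenation_time(shape, scale, mu_c, mu_p, grid):
    """Grid search for the t0 maximizing availability (a stand-in for the paper's DP)."""
    return max(grid, key=lambda t0: availability(t0, shape, scale, mu_c, mu_p))

if __name__ == "__main__":
    grid = range(1, 401)
    best = optimal_rejuvenation_time(2.0, 100.0, 5.0, 1.0, grid)
    print(best, availability(best, 2.0, 100.0, 5.0, 1.0))
```

With an increasing failure rate (shape 2.0, modeling aging) and corrective repair slower than rejuvenation, the search returns an interior optimum whose availability beats running without rejuvenation. With shape 1.0 (constant failure rate) the availability increases monotonically in `t0`, so the search settles on the last grid point, reflecting the known result that preventive rejuvenation does not pay off without aging.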
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Okamura, H., Dohi, T. (2008). Analysis of a Software System with Rejuvenation, Restoration and Checkpointing. In: Nanya, T., Maruyama, F., Pataricza, A., Malek, M. (eds) Service Availability. ISAS 2008. Lecture Notes in Computer Science, vol 5017. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68129-8_10
DOI: https://doi.org/10.1007/978-3-540-68129-8_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68128-1
Online ISBN: 978-3-540-68129-8
eBook Packages: Computer Science, Computer Science (R0)