
Availability and reliability modeling of VM migration as rejuvenation on a system under varying workload

Software Quality Journal

Abstract

Cloud computing serves as a platform for diverse types of applications, from low-priority to critical. Some of these applications require high levels of system availability and reliability, so methods for evaluating cloud computing availability and reliability are of utmost importance. In this paper, we propose a set of models for availability and reliability evaluation of a virtualized system with VMM software rejuvenation enabled by VM migration scheduling. To improve the models' fidelity to a real environment, we added a specific sub-model to represent workload variation. Our main goal is to find the VM migration schedule that maximizes system availability and to analyze the impact of that schedule on system reliability. Our results include the following: (1) the appropriate rejuvenation schedule to maximize availability in each proposed scenario; (2) the downtime reduction obtained when comparing the system with and without rejuvenation; and (3) a reliability analysis of different workload variation scenarios under the proper rejuvenation schedules. The evaluated scenarios range from systems without high workload demand (peakDuration = 0 h per day) to systems with only high workload demand (peakDuration = 24 h per day). Our results show a significant improvement in availability and reliability due to VM migration scheduling. In scenarios with a heavy workload, the downtime avoided by software rejuvenation surpasses 3.39 days, and the reliability gain exceeds 86%.


Notes

  1. An Erlang distribution with few phases can represent a gradual increase, so we use a four-phase Erlang sub-net to represent software aging. With more phases, an Erlang distribution approaches deterministic behavior, so we use a ten-phase Erlang sub-net to represent the Clock behavior.

  2. An inhibitor arc is drawn with a small circle at its end instead of an arrowhead. In our case, the inhibitor arc enables the transition when its input place holds no tokens.

  3. PD means peak duration and RT means rejuvenation trigger.

  4. A week has 168 h.
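
The trade-off described in note 1 can be made concrete with the standard moments of the Erlang distribution. The block below is a short worked derivation, not taken from the paper; the symbols k (number of phases), \lambda (rate of each phase), and X (total delay) are introduced here only for illustration.

```latex
% Erlang-k delay X: sum of k independent exponential phases, each with rate \lambda
\mathbb{E}[X] = \frac{k}{\lambda}, \qquad
\operatorname{Var}[X] = \frac{k}{\lambda^{2}}, \qquad
\mathrm{CV}[X] = \frac{\sqrt{\operatorname{Var}[X]}}{\mathbb{E}[X]} = \frac{1}{\sqrt{k}}

% Four phases (software aging): still noticeably random
\mathrm{CV}\big|_{k=4} = \frac{1}{\sqrt{4}} = 0.5

% Ten phases (Clock): closer to a deterministic delay, for which CV = 0
\mathrm{CV}\big|_{k=10} = \frac{1}{\sqrt{10}} \approx 0.316
```

For a fixed mean delay, the relative spread shrinks as 1/\sqrt{k}, which is why adding phases moves the sub-net toward deterministic behavior.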

Abbreviations

MTTF: Mean Time To Failure

MTTR: Mean Time To Repair

SPN: Stochastic Petri Net

SRN: Stochastic Reward Net

VM: Virtual Machine

VMM: Virtual Machine Monitor

References

  • Araujo, J., Matos, R., Maciel, P., Matias, R. (2011a). Software aging issues on the eucalyptus cloud computing infrastructure. In 2011 IEEE international conference on systems, man, and cybernetics (SMC) (pp. 1411–1416). IEEE.

  • Araujo, J., Matos, R., Maciel, P., Matias, R., Beicker, I. (2011b). Experimental evaluation of software aging effects on the eucalyptus cloud computing infrastructure. In Proceedings of the middleware 2011 industry track workshop (p. 4). ACM.

  • Avizienis, A., Laprie, J.-C., Randell, B., et al. (2001). Fundamental concepts of dependability. University of Newcastle upon Tyne, Computing Science.

  • Avizienis, A., Laprie, J.-C., Randell, B., Landwehr, C. (2004). Basic concepts and taxonomy of dependable and secure computing. IEEE Transactions on Dependable and Secure Computing, 1(1), 11–33.


  • Bovenzi, A., Cotroneo, D., Pietrantuono, R., Russo, S. (2011). Workload characterization for software aging analysis. In 2011 IEEE 22nd international symposium on software reliability engineering (ISSRE) (pp. 240–249). IEEE.

  • Ciardo, G., Blakemore, A., Chimento, P.F., Muppala, J.K., Trivedi, K.S. (1993). Automated generation and analysis of markov reward models using stochastic reward nets. In Linear algebra, markov chains, and queueing models (pp. 145–191). Springer.

  • Clark, C., Fraser, K., Hand, S., Hansen, J.G., Jul, E., Limpach, C., Pratt, I., Warfield, A. (2005). Live migration of virtual machines. In Proceedings of the 2nd conference on symposium on networked systems design & implementation-Volume 2 (pp. 273–286). USENIX Association.

  • Constantinescu, C. (2005). Dependability evaluation of a fault-tolerant processor by gspn modeling. IEEE Transactions on Reliability, 54(3), 468–474.


  • Cotroneo, D., Natella, R., Pietrantuono, R., Russo, S. (2014). A survey of software aging and rejuvenation studies. ACM Journal on Emerging Technologies in Computing Systems (JETC), 10(1), 8.


  • Dantas, J., Matos, R., Araujo, J., Maciel, P. (2012). An availability model for eucalyptus platform: An analysis of warm-standy replication mechanism. In 2012 IEEE international conference on systems, man, and cybernetics (SMC) (pp. 1664–1669). IEEE.

  • de Melo, M.D.T., Araujo, J., Umesh, I.M., Maciel, P.R.M. (2017). Sware: An approach to support software aging and rejuvenation experiments. Journal on Advances in Theoretical and Applied Informatics, 3(1), 31–38.

  • Dohi, T., Zheng, J., Okamura, H., Trivedi, K.S. (2018). Optimal periodic software rejuvenation policies based on interval reliability criteria. Reliability Engineering & System Safety, 180, 463–475.


  • Grottke, M., & Trivedi, K. (2005). A classification of software faults. Journal of Reliability Engineering Association of Japan, 27(7), 425–438.


  • Grottke, M., & Trivedi, K.S. (2007). Fighting bugs: Remove, retry, replicate, and rejuvenate. Computer, 40(2), 107–109.


  • Huang, Y., Kintala, C., Kolettis, N., Fulton, N.D. (1995a). Software rejuvenation: Analysis, module and applications. In FTCS (p. 381). IEEE.

  • Huang, Y., Kintala, C., Kolettis, N., Fulton, N.D. (1995b). Software rejuvenation: Analysis, module and applications. In Twenty-fifth international symposium on fault-tolerant computing (FTCS-25), digest of papers (pp. 381–390). IEEE.

  • Kim, D.S., Machida, F., Trivedi, K.S. (2009). Availability modeling and analysis of a virtualized system. In 15th IEEE Pacific Rim international symposium on dependable computing (PRDC’09) (pp. 365–371). IEEE.

  • Langner, F., & Andrzejak, A. (2013). Detecting software aging in a cloud computing framework by comparing development versions. In 2013 IFIP/IEEE international symposium on integrated network management (IM 2013) (pp. 896–899). IEEE.

  • Li, H., Zhao, Z., He, L., Zheng, X. (2014). Model and analysis of cloud storage service reliability based on stochastic petri nets. Journal of Information & Computational Science, 11(7), 2341–2354.


  • Machida, F., Kim, D.S., Trivedi, K.S. (2010). Modeling and analysis of software rejuvenation in a server virtualized system. In 2010 IEEE second international workshop on software aging and rejuvenation (WoSAR) (pp. 1–6). IEEE.

  • Machida, F., Kim, D.S., Trivedi, K.S. (2013). Modeling and analysis of software rejuvenation in a server virtualized system with live vm migration. Performance Evaluation, 70(3), 212–230.


  • Malhotra, M., & Reibman, A. (1993). Selecting and implementing phase approximations for semi-markov models. Stochastic Models, 9(4), 473–506.


  • Malhotra, M., & Trivedi, K.S. (1994). Power-hierarchy of dependability-model types. IEEE Transactions on Reliability, 43(3), 493–502.


  • Malhotra, M., & Trivedi, K.S. (1995). Dependability modeling using petri-nets. IEEE Transactions on reliability, 44(3), 428–440.


  • Ajmone Marsan, M., Balbo, G., Conte, G., Donatelli, S., Franceschinis, G. (1995). Modelling with generalized stochastic Petri nets (Vol. 292). New York: Wiley.

  • Matos, R.D.S., Maciel, P.R.M., Machida, F., Kim, D.S., Trivedi, K.S. (2012a). Sensitivity analysis of server virtualized system availability. IEEE Transactions on Reliability, 61(4), 994–1006.

  • Matos, R., Araujo, J., Alves, V., Maciel, P. (2012b). Characterization of software aging effects in elastic storage mechanisms for private clouds. In 2012 IEEE 23rd international symposium on software reliability engineering workshops (ISSREW) (pp. 293–298). IEEE.

  • Melo, M., Araujo, J., Matos, R., Menezes, J., Maciel, P. (2013a). Comparative analysis of migration-based rejuvenation schedules on cloud availability. In 2013 IEEE international conference on systems, man, and cybernetics (SMC) (pp. 4110–4115). IEEE.

  • Melo, M., Maciel, P., Araujo, J., Matos, R., Araujo, C. (2013b). Availability study on cloud computing environments: Live migration as a rejuvenation mechanism. In 2013 43rd annual IEEE/IFIP international conference on dependable systems and networks (DSN) (pp. 1–6). IEEE.

  • Muppala, J., Ciardo, G., Trivedi, K.S. (1994). Stochastic reward nets for reliability prediction. Communications in reliability, maintainability and serviceability, 1(2), 9–20.


  • Mural, I., Bondavalli, A., Zang, X., Trivedi, K.S. (1999). Dependability modeling and evaluation of phased mission systems: a dspn approach. In Dependable computing for critical applications 7 (pp. 319–337). IEEE.

  • Okamura, H., Yamamoto, K., Dohi, T. (2014). Transient analysis of software rejuvenation policies in virtualized system: Phase-type expansion approach. Quality Technology & Quantitative Management, 11(3), 335–351.

  • Parnas, D.L. (1994). Software aging. In Proceedings of the 16th international conference on Software engineering (pp. 279–287). IEEE Computer Society Press.

  • Royce, W.W. (1970). Managing the development of large software systems. In Proceedings of IEEE WESCON (vol. 26). Los Angeles.

  • Suzuki, H., Dohi, T., Kaio, N., Trivedi, K.S. (2003). Maximizing interval reliability in operational software system with rejuvenation. In 14th international symposium on software reliability engineering (ISSRE 2003) (pp. 479–490). IEEE.

  • Thein, T., Chi, S.-D., Park, J.S. (2008). Availability modeling and analysis on virtualized clustering with rejuvenation. International Journal of Computer Science and Network Security.

  • Thein, T., & Park, J.S. (2009). Availability analysis of application servers using software rejuvenation and virtualization. Journal of computer science and technology, 24(2), 339–346.


  • Torquato, M., Maciel, P., Araujo, J., Umesh, I.M. (2017). An approach to investigate aging symptoms and rejuvenation effectiveness on software systems. In 2017 12th Iberian conference on information systems and technologies (CISTI) (pp. 1–6). IEEE.

  • Torquato, M., Araujo, J., Umesh, I.M., Maciel, P. (2018a). Sware: A methodology for software aging and rejuvenation experiments. Journal of Information Systems Engineering & Management, 3(2), 15.

  • Torquato, M., Umesh, I.M., Maciel, P. (2018b). Models for availability and power consumption evaluation of a private cloud with vmm rejuvenation enabled by vm live migration. The Journal of Supercomputing, 1–25.

  • Torquato, M., & Vieira, M. (2018). Interacting srn models for availability evaluation of vm migration as rejuvenation on a system under varying workload. In 2018 IEEE International symposium on software reliability engineering workshops (ISSREW) (pp. 300–307). IEEE.

  • Trivedi, K.S., Ciardo, G., Malhotra, M., Sahner, R.A. (1993). Dependability and performability analysis. In Performance evaluation of computer and communication systems (pp. 587–612). Springer.

  • Trivedi, K.S., Vaidyanathan, K., Goseva-Popstojanova, K. (2000). Modeling and analysis of software aging and rejuvenation. In Proceedings of the 33rd annual simulation symposium (SS 2000) (pp. 270–279). IEEE.

  • Trivedi, K.S., & Bobbio, A. (2017). Reliability and availability engineering: modeling, analysis, and applications. Cambridge: Cambridge University Press.


  • Vaidyanathan, K., & Trivedi, K.S. (1999). A measurement-based model for estimation of resource exhaustion in operational software systems. In ISSRE (p. 84). IEEE.

  • Vaidyanathan, K., & Trivedi, K.S. (2001). Extended classification of software faults based on aging. In Fast abstract, int. Symp. Software Reliability Eng. Hong Kong.

  • Vaidyanathan, K., & Trivedi, K.S. (2005). A comprehensive model for software rejuvenation. IEEE Transactions on Dependable and Secure Computing, 2(2), 124–137.


  • Wajid, R.A., & Shuaib Khan, M. (2006). Comparative distributions of hazard modeling analysis. Pakistan Journal of Statistics and Operation Research, 2(2), 127–134.


  • Wang, D., Xie, W., Trivedi, K.S. (2007). Performability analysis of clustered systems with rejuvenation under varying workload. Performance Evaluation, 64(3), 247–265.


  • Xie, W., Hong, Y., Trivedi, K.S. (2004). Software rejuvenation policies for cluster systems under varying workload. In Proceedings of the 10th IEEE Pacific Rim international symposium on dependable computing (pp. 122–129). IEEE.

  • Zimmermann, A. (2017). Modelling and performance evaluation with timenet 4.4. In International conference on quantitative evaluation of systems (pp. 300–303). Springer.


Acknowledgments

This paper is an extended version of Torquato and Vieira (2018). We thank the anonymous reviewers for their valuable comments, which helped improve the research presented in this paper.

Funding

This work has been partially supported by the Portuguese funding institution FCT (Foundation for Science and Technology), Ph.D. grant SFRH/BD/146181/2019, and by the project ATMOSPHERE, funded by the European Commission under the Cooperation Programme, Horizon 2020 grant agreement no. 777154.

Author information


Corresponding author

Correspondence to Matheus Torquato.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Approach used in the deterministic transitions

In the design of our models, we faced two significant challenges: (1) the use of deterministic transitions, which substantially increases the time for model evaluation, and (2) the use of guard functions, which usually harm model readability because they hide specific system behaviors inside some transitions.

To tackle these challenges, we adopted the following approach. For challenge (1), we replaced the deterministic transitions with 10-phase Erlang sub-nets, which closely approximate the deterministic behavior. For challenge (2), we added specific arcs and transitions that represent the intended guard function behavior.

To represent the guard function behavior, we used token swaps in some places of the SRN model. Figure 10 presents a token swap with a deterministic transition. The problem with token swaps is that each time the transition TokenSwap fires, the time count of the deterministic transition T0 is reset because the token re-enters the place P0.

Fig. 10 Token swap with deterministic transition
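
The reset problem can be illustrated with a small sketch. It is not the authors' SRN or a TimeNET model: the function name and the list of swap instants are hypothetical, and the only behavior taken from the text is that each firing of TokenSwap restarts the clock of the deterministic transition T0.

```python
def firing_time_with_reset(delay, swap_times):
    """Instant at which T0 fires if every token swap restarts its clock.

    delay: deterministic firing delay of T0.
    swap_times: hypothetical instants at which TokenSwap fires.
    """
    clock_start = 0.0
    for t in sorted(swap_times):
        if t >= clock_start + delay:
            break  # T0 already fired before this swap happened
        clock_start = t  # the token re-enters P0 and the clock restarts
    return clock_start + delay


if __name__ == "__main__":
    # Without swaps, T0 fires after exactly `delay` time units.
    print(firing_time_with_reset(4.0, []))               # 4.0
    # Swaps at 3.0, 5.0 and 8.5 each restart the clock before it expires.
    print(firing_time_with_reset(4.0, [3.0, 5.0, 8.5]))  # 12.5
```

In the second call the intended 4-unit delay stretches to 12.5 units, which is the distortion that the age memory approach is meant to avoid.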

To overcome the token swap problem, we adopted a technique based on the age memory policy (Ajmone Marsan et al. 1995). Under the age memory policy, the SRN keeps the cumulative enabling time of a transition since it last fired. This policy is useful for modeling multitasking systems, where the work accumulated in one task is preserved when the system switches to another. In our case, the 10-phase Erlang sub-net serves as the age memory mechanism. Figure 11 presents the adopted approach.

Fig. 11 Token swap with age memory policy

As soon as IT0 fires, the place PEr0 receives 10 tokens. These tokens keep the time count for the firing of transition EPh. The time count is therefore preserved regardless of TokenSwap firing.
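
A minimal simulation sketch of this mechanism follows. It rests on assumptions that are not stated in the text: each Erlang phase is taken to be exponential with rate phases / delay, one token is assumed to leave the counter place per phase, and the timed behavior is assumed to complete once the counter is empty. The function name and the swap instants are hypothetical.

```python
import random


def erlang_age_memory_firing_time(delay, swap_times, phases=10, rng=None):
    """Simulate a token-counter Erlang timer whose progress survives swaps.

    Assumptions (illustrative only): exponential phases with rate
    phases / delay, one token consumed per phase, completion when the
    counter is empty. Returns (firing_time, swaps_seen_by_then).
    """
    rng = rng or random.Random()
    tokens = phases  # the counter place receives `phases` tokens at start
    now = 0.0
    pending_swaps = sorted(swap_times)
    swaps_seen = 0
    while tokens > 0:
        now += rng.expovariate(phases / delay)  # one phase completes
        # Swaps that occurred meanwhile are counted but never change `tokens`.
        while pending_swaps and pending_swaps[0] <= now:
            pending_swaps.pop(0)
            swaps_seen += 1
        tokens -= 1
    return now, swaps_seen


if __name__ == "__main__":
    # Same seed with and without swaps: the firing time is identical,
    # because swaps do not touch the token counter.
    t_plain, _ = erlang_age_memory_firing_time(4.0, [], rng=random.Random(1))
    t_swaps, n = erlang_age_memory_firing_time(
        4.0, [0.5, 1.0, 1.5], rng=random.Random(1)
    )
    print(t_plain == t_swaps, n)
```

Because the swaps only pass by the counter, the sampled delay depends on the phase count and rate alone; its mean is the intended delay.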

We used this approach for all the deterministic transitions presented in this paper. To keep the models readable, the figures depict each of these sub-nets as a single deterministic transition.

Appendix B: A brief description of TimeNET Tool

This description is based on Zimmermann (2017).

TimeNET is a tool for modeling and evaluating several variants of SPNs. It allows the user to calculate reward measures from the models. It also supports the evaluation of models with exponential, deterministic, and non-exponential transitions. The tool provides numerical analysis and simulation methods to compute transient and steady-state solutions for the models. TimeNET also supports Colored Petri Nets.

The software is built using a Java graphical interface, shell scripts, and C++ algorithms. The tool is available in 32-bit and 64-bit versions for Windows and Linux.

The TimeNET tool is developed and maintained by the Systems and Software Engineering Group, TU Ilmenau, Germany. It is available free of charge for non-commercial use from http://timenet.tu-ilmenau.de/.


Cite this article

Torquato, M., Maciel, P. & Vieira, M. Availability and reliability modeling of VM migration as rejuvenation on a system under varying workload. Software Qual J 28, 59–83 (2020). https://doi.org/10.1007/s11219-019-09474-1


  • DOI: https://doi.org/10.1007/s11219-019-09474-1
