Abstract
Cloud computing serves as a platform for diverse types of applications, from low-priority to critical. Some of these applications require high levels of system availability and reliability. Developing methods for cloud computing availability and reliability evaluation is therefore of utmost importance. In this paper, we propose a set of models for availability and reliability evaluation of a virtualized system with VMM software rejuvenation enabled by VM migration scheduling. To improve the models' fidelity to a real environment, we added a specific sub-model to represent workload variation. Our main goal is to find the proper VM migration schedule to maximize system availability and to analyze the impact of such a schedule on system reliability. Our results include the following: (1) the appropriate rejuvenation schedule to maximize availability in each proposed scenario; (2) the downtime reduction obtained when comparing the system with and without rejuvenation; and (3) a reliability analysis of different workload variation scenarios under the proper rejuvenation schedules. The evaluated scenarios range from systems without high workload demand (peakDuration = 0 h per day) to systems with only high workload demand (peakDuration = 24 h per day). Our results show a significant improvement in availability and reliability due to VM migration scheduling. In scenarios with a heavy workload, the downtime avoided by software rejuvenation surpasses 3.39 days, and the reliability gain exceeds 86%.
Notes
Just a clarification note: an Erlang distribution with fewer phases can represent a gradual increase, so we use a four-phase Erlang sub-net to represent software aging. An Erlang distribution with more phases, however, tends toward deterministic behavior, so we used a ten-phase Erlang sub-net to represent the Clock behavior.
An inhibitor arc terminates in a small circle instead of an arrow. In our case, the inhibitor arc enables the transition in the absence of tokens in its input places.
PD means peak duration and RT means rejuvenation trigger.
A week has 168 h.
Abbreviations
- MTTF: Mean Time To Failure
- MTTR: Mean Time To Repair
- SPN: Stochastic Petri Net
- SRN: Stochastic Reward Net
- VM: Virtual Machine
- VMM: Virtual Machine Monitor
References
Araujo, J., Matos, R., Maciel, P., Matias, R. (2011a). Software aging issues on the eucalyptus cloud computing infrastructure. In 2011 IEEE international conference on systems, man, and cybernetics (SMC) (pp. 1411–1416). IEEE.
Araujo, J., Matos, R., Maciel, P., Matias, R., Beicker, I. (2011b). Experimental evaluation of software aging effects on the eucalyptus cloud computing infrastructure. In Proceedings of the middleware 2011 industry track workshop (p. 4). ACM.
Avizienis, A., Laprie, J.-C., Randell, B., et al. (2001). Fundamental concepts of dependability. University of Newcastle upon Tyne, Computing Science.
Avizienis, A., Laprie, J.-C., Randell, B., Landwehr, C. (2004). Basic concepts and taxonomy of dependable and secure computing. IEEE Transactions on Dependable and Secure Computing, 1(1), 11–33.
Bovenzi, A., Cotroneo, D., Pietrantuono, R., Russo, S. (2011). Workload characterization for software aging analysis. In 2011 IEEE 22nd international symposium on Software reliability engineering (ISSRE) (pp. 240–249). IEEE.
Ciardo, G., Blakemore, A., Chimento, P.F., Muppala, J.K., Trivedi, K.S. (1993). Automated generation and analysis of markov reward models using stochastic reward nets. In Linear algebra, markov chains, and queueing models (pp. 145–191). Springer.
Clark, C., Fraser, K., Hand, S., Hansen, J.G., Jul, E., Limpach, C., Pratt, I., Warfield, A. (2005). Live migration of virtual machines. In Proceedings of the 2nd conference on symposium on networked systems design & implementation-Volume 2 (pp. 273–286). USENIX Association.
Constantinescu, C. (2005). Dependability evaluation of a fault-tolerant processor by gspn modeling. IEEE Transactions on Reliability, 54(3), 468–474.
Cotroneo, D., Natella, R., Pietrantuono, R., Russo, S. (2014). A survey of software aging and rejuvenation studies. ACM Journal on Emerging Technologies in Computing Systems (JETC), 10(1), 8.
Dantas, J., Matos, R., Araujo, J., Maciel, P. (2012). An availability model for eucalyptus platform: An analysis of warm-standy replication mechanism. In 2012 IEEE international conference on Systems, man, and cybernetics (SMC) (pp. 1664–1669). IEEE.
de Melo, M.D.T., Araujo, J., Umesh, I.M., Maciel, P.R.M. (2017). Sware: An approach to support software aging and rejuvenation experiments. Journal on Advances in Theoretical and Applied Informatics, 3(1), 31–38.
Dohi, T., Zheng, J., Okamura, H., Trivedi, K.S. (2018). Optimal periodic software rejuvenation policies based on interval reliability criteria. Reliability Engineering & System Safety, 180, 463–475.
Grottke, M., & Trivedi, K. (2005). A classification of software faults. Journal of Reliability Engineering Association of Japan, 27(7), 425–438.
Grottke, M., & Trivedi, K.S. (2007). Fighting bugs: Remove, retry, replicate, and rejuvenate. Computer, 40(2), 107–109.
Huang, Y., Kintala, C., Kolettis, N., Fulton, N.D. (1995a). Software rejuvenation: Analysis, module and applications. In Ftcs (p. 0381). IEEE.
Huang, Y., Kintala, C., Kolettis, N., Fulton, N.D. (1995b). Software rejuvenation: Analysis, module and applications. In 1995. FTCS-25. Digest of papers., twenty-fifth international symposium on Fault-tolerant computing (pp. 381–390). IEEE.
Kim, D.S., Machida, F., Trivedi, K.S. (2009). Availability modeling and analysis of a virtualized system. In 2009. PRDC’09. 15th IEEE pacific rim international symposium on Dependable computing (pp. 365–371). IEEE.
Langner, F., & Andrzejak, A. (2013). Detecting software aging in a cloud computing framework by comparing development versions. In 2013 IFIP/IEEE international symposium on integrated network management (IM 2013) (pp. 896–899). IEEE.
Li, H., Zhao, Z., He, L., Zheng, X. (2014). Model and analysis of cloud storage service reliability based on stochastic petri nets. Journal of Information & Computational Science, 11(7), 2341–2354.
Machida, F., Kim, D.S., Trivedi, K.S. (2010). Modeling and analysis of software rejuvenation in a server virtualized system. In 2010 IEEE second international workshop on Software aging and rejuvenation (woSAR) (pp. 1–6). IEEE.
Machida, F., Kim, D.S., Trivedi, K.S. (2013). Modeling and analysis of software rejuvenation in a server virtualized system with live vm migration. Performance Evaluation, 70(3), 212–230.
Malhotra, M., & Reibman, A. (1993). Selecting and implementing phase approximations for semi-markov models. Stochastic Models, 9(4), 473–506.
Malhotra, M., & Trivedi, K.S. (1994). Power-hierarchy of dependability-model types. IEEE Transactions on Reliability, 43(3), 493–502.
Malhotra, M., & Trivedi, K.S. (1995). Dependability modeling using petri-nets. IEEE Transactions on reliability, 44(3), 428–440.
Ajmone Marsan, M., Balbo, G., Conte, G., Donatelli, S., Franceschinis, G. (1995). Modelling with generalized stochastic Petri nets (Vol. 292). New York: Wiley.
Matos, R. D. S., Maciel, P. R. M., Machida, F., Kim, D.S., Trivedi, K.S. (2012a). Sensitivity analysis of server virtualized system availability. IEEE Trans. Reliab., 61(4), 994–1006.
Matos, R., Araujo, J., Alves, V., Maciel, P. (2012b). Characterization of software aging effects in elastic storage mechanisms for private clouds. In 2012 IEEE 23rd international symposium on software reliability engineering workshops (ISSREW) (pp. 293–298). IEEE.
Melo, M., Araujo, J., Matos, R., Menezes, J., Maciel, P. (2013a). Comparative analysis of migration-based rejuvenation schedules on cloud availability. In 2013 IEEE international conference on systems, man, and cybernetics (SMC) (pp. 4110–4115). IEEE.
Melo, M., Maciel, P., Araujo, J., Matos, R., Araujo, C. (2013b). Availability study on cloud computing environments: Live migration as a rejuvenation mechanism. In 2013 43rd annual IEEE/IFIP international conference on Dependable systems and networks (DSN) (pp. 1–6). IEEE.
Muppala, J., Ciardo, G., Trivedi, K.S. (1994). Stochastic reward nets for reliability prediction. Communications in reliability, maintainability and serviceability, 1(2), 9–20.
Mura, I., Bondavalli, A., Zang, X., Trivedi, K.S. (1999). Dependability modeling and evaluation of phased mission systems: a dspn approach. In Dependable computing for critical applications 7, 1999 (pp. 319–337). IEEE.
Okamura, H., Yamamoto, K., Dohi, T. (2014). Transient analysis of software rejuvenation policies in virtualized system: Phase-type expansion approach. Quality Technology & Quantitative Management, 11(3), 335–351.
Parnas, D.L. (1994). Software aging. In Proceedings of the 16th international conference on Software engineering (pp. 279–287). IEEE Computer Society Press.
Royce, W.W. (1970). Managing the development of large software systems. In Proceedings of IEEE WESCON (vol. 26). Los Angeles.
Suzuki, H., Dohi, T., Kaio, N., Trivedi, K.S. (2003). Maximizing interval reliability in operational software system with rejuvenation. In 2003. ISSRE 2003. 14th international symposium on Software reliability engineering (pp. 479–490). IEEE.
Thein, T., Chi, S.-D., Park, J.S. (2008). Availability modeling and analysis on virtualized clustering with rejuvenation. International Journal of Computer Science and Network Security.
Thein, T., & Park, J.S. (2009). Availability analysis of application servers using software rejuvenation and virtualization. Journal of computer science and technology, 24(2), 339–346.
Torquato, M., Maciel, P., Araujo, J., Umesh, I.M. (2017). An approach to investigate aging symptoms and rejuvenation effectiveness on software systems. In 2017 12th iberian conference on Information systems and technologies (CISTI) (pp. 1–6). IEEE.
Torquato, M., Araujo, J., Umesh, I.M., Maciel, P. (2018a). Sware: A methodology for software aging and rejuvenation experiments. Journal of Information Systems Engineering & Management, 3(2), 15.
Torquato, M., Umesh, I.M., Maciel, P. (2018b). Models for availability and power consumption evaluation of a private cloud with vmm rejuvenation enabled by vm live migration. The Journal of Supercomputing, 1–25.
Torquato, M., & Vieira, M. (2018). Interacting srn models for availability evaluation of vm migration as rejuvenation on a system under varying workload. In 2018 IEEE International symposium on software reliability engineering workshops (ISSREW) (pp. 300–307). IEEE.
Trivedi, K.S., Ciardo, G., Malhotra, M., Sahner, R.A. (1993). Dependability and performability analysis. In Performance evaluation of computer and communication systems (pp. 587–612). Springer.
Trivedi, K.S., Vaidyanathan, K., Goseva-Popstojanova, K. (2000). Modeling and analysis of software aging and rejuvenation. In 2000.(SS 2000) proceedings. 33rd annual Simulation symposium (pp. 270–279). IEEE.
Trivedi, K.S., & Bobbio, A. (2017). Reliability and availability engineering: modeling, analysis, and applications. Cambridge: Cambridge University Press.
Vaidyanathan, K., & Trivedi, K.S. (1999). A measurement-based model for estimation of resource exhaustion in operational software systems. In Issre (p. 84). IEEE.
Vaidyanathan, K., & Trivedi, K.S. (2001). Extended classification of software faults based on aging. In Fast abstract, int. Symp. Software Reliability Eng. Hong Kong.
Vaidyanathan, K., & Trivedi, K.S. (2005). A comprehensive model for software rejuvenation. IEEE Transactions on Dependable and Secure Computing, 2(2), 124–137.
Wajid, R.A., & Shuaib Khan, M. (2006). Comparative distributions of hazard modeling analysis. Pakistan Journal of Statistics and Operation Research, 2(2), 127–134.
Wang, D., Xie, W., Trivedi, K.S. (2007). Performability analysis of clustered systems with rejuvenation under varying workload. Performance Evaluation, 64(3), 247–265.
Xie, W., Hong, Y., Trivedi, K.S. (2004). Software rejuvenation policies for cluster systems under varying workload. In 2004. Proceedings 10th IEEE pacific rim international symposium on dependable computing (pp. 122–129). IEEE.
Zimmermann, A. (2017). Modelling and performance evaluation with timenet 4.4. In International conference on quantitative evaluation of systems (pp. 300–303). Springer.
Acknowledgments
This paper is an extended version of Torquato and Vieira (2018). We thank the anonymous reviewers for their valuable comments, which helped improve the research presented in this paper.
Funding
This work has been partially supported by the Portuguese funding institution FCT (Foundation for Science and Technology), Ph.D. grant SFRH/BD/146181/2019, and by project ATMOSPHERE, funded by the European Commission under the Cooperation Programme, Horizon 2020 grant agreement no. 777154.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Approach used in the deterministic transitions
In the design of our models, we faced two significant challenges: (1) the use of deterministic transitions, which substantially increases the time for model evaluation, and (2) the use of guard functions, which usually affect model readability because they hide specific system behaviors inside some transitions.
To tackle these challenges, we adopted the following approach. For challenge (1), we replaced the deterministic transitions with 10-phase Erlang subnets that are capable of reproducing the deterministic behavior. For challenge (2), we added the specific arcs and transitions that represent the intended guard function behavior.
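The reason a 10-phase Erlang sub-net can stand in for a deterministic transition is that the coefficient of variation of an Erlang-k distribution is 1/sqrt(k), so the delay concentrates around its mean as k grows. The following is a minimal Python sketch of that property, not part of the SRN models themselves; the function names and the sample size are illustrative assumptions.

```python
import math
import random


def erlang_sample(mean, phases, rng):
    """Draw one Erlang delay: the sum of `phases` exponential stages.

    Each stage has rate phases / mean, so the total delay has mean `mean`.
    """
    rate = phases / mean
    return sum(rng.expovariate(rate) for _ in range(phases))


def erlang_cv(phases):
    """Analytical coefficient of variation of an Erlang-k distribution."""
    return 1.0 / math.sqrt(phases)


def empirical_cv(mean, phases, n=20000, seed=1):
    """Estimate the coefficient of variation from n simulated delays."""
    rng = random.Random(seed)
    samples = [erlang_sample(mean, phases, rng) for _ in range(n)]
    avg = sum(samples) / n
    var = sum((x - avg) ** 2 for x in samples) / (n - 1)
    return math.sqrt(var) / avg


if __name__ == "__main__":
    # k = 1 is a plain exponential; larger k moves toward a fixed delay.
    for k in (1, 4, 10, 100):
        print(k, round(erlang_cv(k), 3), round(empirical_cv(1.0, k), 3))
```

The trade-off is that each additional phase adds states to the underlying Markov chain, so the number of phases balances closeness to a deterministic delay against state-space size.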
To represent the guard function behavior, we used token swaps in some places of the SRN model. Figure 10 presents a token swap with a deterministic transition. The problem with token swaps is that each time the transition TokenSwap fires, the time counting for the deterministic transition T0 is reset because the token re-enters the place P0.
To overcome the token swap problem, we adopted a technique based on the age memory policy (Ajmone Marsan et al. 1995). Under the age memory policy, the SRN preserves the cumulative enabling time of a transition since the last time it fired. The age memory policy is useful to model multitask systems, where the cumulative work in one task is preserved when the system switches to others. In our case, we used the 10-phase Erlang sub-net as the age memory mechanism. Figure 11 presents the adopted approach.
As soon as IT0 fires, the place PEr0 receives 10 tokens. Those tokens represent the time counting for the firing of transition EPh. Note that the time counting is preserved regardless of TokenSwap firing.
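The effect of the two policies can be illustrated with a small simulation. This is a sketch under stated assumptions, not the SRN from Figures 10 and 11: token swaps are assumed to arrive as a Poisson process with a hypothetical rate `swap_rate`, the phase counter is assumed to advance at rate `phases / delay`, and all function names are illustrative.

```python
import random


def firing_time_with_reset(delay, swap_rate, rng):
    """Time until T0 fires when every token swap restarts its timer.

    T0 only fires once a full `delay` elapses without any swap.
    """
    now = 0.0
    while True:
        next_swap = rng.expovariate(swap_rate)
        if next_swap >= delay:
            return now + delay  # timer ran to completion undisturbed
        now += next_swap        # swap occurred: elapsed time is lost


def firing_time_with_erlang(delay, phases, rng):
    """Time until all phase tokens are consumed.

    The remaining phases are held as tokens, so swaps do not touch them
    and therefore do not appear in this function at all.
    """
    rate = phases / delay
    return sum(rng.expovariate(rate) for _ in range(phases))


def mean_of(sampler, n, seed):
    """Average of n samples drawn with a seeded generator."""
    rng = random.Random(seed)
    return sum(sampler(rng) for _ in range(n)) / n


if __name__ == "__main__":
    delay, swap_rate = 1.0, 2.0  # hypothetical values for illustration
    print("reset :", mean_of(lambda r: firing_time_with_reset(delay, swap_rate, r), 5000, 7))
    print("erlang:", mean_of(lambda r: firing_time_with_erlang(delay, 10, r), 5000, 7))
```

With the reset behavior, the mean firing time grows with the swap rate, whereas the phase-based counter keeps a mean close to the intended delay.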
We used this approach for all the deterministic transitions presented in this paper. In the model figures, we still depict them as deterministic transitions to improve readability.
Appendix B: A brief description of TimeNET Tool
This description is based on Zimmermann (2017).
TimeNET is a tool for modeling and evaluating several variants of SPNs. It allows the user to calculate reward measures from the models. It also supports the evaluation of models with exponential, deterministic, and non-exponential transitions. The tool provides numerical analysis and simulation methods to compute transient and steady-state solutions for the models. TimeNET also supports Colored Petri Nets.
The software is built using a Java graphical interface, shell scripts, and C++ algorithms. The tool is available in 32- and 64-bit versions for Windows and Linux.
The TimeNET tool is developed and maintained by the Systems and Software Engineering Group, TU Ilmenau, Germany. It is available free of charge for non-commercial use from http://timenet.tu-ilmenau.de/.
Cite this article
Torquato, M., Maciel, P. & Vieira, M. Availability and reliability modeling of VM migration as rejuvenation on a system under varying workload. Software Qual J 28, 59–83 (2020). https://doi.org/10.1007/s11219-019-09474-1
DOI: https://doi.org/10.1007/s11219-019-09474-1