Abstract
Cloud IaaS services deliver computational resources elastically in virtualized form and manage them through easy replication of virtual nodes and live migration. In such dynamic and volatile environments, availability and reliability are mandatory for assuring an acceptable quality of service for end users, which calls for specific fault tolerance strategies. Using concepts from fault tree analysis, we propose a distributed and autonomous approach in which each virtualized node assesses and predicts its own health state. In our setup, each node can proactively decide whether to accept future jobs, delegate jobs to its own replicated instances, or start a live migration process. We evaluate our model in practice using real Xen log traces.
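To make the idea concrete, the following is a minimal sketch of how a node might evaluate a fault tree and map the result to one of the three actions named in the abstract. It is not the authors' model: the event names, probabilities, gate layout, and the thresholds `accept_below` and `migrate_above` are all hypothetical, and basic events are assumed to be statistically independent.

```python
from dataclasses import dataclass, field
from math import prod
from typing import List, Union


@dataclass
class BasicEvent:
    name: str
    probability: float  # estimated probability that this event occurs


@dataclass
class Gate:
    kind: str  # "AND" or "OR"
    children: List[Union["Gate", BasicEvent]] = field(default_factory=list)


def failure_probability(node: Union[Gate, BasicEvent]) -> float:
    """Probability of the event at `node`, assuming independent basic events."""
    if isinstance(node, BasicEvent):
        return node.probability
    probs = [failure_probability(child) for child in node.children]
    if node.kind == "AND":
        # All children must occur.
        return prod(probs)
    if node.kind == "OR":
        # At least one child occurs.
        return 1.0 - prod(1.0 - p for p in probs)
    raise ValueError(f"unknown gate kind: {node.kind}")


def decide(p_fail: float, accept_below: float = 0.05,
           migrate_above: float = 0.30) -> str:
    """Map a predicted failure probability to an action (hypothetical thresholds)."""
    if p_fail < accept_below:
        return "accept"    # healthy enough to take future jobs
    if p_fail < migrate_above:
        return "delegate"  # hand jobs to a replicated instance
    return "migrate"       # start a live migration


# Hypothetical tree: the node fails if the CPU is overloaded
# OR both mirrored disks fail.
tree = Gate("OR", [
    BasicEvent("cpu_overload", 0.02),
    Gate("AND", [BasicEvent("disk_a", 0.10), BasicEvent("disk_b", 0.10)]),
])

print(decide(failure_probability(tree)))
```

In this sketch each node would recompute the top-event probability as its monitored basic-event estimates change and act on the result locally, which matches the distributed, autonomous flavour described above; how the estimates are derived from Xen logs is outside what the abstract specifies.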
Acknowledgements
This work was co-financed from the European Social Fund through Sectoral Operational Programme Human Resources Development 2007-2013, project number POSDRU/159/1.5/S/134197 “Performance and excellence in doctoral and postdoctoral research in Romanian economics science domain”. G.C. Silaghi acknowledges support from UEFISCDI under project JustASR - PN-II-PT-PCCA-2013-4-1644.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Butoi, A., Stan, A., Silaghi, G.C. (2014). Autonomous Management of Virtual Machine Failures in IaaS Using Fault Tree Analysis. In: Altmann, J., Vanmechelen, K., Rana, O. (eds) Economics of Grids, Clouds, Systems, and Services. GECON 2014. Lecture Notes in Computer Science, vol 8914. Springer, Cham. https://doi.org/10.1007/978-3-319-14609-6_14
DOI: https://doi.org/10.1007/978-3-319-14609-6_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14608-9
Online ISBN: 978-3-319-14609-6
eBook Packages: Computer Science, Computer Science (R0)