ABSTRACT
Fault tolerance (FT) is one of the most important ways to achieve high availability (HA). In the cloud, however, it remains a real challenge because of diverse user requirements, heterogeneous cloud providers, complex FT implementations, and error-prone configuration. To cope with this, we propose a model-defined FT approach that automatically deploys FT mechanisms according to a high-level model. With the help of the FT model, existing FT mechanisms are optimized through reuse. We implemented a prototype of our approach and evaluated it on CloudStack, a popular IaaS cloud.
Index Terms
- Model defined fault tolerance in cloud