Abstract
Cloud computing is a recent trend in IT that has attracted considerable attention. Service reliability and service performance are two important issues in cloud computing. To improve cloud service reliability, fault tolerance techniques such as fault recovery may be used, which in turn affect cloud service performance. This impact deserves detailed study. Although some research exists on cloud/grid service reliability and performance, very little of it addresses fault recovery and its impact on service performance. In this paper, we present a detailed performance evaluation of cloud services that takes fault recovery into account. We consider recovery on both processing nodes and communication links. The commonly adopted assumption of Poisson arrivals of users’ service requests is relaxed, so that the interarrival times of service requests can follow an arbitrary probability distribution. The precedence constraints among subtasks are also considered. The probability distribution of the service response time is derived, and a numerical example is presented. The proposed cloud performance evaluation models and methods can yield realistic results and are thus of practical value for related decision-making in cloud computing.
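To make the idea of a response time distribution under fault recovery concrete, the following is a minimal simulation sketch. It is not the model derived in the paper. It assumes a single processing node, exponentially distributed times to failure, a fixed recovery time, and no loss of completed work after recovery. Communication links, precedence constraints, and request queueing are left out. All function names and parameter values are hypothetical and chosen for illustration only.

```python
import random


def simulate_subtask_time(work, failure_rate, recovery_time, rng):
    """Elapsed time to complete `work` time units of processing on one node.

    Illustrative assumptions (not taken from the paper): failures occur as a
    Poisson process with rate `failure_rate` while the node is executing, each
    failure costs a fixed `recovery_time`, and execution then resumes with no
    completed work lost.
    """
    elapsed = 0.0
    remaining = work
    while True:
        if failure_rate <= 0:
            # No failures possible: the subtask finishes after the remaining work.
            return elapsed + remaining
        time_to_failure = rng.expovariate(failure_rate)
        if time_to_failure >= remaining:
            # The subtask completes before the next failure occurs.
            return elapsed + remaining
        # A failure interrupts execution: pay the recovery time, then resume.
        elapsed += time_to_failure + recovery_time
        remaining -= time_to_failure


def empirical_cdf(samples, t):
    """Fraction of simulated completion times that are at most t."""
    return sum(1 for s in samples if s <= t) / len(samples)


# Hypothetical parameters: 10 time units of work, 0.05 failures per time unit,
# and 2 time units per recovery.
rng = random.Random(0)
samples = [simulate_subtask_time(10.0, 0.05, 2.0, rng) for _ in range(20000)]
mean_time = sum(samples) / len(samples)
prob_no_delay = empirical_cdf(samples, 10.0)
```

Under these assumptions the number of failures during execution is Poisson with mean `failure_rate * work`, so the expected completion time is `work * (1 + failure_rate * recovery_time)` and the probability of finishing without any recovery is `exp(-failure_rate * work)`. The simulated `mean_time` and `prob_no_delay` can be checked against those values.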
Cite this article
Yang, B., Tan, F. & Dai, YS. Performance evaluation of cloud service considering fault recovery. J Supercomput 65, 426–444 (2013). https://doi.org/10.1007/s11227-011-0551-2
DOI: https://doi.org/10.1007/s11227-011-0551-2