Performance evaluation of cloud service considering fault recovery

The Journal of Supercomputing

Abstract

Cloud computing is a recent trend in IT that has attracted considerable attention. In cloud computing, service reliability and service performance are two important issues. To improve cloud service reliability, fault tolerance techniques such as fault recovery may be used, which in turn affect cloud service performance. This impact deserves detailed study. Although some research exists on cloud/grid service reliability and performance, very little of it addresses fault recovery and its impact on service performance. In this paper, we conduct a detailed study of the performance evaluation of cloud services considering fault recovery. We consider recovery on both processing nodes and communication links. The commonly adopted assumption of Poisson arrivals of users’ service requests is relaxed, so that the interarrival times of service requests can follow an arbitrary probability distribution. The precedence constraints of subtasks are also considered. The probability distribution of service response time is derived, and a numerical example is presented. The proposed cloud performance evaluation models and methods could yield realistic results and are thus of practical value for related decision-making in cloud computing.

Author information

Corresponding author

Correspondence to Bo Yang.

About this article

Cite this article

Yang, B., Tan, F. & Dai, YS. Performance evaluation of cloud service considering fault recovery. J Supercomput 65, 426–444 (2013). https://doi.org/10.1007/s11227-011-0551-2

  • DOI: https://doi.org/10.1007/s11227-011-0551-2
