Performance evaluation of cloud service considering fault recovery

Yang, Bo; Tan, Feng; Dai, Yuan-Shun

doi:10.1007/s11227-011-0551-2

Published: 23 February 2011

Volume 65, pages 426–444 (2013)

The Journal of Supercomputing

Bo Yang¹,
Feng Tan² &
Yuan-Shun Dai¹,³

Abstract

Cloud computing is a recent trend in IT that has attracted considerable attention. Service reliability and service performance are two important issues in cloud computing. To improve cloud service reliability, fault tolerance techniques such as fault recovery may be used, which in turn affect cloud service performance. This impact deserves detailed study. Although some research exists on cloud and grid service reliability and performance, very little of it addresses fault recovery and its impact on service performance. In this paper, we study in detail the performance evaluation of cloud services considering fault recovery. We consider recovery on both processing nodes and communication links. The commonly adopted assumption of Poisson arrivals of users’ service requests is relaxed, so that the interarrival times of service requests can follow an arbitrary probability distribution. The precedence constraints of subtasks are also considered. The probability distribution of service response time is derived, and a numerical example is presented. The proposed cloud performance evaluation models and methods can yield realistic results and are therefore of practical value for related decision-making in cloud computing.
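
To make the quantities named in the abstract concrete, the following is a minimal Monte Carlo sketch of how fault recovery and precedence constraints can shape the distribution of service response time. It is not the analytical model of the paper, whose details are not available here. Everything in it is an assumption made for illustration: failures arrive as a Poisson process while a subtask is being processed, each failure costs a fixed recovery delay, processing resumes where it stopped, and the function names (`time_with_recovery`, `response_time`, `response_time_cdf`) are hypothetical. Request queueing and interarrival times are left out.

```python
import random


def time_with_recovery(work, failure_rate, recovery_time, rng):
    """Wall-clock time to finish `work` time units of processing.

    Assumption: failures occur at rate `failure_rate` during processing,
    each one adds a fixed `recovery_time`, and work resumes afterwards.
    """
    elapsed, remaining = 0.0, work
    while True:
        if failure_rate > 0:
            to_failure = rng.expovariate(failure_rate)
        else:
            to_failure = float("inf")  # no failures at all
        if to_failure >= remaining:
            return elapsed + remaining
        elapsed += to_failure + recovery_time
        remaining -= to_failure


def response_time(subtasks, preds, rng):
    """Finish time of the last subtask under precedence constraints.

    `subtasks` maps name -> (work, failure_rate, recovery_time) and must be
    listed in topological order; `preds` maps name -> predecessor names.
    """
    finish = {}
    for name, (work, rate, rec) in subtasks.items():
        # A subtask starts only when all of its predecessors have finished.
        start = max((finish[p] for p in preds.get(name, ())), default=0.0)
        finish[name] = start + time_with_recovery(work, rate, rec, rng)
    return max(finish.values())


def response_time_cdf(subtasks, preds, t, runs=10000, seed=0):
    """Estimate P(response time <= t) by repeated simulation."""
    rng = random.Random(seed)
    hits = sum(response_time(subtasks, preds, rng) <= t for _ in range(runs))
    return hits / runs
```

A communication link could be treated in this sketch as one more subtask with its own failure rate and recovery delay, placed between the two processing subtasks it connects. Calling `response_time_cdf` over a range of `t` values gives an empirical curve of the kind an analytical response-time distribution would describe.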

Author information

Authors and Affiliations

School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
Bo Yang & Yuan-Shun Dai
Department of Industrial Engineering, University of Electronic Science and Technology of China, Chengdu, China
Feng Tan
Innovative Computing Laboratory, University of Tennessee, Knoxville, USA
Yuan-Shun Dai

Corresponding author

Correspondence to Bo Yang.

About this article

Cite this article

Yang, B., Tan, F. & Dai, YS. Performance evaluation of cloud service considering fault recovery. J Supercomput 65, 426–444 (2013). https://doi.org/10.1007/s11227-011-0551-2

Published: 23 February 2011
Issue Date: July 2013
DOI: https://doi.org/10.1007/s11227-011-0551-2
