Availability modeling and analysis of a disaster-recovery-as-a-service solution

Computing

Abstract

In modern corporate environments, a disaster recovery solution is no longer a luxury but a business necessity. Adopting one, however, can be costly, and it is often affordable only for large enterprises. Disaster-Recovery-as-a-Service (DRaaS) is a cloud-based solution that small and medium-sized businesses have been adopting to maintain availability even in catastrophic situations. It offers lower acquisition and operational costs than traditional solutions. This work presents availability models for evaluating a DRaaS solution that take into account crucial disaster recovery metrics, such as downtime, costs, recovery time objective, and transaction loss. Furthermore, we performed a sensitivity analysis on the DRaaS model to determine the parameters that have the greatest impact on system availability. Based on these analyses, disaster recovery coordinators can determine the optimal point at which to recover the damaged system by balancing the cost of system downtime against the cost of the resources required to restore it. Our numerical results show the effectiveness of a DRaaS solution in terms of downtime, cost, and transaction loss.
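The availability models themselves are not included in this preview. The Python example below is a minimal sketch of how the metrics named above relate to each other. It assumes a single component that alternates between up and down states with exponentially distributed failure and recovery times. Under that assumption, steady-state availability is A = MTTF/(MTTF + MTTR) and expected annual downtime is (1 − A) × 8760 h. The sketch then estimates a scaled sensitivity index, S = (∂Y/∂θ)(θ/Y), by central finite differences. This is a simplification, not the authors' DRaaS model. The function names, the flat cost-per-hour figure, and all parameter values are hypothetical, including the 72 h and 4 h recovery times that stand in for a traditional restore and a cloud failover.

```python
"""Illustrative two-state availability sketch (NOT the paper's DRaaS model).

Assumes one component alternating between "up" and "down" with exponentially
distributed times, so steady-state availability is A = MTTF / (MTTF + MTTR).
All parameter values and names below are hypothetical.
"""
from typing import Callable, Dict

HOURS_PER_YEAR = 8760.0


def availability(mttf_h: float, mttr_h: float) -> float:
    """Steady-state availability of a two-state (up/down) model."""
    return mttf_h / (mttf_h + mttr_h)


def annual_downtime_h(mttf_h: float, mttr_h: float) -> float:
    """Expected hours of downtime per year: (1 - A) * 8760."""
    return (1.0 - availability(mttf_h, mttr_h)) * HOURS_PER_YEAR


def annual_downtime_cost(mttf_h: float, mttr_h: float, cost_per_hour: float) -> float:
    """Downtime cost, assuming a flat (hypothetical) cost per hour of outage."""
    return annual_downtime_h(mttf_h, mttr_h) * cost_per_hour


def scaled_sensitivity(
    measure: Callable[..., float],
    params: Dict[str, float],
    name: str,
    rel_step: float = 1e-6,
) -> float:
    """Scaled sensitivity S = (dY/dθ)(θ/Y), via a central finite difference."""
    theta = params[name]
    h = theta * rel_step
    up = {**params, name: theta + h}
    down = {**params, name: theta - h}
    derivative = (measure(**up) - measure(**down)) / (2.0 * h)
    return derivative * theta / measure(**params)


if __name__ == "__main__":
    # Hypothetical values: a failure roughly every 2000 h, recovery in
    # 72 h (stand-in for a manual restore) vs 4 h (stand-in for cloud failover).
    for label, mttr in [("manual restore", 72.0), ("cloud failover", 4.0)]:
        p = {"mttf_h": 2000.0, "mttr_h": mttr}
        a = availability(**p)
        d = annual_downtime_h(**p)
        cost = annual_downtime_cost(**p, cost_per_hour=1000.0)
        print(f"{label}: A={a:.5f}, downtime={d:.1f} h/yr, cost=${cost:,.0f}/yr")
        # Rank parameters by the magnitude of their effect on downtime.
        for name in p:
            s = scaled_sensitivity(annual_downtime_h, p, name)
            print(f"  S[downtime, {name}] = {s:+.3f}")
```

For this two-state model, the scaled sensitivities of downtime to MTTR and MTTF work out to +MTTF/(MTTF + MTTR) and −MTTF/(MTTF + MTTR), so the finite-difference estimate can be checked against a closed form. A model with several interacting components, such as a primary site plus a cloud-hosted replica, generally has no such closed form, which is where the numerical approach becomes useful.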

Figs. 1–9 (figure images are not included in this preview)

Acknowledgements

The authors would like to thank FACEPE and CNPq for the financial support provided to this research.

Author information

Correspondence to Ermeson Andrade.

About this article

Cite this article

Andrade, E., Nogueira, B., Matos, R. et al. Availability modeling and analysis of a disaster-recovery-as-a-service solution. Computing 99, 929–954 (2017). https://doi.org/10.1007/s00607-017-0539-8

  • DOI: https://doi.org/10.1007/s00607-017-0539-8
