Abstract
In modern corporate environments, a disaster recovery solution is no longer a luxury but a business necessity. Adopting one, however, can be costly and is often affordable only for large enterprises. Disaster-Recovery-as-a-Service (DRaaS) is a cloud-based solution that small and medium-sized businesses have been adopting to maintain availability even in catastrophic situations. It offers lower acquisition and operational costs than traditional solutions. This work presents availability models for evaluating a DRaaS solution, taking into account crucial disaster recovery metrics such as downtime, cost, recovery time objective, and transaction loss. Furthermore, we perform a sensitivity analysis on the DRaaS model to determine the parameters that have the greatest impact on system availability. Based on these analyses, disaster recovery coordinators can determine the optimum point at which to recover the damaged system by balancing the cost of system downtime against the cost of the resources required to restore it. Our numerical results show the effectiveness of a DRaaS solution in terms of downtime, cost, and transaction loss.
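The trade-off between downtime cost and recovery-resource cost can be illustrated with a minimal sketch. It is not the model developed in this work: it assumes the textbook steady-state availability formula A = MTTF / (MTTF + MTTR), and every function name, option label, and numeric value below is hypothetical and chosen only for illustration.

```python
HOURS_PER_YEAR = 8760


def steady_state_availability(mttf_h, mttr_h):
    """Textbook steady-state availability: MTTF / (MTTF + MTTR)."""
    return mttf_h / (mttf_h + mttr_h)


def annual_downtime_hours(availability):
    """Expected hours of unavailability per year."""
    return (1.0 - availability) * HOURS_PER_YEAR


def total_annual_cost(mttf_h, mttr_h, downtime_cost_per_h, resource_cost_per_year):
    """Cost of expected downtime plus the yearly cost of recovery resources."""
    availability = steady_state_availability(mttf_h, mttr_h)
    downtime_cost = annual_downtime_hours(availability) * downtime_cost_per_h
    return downtime_cost + resource_cost_per_year


def cheapest_option(options, mttf_h, downtime_cost_per_h):
    """Pick the (label, mttr_h, resource_cost_per_year) tuple with the lowest total cost."""
    return min(
        options,
        key=lambda o: total_annual_cost(mttf_h, o[1], downtime_cost_per_h, o[2]),
    )


# Hypothetical recovery options: (label, MTTR in hours, yearly resource cost).
OPTIONS = [
    ("cold", 24.0, 1000.0),
    ("warm", 4.0, 5000.0),
    ("hot", 0.5, 20000.0),
]
MTTF_H = 2000.0              # assumed mean time to failure, in hours
DOWNTIME_COST_PER_H = 500.0  # assumed cost of one hour of downtime

best = cheapest_option(OPTIONS, MTTF_H, DOWNTIME_COST_PER_H)
```

With these made-up inputs the intermediate ("warm") option yields the lowest total cost: the cheapest resources leave too much downtime, while the fastest recovery costs more than the downtime it avoids.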
Acknowledgements
The authors would like to thank FACEPE and CNPq for the financial support provided to this research.
Cite this article
Andrade, E., Nogueira, B., Matos, R. et al. Availability modeling and analysis of a disaster-recovery-as-a-service solution. Computing 99, 929–954 (2017). https://doi.org/10.1007/s00607-017-0539-8
DOI: https://doi.org/10.1007/s00607-017-0539-8