Robustness improvement of component-based cloud computing systems

The Journal of Supercomputing

Abstract

As Cloud computing systems have grown in popularity, the demand for highly dependable Cloud applications has increased significantly. Reliability and availability are therefore two prominent concerns for both Cloud providers and Cloud users. Ensuring these two properties is often very difficult, mainly because the Cloud computing paradigm combines hardware and software components in a dynamic setting. Despite these challenges, ensuring the reliability and availability of such applications remains a key objective, since it is needed to guarantee the expected quality of service (QoS). Many methods, strategies and approaches have been proposed in the literature; however, as far as we have investigated, none offers a global solution that provides reliability, availability and a high margin of QoS at the same time for such systems. In this paper, we propose a novel formal framework for constructing reliable and available Cloud components using the distributed recovery block (DRB) scheme. The aim is to provide a strategy that enhances Cloud dependability through the uniform treatment of software and hardware faults by constructing fault-masking nodes. A fault-masking node handles (i.e., detects and tolerates) software, hardware, and response time faults using both the acceptance test and the try blocks, so that safety and liveness properties are ensured at the same time.
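To make the roles of the try blocks and the acceptance test concrete, the following is a minimal Python sketch of a fault-masking node. It is an illustration under stated assumptions, not the formal framework proposed in the paper: the names `run_try_block`, `fault_masking_node`, `accept_sqrt` and the square-root example are hypothetical, and the primary and shadow executions, which the DRB scheme places on separate nodes, are simulated sequentially in a single process.

```python
import math
import time
from typing import Callable, Optional, Tuple

TryBlock = Callable[[float], float]
AcceptanceTest = Callable[[float, float], bool]


def run_try_block(block: TryBlock, x: float, accept: AcceptanceTest,
                  deadline_s: float) -> Tuple[bool, Optional[float]]:
    """Run one try block and judge its result.

    A crash, a late answer (response time fault), or an answer rejected by
    the acceptance test are all treated uniformly as a detected fault.
    """
    start = time.perf_counter()
    try:
        result = block(x)
    except Exception:
        return False, None  # crash of the try block
    on_time = (time.perf_counter() - start) <= deadline_s
    if on_time and accept(x, result):
        return True, result
    return False, None


def fault_masking_node(x: float, primary: TryBlock, alternate: TryBlock,
                       accept: AcceptanceTest, deadline_s: float = 0.5) -> float:
    """Return the first accepted result, masking a fault in the primary.

    Assumption: the primary and shadow executions are simulated one after
    the other here; a real deployment would run them on distinct nodes.
    """
    for block in (primary, alternate):
        ok, result = run_try_block(block, x, accept, deadline_s)
        if ok:
            return result
    raise RuntimeError("both try blocks failed the acceptance test")


# Hypothetical example: computing a square root.
def accept_sqrt(x: float, r: float) -> bool:
    return r >= 0 and abs(r * r - x) < 1e-6


def buggy_sqrt(x: float) -> float:
    return -1.0  # simulated software fault: wrong value


def slow_sqrt(x: float) -> float:
    time.sleep(0.05)  # simulated response time fault
    return math.sqrt(x)


def good_sqrt(x: float) -> float:
    return math.sqrt(x)


if __name__ == "__main__":
    print(fault_masking_node(9.0, buggy_sqrt, good_sqrt, accept_sqrt))
```

In this sketch the acceptance test rejects wrong values, which relates to the safety property, while falling back to the alternate try block keeps a result available, which relates to the liveness property. The deadline is checked only after a try block returns, so a block that never terminates would need a separate watchdog that this sketch does not model.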

Author information

Corresponding author

Correspondence to Mounya Smara.

About this article

Cite this article

Smara, M., Aliouat, M., Harous, S. et al. Robustness improvement of component-based cloud computing systems. J Supercomput 78, 4977–5009 (2022). https://doi.org/10.1007/s11227-021-04054-2

  • DOI: https://doi.org/10.1007/s11227-021-04054-2
