Fault Characterization and Mitigation Strategies in Desktop Cloud Systems

  • Conference paper
High Performance Computing (CARLA 2018)

Abstract

Desktop cloud platforms, such as UnaCloud and CernVM, run clusters of virtual machines on the idle resources of desktop computers. These platforms execute virtual machines alongside the applications that users start on those desktops. Although this improves the utilization of computing resources, desktop user actions, such as turning off the computer or running certain applications, may conflict with the virtual machines. Desktop clouds commonly run applications built on technologies such as TensorFlow or Hadoop, which rely on master-worker architectures and are sensitive to failures in specific nodes. To support these new types of applications, it is important to understand which failures may interrupt the execution of these clusters, which faults may cause errors, and which strategies can be used to mitigate or tolerate them. Using the UnaCloud platform as a case study, this paper presents an analysis of (1) the failures that may occur in desktop clouds and (2) the mitigation strategies available to improve dependability.

This work has been partially carried out with resources provided by the CYTED cofunded Thematic Network RICAP (517RT0529).

Notes

  1. https://sistemasproyectos.uniandes.edu.co/iniciativas/unacloud/

  2. https://cernvm.cern.ch/portal/publications

  3. https://www.tensorflow.org/

  4. https://hadoop.apache.org/

  5. http://www.uniandes.edu.co

Author information

Corresponding author

Correspondence to Carlos E. Gómez.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Gómez, C.E., Chavarriaga, J., Castro, H.E. (2019). Fault Characterization and Mitigation Strategies in Desktop Cloud Systems. In: Meneses, E., Castro, H., Barrios Hernández, C., Ramos-Pollan, R. (eds) High Performance Computing. CARLA 2018. Communications in Computer and Information Science, vol 979. Springer, Cham. https://doi.org/10.1007/978-3-030-16205-4_24

  • DOI: https://doi.org/10.1007/978-3-030-16205-4_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-16204-7

  • Online ISBN: 978-3-030-16205-4

  • eBook Packages: Computer Science (R0)
