
Evaluation of a Failure Prediction Model for Large Scale Cloud Applications

  • Conference paper
  • Advances in Artificial Intelligence (Canadian AI 2020)

Abstract

Modern cloud-based applications, including smart homes and smart cities, require high levels of reliability and availability. Because of their large scale and heterogeneous nature, all cloud services, both hardware and software, experience failures. The main objective of this paper is to develop a failure prediction model that can detect failed jobs early. The proposed model aims to improve resource utilization and increase the efficiency of cloud applications. It is evaluated on three publicly available traces: the Google cluster, Mustang, and Trinity traces. Four machine learning algorithms were applied to the traces to select the most accurate model, and prediction accuracy was further improved using several feature selection techniques. The evaluation results show that the proposed model achieves high precision, recall, and F1-score.
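The abstract reports precision, recall, and F1-score without defining them. The following is a minimal Python sketch, not the authors' implementation, of how these metrics could be computed for a binary job-failure classifier, assuming failed jobs are labeled 1 (the positive class) and finished jobs 0. With TP, FP, and FN as the counts of true positives, false positives, and false negatives, precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean, 2PR / (P + R). The helper `pick_best_model`, the model names, and the label vectors are hypothetical, and choosing the model by F1-score is an assumption; the paper may use a different selection criterion.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class Scores:
    precision: float
    recall: float
    f1: float


def score_failure_predictions(y_true: Sequence[int], y_pred: Sequence[int]) -> Scores:
    """Precision, recall and F1 with 'failed job' (label 1) as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Return 0.0 instead of dividing by zero when a model never predicts
    # a failure, or when the data contain no failed jobs at all.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Scores(precision, recall, f1)


def pick_best_model(y_true: Sequence[int], predictions: Dict[str, Sequence[int]]) -> str:
    """Hypothetical helper: name of the candidate with the highest F1-score (assumed criterion)."""
    return max(predictions, key=lambda name: score_failure_predictions(y_true, predictions[name]).f1)


if __name__ == "__main__":
    # Hypothetical labels: 1 = job failed, 0 = job finished.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    candidates = {
        "model_a": [1, 0, 1, 0, 0, 1, 1, 0],
        "model_b": [1, 0, 1, 1, 0, 0, 0, 0],
    }
    for name, y_pred in candidates.items():
        print(name, score_failure_predictions(y_true, y_pred))
    print("best:", pick_best_model(y_true, candidates))
```

Treating failed jobs as the positive class matters because failures are usually the minority class in cluster traces, so plain accuracy can look high even for a model that never predicts a failure. The sketch uses only the standard library; scikit-learn's `precision_recall_fscore_support` computes the same quantities for real experiments.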

Acknowledgement

The first author would like to thank Umm Al-Qura University, Saudi Arabia for funding this work as part of his graduate scholarship.

Author information

Correspondence to Mohammad S. Jassas.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Jassas, M.S., Mahmoud, Q.H. (2020). Evaluation of a Failure Prediction Model for Large Scale Cloud Applications. In: Goutte, C., Zhu, X. (eds) Advances in Artificial Intelligence. Canadian AI 2020. Lecture Notes in Computer Science, vol 12109. Springer, Cham. https://doi.org/10.1007/978-3-030-47358-7_32

  • DOI: https://doi.org/10.1007/978-3-030-47358-7_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-47357-0

  • Online ISBN: 978-3-030-47358-7

  • eBook Packages: Computer Science, Computer Science (R0)