Abstract
Modern cloud-based applications, including smart homes and cities, require high levels of reliability and availability. All cloud services, both hardware and software, experience failures because of their large scale and heterogeneous nature. The main objective of this paper is to develop a failure prediction model that can detect failed jobs early. The proposed model is intended to improve resource utilization and increase the efficiency of cloud applications. It is evaluated on three publicly available traces: the Google cluster, Mustang, and Trinity. Four different machine learning algorithms are applied to the traces in order to select the most accurate model, and prediction accuracy is further improved using different feature selection techniques. The evaluation results show that the proposed model achieves high precision, recall, and F1-score.
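For readers unfamiliar with the three reported metrics, the sketch below shows how precision, recall, and F1-score are conventionally computed for a binary failed/not-failed job classifier. It uses the standard textbook definitions and is not taken from the paper. The function name `precision_recall_f1`, the label encoding (1 = failed job, 0 = other), and the toy labels are assumptions made purely for illustration.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for the positive class.

    Assumed encoding (not from the paper): 1 = failed job, 0 = any other outcome.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # failures correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed failures

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


# Toy labels, invented for illustration only (not data from any trace).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Reporting all three metrics rather than accuracy alone is common when failed jobs are a minority class, since a model that never predicts failure can still score high accuracy.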
Acknowledgement
The first author would like to thank Umm Al-Qura University, Saudi Arabia, for funding this work as part of his graduate scholarship.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Jassas, M.S., Mahmoud, Q.H. (2020). Evaluation of a Failure Prediction Model for Large Scale Cloud Applications. In: Goutte, C., Zhu, X. (eds) Advances in Artificial Intelligence. Canadian AI 2020. Lecture Notes in Computer Science, vol 12109. Springer, Cham. https://doi.org/10.1007/978-3-030-47358-7_32
DOI: https://doi.org/10.1007/978-3-030-47358-7_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-47357-0
Online ISBN: 978-3-030-47358-7
eBook Packages: Computer Science, Computer Science (R0)