Abstract
Scientific applications adopt cloud environments to execute their workflows as sets of tasks. When a task fails, the dependencies among workflow tasks degrade the overall performance of the execution, so an efficient failure prediction mechanism is needed to execute workflows efficiently. This paper proposes a failure prediction method implemented using various machine learning classifiers. Among the classifiers evaluated, Naïve Bayes predicts failures with the highest accuracy, 94.4%. To further improve prediction accuracy, a novel ensemble method called combine bagging ensemble (CBE) is introduced, which achieves an overall accuracy of 95.8%. The proposed method is validated by comparing results obtained in simulation with those from a real-time cloud testbed.
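The abstract does not specify how the combine bagging ensemble is built, so the following is only a minimal sketch of ordinary bagging (bootstrap resampling plus majority voting) over a Gaussian Naïve Bayes base classifier, not the authors' CBE method. The two features (CPU and memory utilisation), the synthetic data, and all function and class names are assumptions made for illustration.

```python
import math
import random
from collections import Counter


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes for numeric features (illustrative only)."""

    def fit(self, X, y):
        self.params = {}
        for label in sorted(set(y)):
            rows = [x for x, t in zip(X, y) if t == label]
            prior = len(rows) / len(y)
            columns = list(zip(*rows))
            means = [sum(c) / len(c) for c in columns]
            # A small floor keeps every variance strictly positive.
            variances = [
                sum((v - m) ** 2 for v in c) / len(c) + 1e-6
                for c, m in zip(columns, means)
            ]
            self.params[label] = (prior, means, variances)
        return self

    def predict(self, x):
        def log_posterior(label):
            prior, means, variances = self.params[label]
            score = math.log(prior)
            for v, m, var in zip(x, means, variances):
                score -= 0.5 * math.log(2 * math.pi * var) + (v - m) ** 2 / (2 * var)
            return score

        return max(self.params, key=log_posterior)


def bagging_fit(X, y, n_models=15, seed=0):
    """Train one base classifier per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(y)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(GaussianNaiveBayes().fit([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Combine the base classifiers by majority vote."""
    votes = Counter(model.predict(x) for model in models)
    return votes.most_common(1)[0][0]


def make_synthetic_tasks(n_per_class=100, seed=0):
    """Hypothetical task records: [cpu_util, mem_util]; label 0 = success, 1 = failure."""
    rng = random.Random(seed)
    X, y = [], []
    for label, (cpu, mem) in ((0, (0.3, 0.4)), (1, (0.9, 0.85))):
        for _ in range(n_per_class):
            X.append([rng.gauss(cpu, 0.1), rng.gauss(mem, 0.1)])
            y.append(label)
    return X, y


X_train, y_train = make_synthetic_tasks()
models = bagging_fit(X_train, y_train)
print(bagging_predict(models, [0.3, 0.4]))  # a lightly loaded task
print(bagging_predict(models, [0.9, 0.9]))  # a heavily loaded task
```

Majority voting is used here because it is the simplest way to combine bootstrap-trained classifiers; the accuracy figures reported in the abstract come from the authors' own method and data and are not reproduced by this sketch.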
Cite this article
Padmakumari, P., Umamakeswari, A. Task Failure Prediction using Combine Bagging Ensemble (CBE) Classification in Cloud Workflow. Wireless Pers Commun 107, 23–40 (2019). https://doi.org/10.1007/s11277-019-06238-9
DOI: https://doi.org/10.1007/s11277-019-06238-9