Task Failure Prediction in Cloud Data Centers Using Deep Learning


Abstract:

A large-scale cloud data center needs to provide high service reliability and availability with a low probability of failure. However, current large-scale cloud data centers still experience high failure rates for many reasons, such as hardware and software faults, which often result in task and job failures. Such failures can severely reduce the reliability of cloud services and consume a large amount of resources to recover the service. Therefore, it is important to predict task or job failures accurately before they occur, to avoid unexpected waste. Many machine learning and deep learning based methods have been proposed for task or job failure prediction; they analyze past system message logs and identify the relationship between the data and the failures. To further improve on the prediction accuracy of these methods, in this article we propose a failure prediction algorithm based on multi-layer Bidirectional Long Short-Term Memory (Bi-LSTM) to identify task and job failures in the cloud. The goal of the Bi-LSTM failure prediction algorithm is to predict whether tasks and jobs will fail or complete. Trace-driven experiments show that our algorithm outperforms other state-of-the-art prediction methods, achieving 93 percent accuracy for task failures and 87 percent for job failures.
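
To make the idea of a multi-layer Bi-LSTM failure classifier concrete, the following is a minimal sketch of the forward pass only, written in plain Python. It is not the authors' implementation: the layer sizes, the random untrained weights, the use of the last time step as the sequence summary, and all function names (init_lstm, lstm_pass, build_model, predict_failure_probability) are assumptions made for illustration. The abstract does not specify the input features, so each time step is simply taken to be a list of numeric values.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_lstm(input_size, hidden_size, rng):
    # Weights for the four gates (input, forget, candidate, output), stacked row-wise.
    # Values are random and untrained: this sketch only shows the data flow.
    rows, cols = 4 * hidden_size, input_size + hidden_size
    return {
        "W": [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)],
        "b": [0.0] * rows,
        "hidden_size": hidden_size,
    }


def lstm_pass(params, sequence, reverse=False):
    """Run one LSTM direction and return the hidden state at every time step."""
    n = params["hidden_size"]
    h, c = [0.0] * n, [0.0] * n
    steps = list(reversed(sequence)) if reverse else list(sequence)
    outputs = []
    for x in steps:
        z = list(x) + h  # concatenate the input with the previous hidden state
        pre = [sum(w * v for w, v in zip(row, z)) + b
               for row, b in zip(params["W"], params["b"])]
        i = [sigmoid(p) for p in pre[0:n]]             # input gate
        f = [sigmoid(p) for p in pre[n:2 * n]]         # forget gate
        g = [math.tanh(p) for p in pre[2 * n:3 * n]]   # candidate cell state
        o = [sigmoid(p) for p in pre[3 * n:4 * n]]     # output gate
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        outputs.append(h)
    # Put the backward outputs back into the original time order.
    return outputs[::-1] if reverse else outputs


def bilstm_layer(fwd, bwd, sequence):
    """Concatenate forward and backward hidden states at each time step."""
    f_out = lstm_pass(fwd, sequence)
    b_out = lstm_pass(bwd, sequence, reverse=True)
    return [hf + hb for hf, hb in zip(f_out, b_out)]


def build_model(input_size, hidden_size, num_layers, seed=0):
    rng = random.Random(seed)
    layers, size = [], input_size
    for _ in range(num_layers):
        layers.append((init_lstm(size, hidden_size, rng),
                       init_lstm(size, hidden_size, rng)))
        size = 2 * hidden_size  # the next layer sees both directions
    head = [rng.uniform(-0.1, 0.1) for _ in range(2 * hidden_size)]
    return {"layers": layers, "head": head, "bias": 0.0}


def predict_failure_probability(model, sequence):
    """Return a value in (0, 1); with untrained weights it carries no meaning."""
    seq = sequence
    for fwd, bwd in model["layers"]:
        seq = bilstm_layer(fwd, bwd, seq)
    summary = seq[-1]  # assumption: summarize the sequence by its last time step
    logit = sum(w * v for w, v in zip(model["head"], summary)) + model["bias"]
    return sigmoid(logit)
```

Calling build_model(3, 4, 2) and then predict_failure_probability(model, sequence) on a list of three-value time steps returns a score that could be thresholded (for example at 0.5) into "failed" or "completed". In practice the weights would be learned from labeled trace data, typically with a deep learning framework rather than hand-written loops.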
Published in: IEEE Transactions on Services Computing (Volume: 15, Issue: 3, 01 May-June 2022)
Page(s): 1411 - 1422
Date of Publication: 11 May 2020
