Abstract:
Modern cloud-native applications, built with containerization and microservices architectures, present significant hurdles for proactive failure prediction and classifica...Show MoreMetadata
Abstract:
Modern cloud-native applications, built with containerization and microservices architectures, present significant hurdles for proactive failure prediction and classification. Traditional anomaly detection and threshold-based algorithms often fall short: they either lack proactive capabilities or fail to provide granular enough output for effective remediation and root cause analysis. This paper proposes a novel, multi-stage framework that tackles these limitations. The framework leverages a combination of time-series forecasting models, such as NHITS, to predict resource consumption patterns. Subsequently, machine learning models like KNN and Random Forest are utilized to classify failures across various dimensions, including service names, HTTP methods, and statuses. This multistage approach empowers fine-grained identification of potential failures and their root causes. Our evaluation demonstrates the framework's effectiveness in a simulated environment. The NHITS model outperformed even cutting-edge architectures in predicting resource utilization. Additionally, the chosen machine learning models exhibited promising accuracy in classifying failure types.
Date of Conference: 27-29 November 2024
Date Added to IEEE Xplore: 31 December 2024
ISBN Information: