DeepScaling: Autoscaling Microservices With Stable CPU Utilization for Large Scale Production Cloud Systems | IEEE Journals & Magazine | IEEE Xplore
Scheduled Maintenance: On Tuesday, 25 February, IEEE Xplore will undergo scheduled maintenance from 1:00-5:00 PM ET (1800-2200 UTC). During this time, there may be intermittent impact on performance. We apologize for any inconvenience.

DeepScaling: Autoscaling Microservices With Stable CPU Utilization for Large Scale Production Cloud Systems


Abstract:

Cloud service providers often provision excessive resources to meet the desired Service Level Objectives (SLOs), by setting lower CPU utilization targets. This can result...Show More

Abstract:

Cloud service providers often provision excessive resources to meet the desired Service Level Objectives (SLOs), by setting lower CPU utilization targets. This can result in a waste of resources and a noticeable increase in power consumption in large-scale cloud deployments. To address this issue, this paper presents DeepScaling, an innovative solution for minimizing resource cost while ensuring SLO requirements are met in a dynamic, large-scale production microservice-based system. We propose DeepScaling, which introduces three innovative components to adaptively refine the target CPU utilization of servers in the data center, and we maintain it at a stable value to meet SLO constraints while using minimum amount of system resources. First, DeepScaling forecasts workloads for each service using a Spatio-temporal Graph Neural Network. Secondly, it estimates CPU utilization with a Deep Neural Network, considering factors such as periodic tasks and traffic. Finally, it uses a modified Deep Q-Network (DQN) to generate an autoscaling policy that controls service resources to maximize service stability while meeting SLOs. Evaluation of DeepScaling in Ant Group’s large-scale cloud environment shows that it outperforms state-of-the-art autoscaling approaches in terms of maintaining stable performance and resource savings. The deployment of DeepScaling in the real-world environment of 1900+ microservices saves the provisioning of over 100,000 CPU cores per day, on average.
Published in: IEEE/ACM Transactions on Networking ( Volume: 32, Issue: 5, October 2024)
Page(s): 3961 - 3976
Date of Publication: 31 May 2024

ISSN Information:

Funding Agency:


References

References is not available for this document.