Autonomic decentralized elasticity based on a reinforcement learning controller for cloud applications

https://doi.org/10.1016/j.future.2018.11.049

Highlights

  • We propose and implement a reinforcement learning-based controller that is able to respond to volatile and complex arrival patterns through a set of simple states and actions.

  • The controller is implemented within a distributed architecture.

  • The controller is able to not only scale up quickly to meet rising demand but also scale down by shutting down excess servers to save on ongoing costs.

Abstract

Web applications have stringent performance requirements that are sometimes violated during periods of high demand due to lack of resources. Infrastructure as a Service (IaaS) providers have made it easy to provision and terminate compute resources on demand. However, there is a need for a control mechanism that is able to provision resources and create multiple instances of a web application in response to excess load events. In this paper, we propose and implement a reinforcement learning-based controller that is able to respond to volatile and complex arrival patterns through a set of simple states and actions. The controller is implemented within a distributed architecture that is able to not only scale up quickly to meet rising demand but also scale down by shutting down excess servers to save on ongoing costs. We evaluate this decentralized control mechanism using workloads from real-world use cases and demonstrate that it reduces SLA violations while minimizing cost of provisioning infrastructure.

Introduction

Cloud computing enables allocation of computing resources on demand, and therefore provides us with the opportunity to avoid over-provisioning and under-provisioning. Over-provisioning refers to the situation where we provision for the estimated maximum required resources; this can leave resources idle in low-demand periods and increase cost. On the other hand, under-provisioning refers to a scarcity of computing resources during unexpected load spikes, which can result in poor performance. Consequently, we need a resource controller to acquire and release computing resources based on demand.

Web application workloads can be divided into two main classes: (i) transactional (e.g. e-commerce websites); and (ii) batch (e.g. text mining, video transcoding, and graphical rendering). Transactional workloads are exemplified by web applications that serve online users over HTTP.

Web applications, in general, conceptually follow an N-tier [1] (or N-layer) model where the users interact with the presentation layer, and the data is stored in the persistence layer, composed of a database instance, and the data access libraries. In between these two layers, lies one or more stateless logic layers that are responsible for the processing of requests. In practice, the presentation and logic layers are hosted in web application servers, while the databases are separately managed on another set of servers. In the cloud computing model, these servers are virtual machines [2] with different hardware and software configurations (e.g. number of CPU cores, amount of memory, speed of storage and operating system).

User experience for such a system is impacted heavily by its response time: the time required for the server to process a user request and return the response. Schurman and Brutlag [3] have demonstrated that increasing the response time from 200 ms to 500 ms reduces revenue by 1%. Enterprise users establish service level agreements (SLAs) with web application owners that specify a target in terms of worst-case latency for a specific population of requests. For example, a target response time (e.g. 200 ms) may be specified on the 95th percentile of a set of requests ordered by increasing response time. Missing this target requires the owner to pay a penalty to the user. The other important factor in overall profit is the cost of infrastructure. In the cloud model, this cost is the rental fee for using the resources, levied per time period (hourly, per-minute, etc.) by the provider.
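The percentile-based SLA check described above can be sketched in a few lines. This is an illustrative example, not the paper's actual monitoring code; the 200 ms target follows the example in the text, while the nearest-rank percentile method and the sample window are assumptions.

```java
import java.util.Arrays;

// Illustrative sketch: checking a 95th-percentile SLA target over a window
// of measured response times. The window contents are made-up sample data.
public class SlaCheck {
    // Returns the p-th percentile (nearest-rank method) of the response times, in ms.
    static double percentile(double[] responseTimesMs, double p) {
        double[] sorted = responseTimesMs.clone();
        Arrays.sort(sorted);
        // Nearest-rank: the value at 1-based index ceil(p/100 * n).
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        double[] window = {120, 150, 180, 190, 210, 160, 140, 130, 170, 500};
        double p95 = percentile(window, 95.0);
        boolean violated = p95 > 200.0; // SLA target from the example above
        System.out.println("p95=" + p95 + " ms, violated=" + violated);
    }
}
```

A single slow outlier dominates the tail, which is why percentile targets are stricter than average-based ones.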

Cloud providers offer tools to automatically instantiate or remove resources for a particular user. However, the incoming workload is dynamic and load spikes are unpredictable. A reactive provisioning strategy may provision resources only after a load spike has occurred, which leads to poor performance and increased resource costs. This motivates the development of a resource provisioning controller that determines: (i) how well an application is performing on a server; (ii) whether the present amount of resources is sufficient for the incoming workload; and (iii) whether the resources are utilized efficiently.

We aim to design a controller that delivers high performance for a large number of applications distributed among several servers. A centralized controller would need to monitor multiple applications hosted on various servers, which makes it complex [4], since the state of the applications and servers may change quickly. Such a heavyweight controller may not be able to react in time to adverse events because of the large amount of information it must process to select a desirable action. The other drawback of a centralized controller is that it constitutes a single point of failure for the whole system.

This paper describes a decentralized architecture for provisioning and managing resources to meet SLA targets on web applications while keeping the cost of infrastructure to a minimum. In the following section, we first discuss the challenges and requirements for designing a resource controller in the cloud environment. Then, in Section 3, we describe the design and architecture of our proposed solution, called the Autonomic Decentralized Elasticity Controller (ADEC). Section 4 discusses the actual implementation of our system. Section 5 describes the experiments and their results. Finally, we compare our system to existing solutions in the literature and conclude the paper.

Section snippets

Problem description

Fig. 1 illustrates the scenario that we are tackling in this paper. IaaS users are able to instantiate virtual machines using images that contain predefined web application hosting environments. A web application server is able to simultaneously host multiple applications with different requirements but is constrained by the resources of the underlying virtual machine. Therefore, a user may start multiple instances to host different applications.

An increase in the arrival rate for one of the

Architecture

We employ a decentralized architecture in which each server is responsible for maintaining the performance of its own hosted applications. However, the servers take actions that fulfill the requirements of the whole system, such as reducing its overall cost. There have been many research efforts on dynamic resource allocation in a shared pool of resources [8], [9]. However, allocating computing resources in the cloud environment is a different scenario. The general idea in
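The kind of reinforcement learning controller the paper describes — simple states and actions, learned per server — can be sketched as a small tabular Q-learning agent. The state discretization (load buckets), the three-action set, and the reward shape here are assumptions for illustration; the actual ADEC formulation may differ.

```java
import java.util.Random;

// Hypothetical sketch of a local elasticity controller using tabular Q-learning.
// States and actions are deliberately coarse, mirroring the paper's emphasis on
// a small, simple state/action space.
public class QController {
    static final int STATES = 3;  // e.g. low / normal / high load (assumed buckets)
    static final int ACTIONS = 3; // 0: remove a server, 1: no-op, 2: add a server
    final double[][] q = new double[STATES][ACTIONS];
    final double alpha = 0.1, gamma = 0.9, epsilon = 0.1; // illustrative parameters
    final Random rng = new Random(42);

    // Epsilon-greedy action selection: occasionally explore a random action.
    int selectAction(int state) {
        if (rng.nextDouble() < epsilon) return rng.nextInt(ACTIONS);
        return greedy(state);
    }

    int greedy(int state) {
        int best = 0;
        for (int a = 1; a < ACTIONS; a++) if (q[state][a] > q[state][best]) best = a;
        return best;
    }

    // Standard Q-learning update after observing (state, action, reward, nextState).
    void update(int state, int action, double reward, int nextState) {
        double maxNext = q[nextState][greedy(nextState)];
        q[state][action] += alpha * (reward + gamma * maxNext - q[state][action]);
    }

    public static void main(String[] args) {
        QController c = new QController();
        // One simulated step: high load (state 2), added a server (action 2),
        // received a positive reward, and moved to normal load (state 1).
        c.update(2, 2, 1.0, 1);
        System.out.println("Q(high, add) = " + c.q[2][2]);
    }
}
```

Because each server runs its own small table, decisions stay cheap, which is what makes the decentralized design responsive where a single heavyweight controller would lag.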

Implementation

ADEC has been implemented in Java and targets cloud environments that allow provisioning and releasing computing resources through an Application Programming Interface (API). ADEC needs at least two machines: one is responsible for load balancing and routing requests to the back-end servers, while the others are employed as back-end servers. Hence, ADEC requires the following preparation phases: (i) an instance in the cloud that contains the load balancer; and (ii) another to employ the

Evaluation

In this section, we evaluate our results and compare ADEC with a system from the literature (Unity) and a commercial solution (the Amazon autoscaling service). We first describe the experimental setup and the results of the experiments, and then discuss our interpretation of the results.

Related work

Automated resource provisioning, which refers to systems that allocate or release resources on demand automatically, has been extensively studied in the literature [30]. A number of solutions have been introduced by academia and cloud providers. Prominent among these are model-based solutions, in which an application performance model is constructed to predict the number of application instances required to meet the QoS requirements of applications under the incoming workload.

Conclusion

In this paper, we focused on the problem of dynamic resource provisioning for web applications hosted on the cloud. The first contribution of this work is the utility function that enables the system to specify a reasonable trade-off between cost and performance in provisioning resources. We have employed a decentralized architecture, which makes the system robust against the failure of a centralized controller. Allocating resources through local controllers instead of a centralized controller allows
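A utility function of the kind described above rewards performance and penalizes infrastructure cost. The linear form, the weights, and the inputs below are illustrative assumptions, not the paper's exact function.

```java
// Hypothetical utility function trading off performance against rental cost.
public class Utility {
    // perfScore in [0,1]: e.g. fraction of requests meeting the SLA target.
    // costPerHour: current rental cost of the provisioned servers.
    // wPerf, wCost: weights expressing the desired trade-off.
    static double utility(double perfScore, double costPerHour,
                          double wPerf, double wCost) {
        return wPerf * perfScore - wCost * costPerHour;
    }

    public static void main(String[] args) {
        // Compare two configurations: cheap but slower vs. expensive but faster.
        double small = utility(0.90, 1.0, 10.0, 1.0);
        double large = utility(0.99, 3.0, 10.0, 1.0);
        System.out.println(small > large
            ? "controller prefers the smaller configuration"
            : "controller prefers the larger configuration");
    }
}
```

With these example weights, the marginal SLA improvement of the larger configuration does not repay its extra rental cost, so a utility-driven controller would scale down — exactly the cost/performance trade-off the conclusion highlights.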

Acknowledgments

This research is partially supported by the National Natural Science Foundation of China under projects 61672136 and 61828202.


References (44)

  • R.S. Sutton, Integrated architectures for learning, planning, and reacting based on approximating dynamic programming.

  • S. Das et al., ElasTraS: an elastic, scalable, and self-managing transactional database for the cloud, ACM Trans. Database Syst. (2013).

  • D. Menasce, Web server software architectures, IEEE Internet Comput. (2003).

  • J.E. Smith et al., The architecture of virtual machines, Computer (2005).

  • E. Schurman et al., The user and business impact of server delays, additional bytes, and HTTP chunking in web search, Velocity Web Performance and Operations Conference (2009).

  • H. Li et al., Using reinforcement learning for controlling an elastic web application hosting platform, Proceedings of the 8th ACM International Conference on Autonomic Computing (2011).

  • B. Urgaonkar et al., Dynamic provisioning of multi-tier internet applications, Proceedings of the Second International Conference on Autonomic Computing (ICAC 2005) (2005).

  • A. Karve et al., Dynamic placement for clustered web applications, Proceedings of the 15th International Conference on World Wide Web (2006).

  • R. Morris et al., Variance of aggregated Web traffic, Proceedings of INFOCOM 2000, Vol. 1 (2000).

  • W. Walsh et al., Utility functions in autonomic systems, Proceedings of the International Conference on Autonomic Computing (2004).

  • G. Tesauro, Online resource allocation using decompositional reinforcement learning, AAAI, Vol. 5 (2005).

  • D.A. Menasce et al., Performance by Design: Computer Capacity Planning by Example (2004).

  • J.L. Hellerstein et al., Feedback Control of Computing Systems (2004).

  • G. Tesauro, Reinforcement learning in autonomic computing: a manifesto and case studies, IEEE Internet Comput. (2007).

  • M.L. Littman et al., Reinforcement learning for autonomic network repair (2004).

  • J. Dowling et al., Building autonomic systems using collaborative reinforcement learning, Knowl. Eng. Rev. (2006).

  • G. Tesauro et al., A hybrid reinforcement learning approach to autonomic resource allocation, Proceedings of the IEEE International Conference on Autonomic Computing (ICAC '06) (2006).

  • R. Johnson, J2EE development frameworks, Computer (2005).

  • R. Levy et al., Performance management for cluster based web services.

  • G. Tesauro et al., Utility-function-driven resource allocation in autonomic systems, Proceedings of the Second International Conference on Autonomic Computing (ICAC 2005) (2005).

  • H. Ghanbari et al., Optimal autoscaling in an IaaS cloud, Proceedings of the 9th International Conference on Autonomic Computing (2012).

  • Q. Wang et al., When average is not average: large response time fluctuations in n-tier systems, Proceedings of the 9th International Conference on Autonomic Computing (2012).

Seyed Mohammad Reza Nouri was a graduate student at the University of New South Wales when he completed most of this research. His research interests include scheduling in elastic cloud computing and software engineering.

Han Li was a Ph.D. candidate at the University of New South Wales when he completed part of this research. His research interests include scheduling in elastic cloud computing and software engineering.

Srikumar Venugopal is now a research scientist at IBM Ireland Research. His research interests include cloud computing, distributed computing, and software engineering. He has more than 20 journal/conference publications in related areas.

Wenxia Guo is a Ph.D. candidate in the School of Information and Software Engineering at the University of Electronic Science and Technology of China (UESTC). Her research interests include approximation algorithms for NP-hard problems and resource scheduling algorithms for Cloud computing and Big Data processing.

Mingyun He is an assistant professor in the School of Information and Software Engineering at the University of Electronic Science and Technology of China (UESTC). His research interests include algorithms and systems for Cloud computing and Artificial Intelligence.

Wenhong Tian is now a professor at the University of Electronic Science and Technology of China (UESTC). His research interests include scheduling in Cloud computing and Big Data platforms, and image recognition by deep learning. He has more than 40 journal/conference publications and 5 books in related areas. He is an IEEE Fellow.


S. M. R. Nouri, H. Li, and S. Venugopal conducted this research while at the University of New South Wales, Australia.
