Autonomic decentralized elasticity based on a reinforcement learning controller for cloud applications

https://doi.org/10.1016/j.future.2018.11.049

Highlights

  • We propose and implement a reinforcement learning-based controller that is able to respond to volatile and complex arrival patterns through a set of simple states and actions.

  • The controller is implemented within a distributed architecture.

  • The controller is able to not only scale up quickly to meet rising demand but also scale down by shutting down excess servers to save on ongoing costs.

Abstract

Web applications have stringent performance requirements that are sometimes violated during periods of high demand due to lack of resources. Infrastructure as a Service (IaaS) providers have made it easy to provision and terminate compute resources on demand. However, there is a need for a control mechanism that is able to provision resources and create multiple instances of a web application in response to excess load events. In this paper, we propose and implement a reinforcement learning-based controller that is able to respond to volatile and complex arrival patterns through a set of simple states and actions. The controller is implemented within a distributed architecture that is able to not only scale up quickly to meet rising demand but also scale down by shutting down excess servers to save on ongoing costs. We evaluate this decentralized control mechanism using workloads from real-world use cases and demonstrate that it reduces SLA violations while minimizing cost of provisioning infrastructure.

Introduction

Cloud computing enables allocation of computing resources on demand, and therefore provides us with the opportunity to avoid over-provisioning and under-provisioning. Over-provisioning refers to the situation where we provision for the estimated maximum required resources; this can leave resources idle in low-demand periods and increase cost. On the other hand, under-provisioning refers to a scarcity of computing resources during unexpected load spikes, which can result in poor performance. Consequently, we need a resource controller to acquire and release computing resources based on demand.

Web application workloads can be divided into two main classes: (i) transactional (e.g. e-commerce websites); and (ii) batch (e.g. text mining, video transcoding, and graphical rendering). Transactional workloads are exemplified by web applications that serve online users over HTTP.

Web applications, in general, conceptually follow an N-tier [1] (or N-layer) model where the users interact with the presentation layer, and the data is stored in the persistence layer, composed of a database instance, and the data access libraries. In between these two layers, lies one or more stateless logic layers that are responsible for the processing of requests. In practice, the presentation and logic layers are hosted in web application servers, while the databases are separately managed on another set of servers. In the cloud computing model, these servers are virtual machines [2] with different hardware and software configurations (e.g. number of CPU cores, amount of memory, speed of storage and operating system).

User experience for such a system is impacted heavily by its response time: the time required for the server to process a user request and return the response. Schurman and Brutlag [3] have demonstrated that increasing the response time from 200 ms to 500 ms reduces revenue by 1%. Enterprise users establish service level agreements (SLAs) with web application owners that specify a target in terms of worst-case latency for a specific population of requests. For example, a target response time (e.g. 200 ms) may be specified on the 95th percentile of a set of requests ordered by increasing response time. Missing this target requires the owner to pay a penalty to the user. The other important factor in overall profit is the cost of infrastructure. In the cloud model, this cost is the rental fee for using the resources, levied per time period (hourly, per-minute, etc.) by the provider.
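The percentile-based SLA check described above can be sketched in a few lines. This is an illustrative example, not the paper's actual monitoring code; the 200 ms target follows the example in the text, while the nearest-rank percentile method and the sample window are assumptions.

```java
import java.util.Arrays;

// Illustrative sketch: checking a 95th-percentile SLA target over a window
// of measured response times. The window contents are made-up sample data.
public class SlaCheck {
    // Returns the p-th percentile (nearest-rank method) of the response times, in ms.
    static double percentile(double[] responseTimesMs, double p) {
        double[] sorted = responseTimesMs.clone();
        Arrays.sort(sorted);
        // Nearest-rank: the value at 1-based index ceil(p/100 * n).
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        double[] window = {120, 150, 180, 190, 210, 160, 140, 130, 170, 500};
        double p95 = percentile(window, 95.0);
        boolean violated = p95 > 200.0; // SLA target from the example above
        System.out.println("p95=" + p95 + " ms, violated=" + violated);
    }
}
```

A single slow outlier dominates the tail, which is why percentile targets are stricter than average-based ones.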

Cloud providers offer tools to automatically instantiate or remove resources for a particular user. However, the incoming workload is dynamic and load spikes are unpredictable. A reactive provisioning strategy may provision resources only after a load spike has occurred, which leads to poor performance and increased resource costs. This motivates the development of a resource provisioning controller that determines: (i) how well an application is performing on a server; (ii) whether the present amount of resources is sufficient for the incoming workload; and (iii) whether the resources are utilized efficiently.

We aim to design a controller that delivers high performance for a large number of applications distributed among several servers. A centralized controller would need to monitor multiple applications hosted on various servers, which makes it complex [4], since the state of the applications and servers may change quickly. Such a heavyweight controller may not be able to react in time to adverse events because of the large amount of information it must process to select a desirable action. The other drawback of a centralized controller is that it constitutes a single point of failure for the whole system.

This paper describes a decentralized architecture for provisioning and managing resources to meet SLA targets on web applications while keeping the cost of infrastructure to a minimum. In the following section, we first discuss the challenges and requirements for designing a resource controller in the cloud environment. Then, in Section 3, we describe the design and architecture of our proposed solution, called the Autonomic Decentralized Elasticity Controller (ADEC). Section 4 discusses the actual implementation of our system. Section 5 describes the experiments and their results. Finally, we compare our system to existing solutions in the literature and conclude the paper.

Section snippets

Problem description

Fig. 1 illustrates the scenario that we are tackling in this paper. IaaS users are able to instantiate virtual machines using images that contain predefined web application hosting environments. A web application server is able to simultaneously host multiple applications with different requirements but is constrained by the resources of the underlying virtual machine. Therefore, a user may start multiple instances to host different applications.

An increase in the arrival rate for one of the

Architecture

We employ a decentralized architecture in which each server is responsible for maintaining the performance of its own hosted applications. However, the servers take actions that fulfill the requirements of the whole system, such as reducing its overall cost. There have been many research efforts on dynamic resource allocation in a shared pool of resources [8], [9]. However, allocating computing resources in the cloud environment is a different scenario. The general idea in
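The kind of reinforcement learning controller the paper describes — simple states and actions, learned per server — can be sketched as a small tabular Q-learning agent. The state discretization (load buckets), the three-action set, and the reward shape here are assumptions for illustration; the actual ADEC formulation may differ.

```java
import java.util.Random;

// Hypothetical sketch of a local elasticity controller using tabular Q-learning.
// States and actions are deliberately coarse, mirroring the paper's emphasis on
// a small, simple state/action space.
public class QController {
    static final int STATES = 3;  // e.g. low / normal / high load (assumed buckets)
    static final int ACTIONS = 3; // 0: remove a server, 1: no-op, 2: add a server
    final double[][] q = new double[STATES][ACTIONS];
    final double alpha = 0.1, gamma = 0.9, epsilon = 0.1; // illustrative parameters
    final Random rng = new Random(42);

    // Epsilon-greedy action selection: occasionally explore a random action.
    int selectAction(int state) {
        if (rng.nextDouble() < epsilon) return rng.nextInt(ACTIONS);
        return greedy(state);
    }

    int greedy(int state) {
        int best = 0;
        for (int a = 1; a < ACTIONS; a++) if (q[state][a] > q[state][best]) best = a;
        return best;
    }

    // Standard Q-learning update after observing (state, action, reward, nextState).
    void update(int state, int action, double reward, int nextState) {
        double maxNext = q[nextState][greedy(nextState)];
        q[state][action] += alpha * (reward + gamma * maxNext - q[state][action]);
    }

    public static void main(String[] args) {
        QController c = new QController();
        // One simulated step: high load (state 2), added a server (action 2),
        // received a positive reward, and moved to normal load (state 1).
        c.update(2, 2, 1.0, 1);
        System.out.println("Q(high, add) = " + c.q[2][2]);
    }
}
```

Because each server runs its own small table, decisions stay cheap, which is what makes the decentralized design responsive where a single heavyweight controller would lag.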

Implementation

ADEC has been implemented in Java and targets cloud environments that allow provisioning and releasing computing resources through an Application Programming Interface (API). ADEC needs at least two machines: one is responsible for load balancing and routing requests to the back-end servers, while the others are employed as back-end servers. Hence, ADEC requires the following preparation phases: (i) an instance in the cloud that contains the load balancer; and (ii) another to employ the

Evaluation

In this section, we evaluate our results and compare ADEC with a system from the literature (Unity) and a commercial solution (the Amazon autoscaling service). We first describe the experimental setup and the results of the experiments, and then discuss our interpretation of the results.

Related work

Automated resource provisioning, which refers to systems that allocate or release resources on demand automatically, has been extensively studied in the literature [30]. A number of solutions have been introduced by academia and cloud providers. Prominent among these are model-based solutions, in which an application performance model is constructed to predict the number of application instances required to meet the QoS requirements of applications under the incoming workload.

Conclusion

In this paper, we focused on the problem of dynamic resource provisioning for web applications hosted on the cloud. The first contribution of this work is the utility function that enables the system to specify a reasonable trade-off between cost and performance in provisioning resources. We have employed a decentralized architecture, which makes the system robust against the failure of a centralized controller. Allocating resources through local controllers instead of a centralized controller allows
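A utility function of the kind described above rewards performance and penalizes infrastructure cost. The linear form, the weights, and the inputs below are illustrative assumptions, not the paper's exact function.

```java
// Hypothetical utility function trading off performance against rental cost.
public class Utility {
    // perfScore in [0,1]: e.g. fraction of requests meeting the SLA target.
    // costPerHour: current rental cost of the provisioned servers.
    // wPerf, wCost: weights expressing the desired trade-off.
    static double utility(double perfScore, double costPerHour,
                          double wPerf, double wCost) {
        return wPerf * perfScore - wCost * costPerHour;
    }

    public static void main(String[] args) {
        // Compare two configurations: cheap but slower vs. expensive but faster.
        double small = utility(0.90, 1.0, 10.0, 1.0);
        double large = utility(0.99, 3.0, 10.0, 1.0);
        System.out.println(small > large
            ? "controller prefers the smaller configuration"
            : "controller prefers the larger configuration");
    }
}
```

With these example weights, the marginal SLA improvement of the larger configuration does not repay its extra rental cost, so a utility-driven controller would scale down — exactly the cost/performance trade-off the conclusion highlights.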

Acknowledgments

This research is partially supported by the National Natural Science Foundation of China under projects 61672136 and 61828202.


References (44)

  • R.S. Sutton, Integrated architectures for learning, planning, and reacting based on approximating dynamic programming.

  • S. Das et al., ElasTraS: an elastic, scalable, and self-managing transactional database for the cloud, ACM Trans. Database Syst. (2013).

  • D. Menasce, Web server software architectures, IEEE Internet Comput. (2003).

  • J.E. Smith et al., The architecture of virtual machines, Computer (2005).

  • E. Schurman et al., The user and business impact of server delays, additional bytes, and HTTP chunking in web search, Velocity Web Performance and Operations Conference (2009).

  • H. Li et al., Using reinforcement learning for controlling an elastic web application hosting platform, Proceedings of the 8th ACM International Conference on Autonomic Computing (2011).

  • B. Urgaonkar et al., Dynamic provisioning of multi-tier internet applications, Proceedings of the Second International Conference on Autonomic Computing (ICAC 2005) (2005).

  • A. Karve et al., Dynamic placement for clustered web applications, Proceedings of the 15th International Conference on World Wide Web (2006).

  • R. Morris et al., Variance of aggregated Web traffic, Proceedings of INFOCOM 2000, Vol. 1 (2000).

  • W. Walsh et al., Utility functions in autonomic systems, Proceedings of the International Conference on Autonomic Computing (2004).

  • G. Tesauro, Online resource allocation using decompositional reinforcement learning, AAAI, Vol. 5 (2005).

  • D.A. Menasce et al., Performance by Design: Computer Capacity Planning by Example (2004).

  • J.L. Hellerstein et al., Feedback Control of Computing Systems (2004).

  • G. Tesauro, Reinforcement learning in autonomic computing: a manifesto and case studies, IEEE Internet Comput. (2007).

  • M.L. Littman et al., Reinforcement learning for autonomic network repair (2004).

  • J. Dowling et al., Building autonomic systems using collaborative reinforcement learning, Knowl. Eng. Rev. (2006).

  • G. Tesauro et al., A hybrid reinforcement learning approach to autonomic resource allocation, Proceedings of the IEEE International Conference on Autonomic Computing (ICAC '06) (2006).

  • R. Johnson, J2EE development frameworks, Computer (2005).

  • R. Levy et al., Performance management for cluster based web services.

  • G. Tesauro et al., Utility-function-driven resource allocation in autonomic systems, Proceedings of the Second International Conference on Autonomic Computing (ICAC 2005) (2005).

  • H. Ghanbari et al., Optimal autoscaling in an IaaS cloud, Proceedings of the 9th International Conference on Autonomic Computing (2012).

  • Q. Wang et al., When average is not average: large response time fluctuations in n-tier systems, Proceedings of the 9th International Conference on Autonomic Computing (2012).

Seyed Mohammad Reza Nouri was a graduate student at the University of New South Wales when he completed most of this research. His research interests include scheduling in elastic cloud computing and software engineering.

Han Li was a Ph.D. candidate at the University of New South Wales when he completed part of this research. His research interests include scheduling in elastic cloud computing and software engineering.

Srikumar Venugopal is now a research scientist at IBM Ireland Research. His research interests include cloud computing, distributed computing, and software engineering. He has more than 20 journal/conference publications in related areas.

Wenxia Guo is a Ph.D. candidate in the School of Information and Software Engineering at the University of Electronic Science and Technology of China (UESTC). Her research interests include approximation algorithms for NP-hard problems and resource scheduling algorithms for Cloud computing and Big Data processing.

Mingyun He is an assistant professor in the School of Information and Software Engineering at the University of Electronic Science and Technology of China (UESTC). His research interests include algorithms and systems for Cloud computing and Artificial Intelligence.

Wenhong Tian is now a professor at the University of Electronic Science and Technology of China (UESTC). His research interests include scheduling in Cloud computing and Big Data platforms, and image recognition by deep learning. He has more than 40 journal/conference publications and 5 books in related areas. He is an IEEE Fellow.


S. M. R. Nouri, H. Li, and S. Venugopal conducted this research while at the University of New South Wales, Australia.
