On time-sensitive revenue management in green data centers

https://doi.org/10.1016/j.suscom.2017.01.002

Highlights

  • We study the previously known algorithms and conclude that these online algorithms have provably poor performance in their worst-case scenarios.

  • We design a randomized algorithm to schedule jobs in the data centers and prove the algorithm's expected competitive ratio.

  • Our algorithm is theoretically sound, and it outperforms the previously known algorithms in many settings on both real traces and simulated data.

  • An optimal offline algorithm is also implemented as an empirical benchmark.

Abstract

In this paper, we design an online energy and job scheduling algorithm that is both analytically and experimentally superior, with the objective of maximizing net profit for service providers in green data centers. We first study the previously known algorithms and conclude that these online algorithms have provably poor performance in their worst-case scenarios. To guarantee an online algorithm's worst-case performance, we design a randomized algorithm to schedule energy and jobs in the data centers and prove the algorithm's expected competitive ratio in a special setting. Our algorithm is theoretically sound, and it outperforms the previously known algorithms in many settings on both real traces and simulated data. An optimal offline algorithm is also provided as an empirical benchmark.

Introduction

A data center is a computing facility used to house computer systems and associated components such as communication and storage subsystems. Usually, a data center stores data and provides computing facilities to its customers. Through charging fees for data access and computing services, a data center gains revenue [6]. At the same time, to maintain its running structure, a data center has to pay operational costs, including hardware costs (such as of upgrading computing and storage devices and air conditioning facilities), electrical bills for power supply, network connection costs, in addition to personnel costs. To maximize a data center's net profit, we need to increase the revenue collected and decrease the operational cost paid concurrently.

The ever-increasing power costs and energy consumption of data centers have brought many serious economic and environmental problems to our society and have attracted significant attention recently. As reported, the energy consumption of all data centers accounted for 10% of total U.S. energy consumption in 2006 and has increased by 56% over the past five-year period [4]. Estimates of the annual power cost for U.S. data centers in 2010 reached as high as 3.3 billion dollars [1]. As an example, in a modern high-scale data center with 45,000 to 50,000 servers, more than 70% of its operational cost (around half a billion dollars per year) [15] goes to maintaining the servers and providing power. Considering both economic and environmental factors, academic researchers and industrial policy makers have investigated revenue management policies and engineering solutions to make data centers work better without sacrificing service quality or environmental sustainability.

A growing trend for reducing energy costs and protecting the environment is to power a data center with renewable energy such as wind and solar power. Renewable energy has been adopted in multiple fields such as wireless sensors [27] and smart grids [28]. We term this type of energy “green energy” as it comes from renewable, non-polluting sources. Unfortunately, the supply of green energy is usually intermittent, limited, and cannot be fully predicted over the long term. The other type of energy, called “brown energy”, comes from the electrical power grid, in which power is produced by carbon-intensive means. We would like to minimize the usage of brown energy, although its supply is usually regarded as unlimited. A data center with both green and brown energy supplies is called a green data center. Due to economic concerns and technical difficulties, no battery is assumed to be available to store surplus green energy [7].

In this paper, we consider a job and energy scheduling problem in green data centers. The ultimate goal is to optimize green and brown energy usage without sacrificing service quality. Our work is built upon the study by Goiri et al. [13]. In this problem, jobs arrive at a data center over time. We design a revenue management algorithm whose task is to determine whether, when, and on which machines to schedule a job request from customers. Committing to and finishing a job earns the service provider some revenue. Note that in completing a job, different choices of machines, energy types, and time intervals may result in different operational costs. We target the following question: how should we dispatch jobs and schedule energy to maximize the net profit achieved by a data center's service provider? Recall that information about later-released jobs and future green energy generation is in general unknown beforehand. What we study in this paper can thus be regarded as an online version of a multiple-machine scheduling problem.

To evaluate an online scheduling algorithm's performance, we use two metrics from two perspectives. In theory, we use the competitive ratio [8] to measure an online algorithm's worst-case performance against an adversarial clairvoyant. Competitive analysis has been widely used to analyze online algorithms in computer science and operations research. In practice, we conduct experiments using both real traces and simulated data. The crux of our algorithm is to introduce ‘randomness’ into scheduling energy and jobs. As we will see in the remainder of this paper, randomness helps both theoretically and empirically, particularly in adversarial settings.

In data centers, a service provider is regarded as a resource provider who provides a set of machines that will be shared and used by the data centers’ clients. The clients, regarded as resource consumers, have their jobs processed and in turn, pay the service provider for the service they get. The service provider's revenue management has the objective of maximizing its net profit, defined as the difference between the revenue collected from the clients and the operational costs charged to maintain the computing system. Here the operational costs do not include those for upgrading systems, paying personnel, or training operators.

We model the service provider's revenue management as a job and energy scheduling problem. The components of a computing system within a data center are pictured in Fig. 1, and we introduce each of them in detail below.

Time is discrete. A service provider has M machines (also called nodes) on which to schedule jobs. At any time, a node can process at most one job. To keep these machines functioning, electrical power is consumed while jobs are being executed.

Clients (customers) release jobs to be processed. Jobs arrive over time in an online manner. At each time, some (possibly zero) jobs arrive. Each job j has an integer arrival time (also called release time) rj, an integer deadline dj, an integer processing time pj, and an integer node requirement qj. It takes pj units of time to complete job j. Running one job may require more than one node to be simultaneously active at a time. The node requirement qj (≥1) indicates the number of nodes that job j needs while it is being executed. The total machine resource requirement of a job j is thus qj · pj. Jobs may or may not have to be executed within a consecutive time interval; we call these the job non-preemptive setting and the job preemptive setting, respectively.
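The job model above can be sketched as a small data structure (an illustrative sketch, not the paper's implementation; the field names are our own):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Job:
    """A job in the GDC-RM model: released at time rj, due by dj,
    needing qj nodes simultaneously for pj time slots."""
    release: int   # rj, integer arrival (release) time
    deadline: int  # dj, integer deadline
    length: int    # pj, processing time in slots
    nodes: int     # qj >= 1, nodes required simultaneously

    def resource_requirement(self) -> int:
        # Total machine resource requirement qj * pj.
        return self.nodes * self.length

job = Job(release=0, deadline=10, length=3, nodes=2)
assert job.resource_requirement() == 6
```

Under the payment model below, a client would pay $c times this resource requirement upon on-time completion.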

The clients pay the service provider for the service received. In general, the payoff depends on a job's machine resource requirement. For each job completed within the data center, the client pays a fee proportional to the job's resource requirement. We assume that a client pays $c · qj · pj if the job is completed by its deadline and $0 otherwise. Here c is a service charging rate, for instance, as specified by Amazon EC2 [6].

Energy is consumed as nodes execute jobs. There are two types of energy resources: green energy and brown energy. Usually, a system is able to predict the green energy quantity only within a 48-hour scheduling window. In [13], a scheduling window was defined as a time interval of 48 hours, further divided into time slots of 15 minutes each. In general, the brown energy supply is assumed to be unlimited.

The costs of the two types of energy vary over time differently. We assume that green energy costs $0 per machine time slot, while brown energy's unit cost is time-sensitive, varying between on-peak and off-peak periods. A unit of brown energy costs $Bd at on-peak times (usually daytime) and $Bn at off-peak times (usually nighttime). This assumption is the most commonly used one in modeling brown electricity pricing [13]. For example, the prices charged by an integrated generation and energy service company in New Jersey [13] are $0.13/kWh at on-peak times (from 9 am to 11 pm) and $0.08/kWh at off-peak times (from 11 pm to 9 am).
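Using the 15-minute slots and the New Jersey on-peak/off-peak prices above, the time-sensitive brown price can be sketched as follows (a minimal sketch; the function name and defaults are our own):

```python
def brown_unit_price(slot: int, slots_per_hour: int = 4,
                     peak_price: float = 0.13,
                     offpeak_price: float = 0.08) -> float:
    """Brown energy price per kWh for a given 15-minute slot.
    On-peak from 9 am to 11 pm, off-peak from 11 pm to 9 am,
    as in the New Jersey pricing example."""
    hour = (slot // slots_per_hour) % 24
    return peak_price if 9 <= hour < 23 else offpeak_price

assert brown_unit_price(0) == 0.08    # midnight slot: off-peak
assert brown_unit_price(40) == 0.13   # 10 am slot: on-peak
```

Green energy would simply price every slot at $0 in this scheme.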

Scheduling jobs successfully earns the service provider some revenue. However, if brown energy is used in addition to the limited green energy to power the data center and complete jobs, we must pay an electrical bill as our operational cost. We define

net profit = revenue − operational cost,

where revenue is the total job value gained from finished jobs and operational cost is the total cost of the brown energy consumed to run the machines. The objective of revenue management for a service provider within green data centers is to design a scheduler that completes all or some of the released jobs so as to maximize net profit. We call this problem GDC-RM, standing for “Green Data Center's Revenue Management”.
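The objective above can be written out directly (a sketch with hypothetical numbers; the function and argument names are our own, with c the service charging rate from the payment model):

```python
def net_profit(completed_jobs, brown_slots_used, c, brown_price):
    """net profit = revenue - operational cost.

    revenue: $c * qj * pj for each job completed by its deadline.
    operational cost: price of every machine time slot powered by
    brown energy (green-powered slots cost $0)."""
    revenue = sum(c * q * p for (q, p) in completed_jobs)
    cost = sum(brown_price[t] for t in brown_slots_used)
    return revenue - cost

# Two completed jobs with (qj, pj) = (2, 3) and (1, 4); three machine
# slots ran on brown energy at the example New Jersey prices.
profit = net_profit([(2, 3), (1, 4)], [0, 1, 2],
                    c=1.0, brown_price={0: 0.08, 1: 0.08, 2: 0.13})
assert abs(profit - 9.71) < 1e-9
```

A scheduler that rejects a job forgoes its revenue but also avoids any brown energy cost its execution would incur, which is exactly the trade-off GDC-RM optimizes.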

In the remainder of this paper, we present a combinatorial optimization algorithm for GDC-RM. As job arrival information is in general unknown beforehand, GDC-RM is essentially an online decision-making problem. For reference, the notation used in this paper is summarized in Table 1.

Researchers have studied how to use green energy in green data centers in an efficient and effective manner. Although green energy has the advantages of being cost-effective and environmentally friendly, using it is challenging due to its daily and seasonal variability. Another challenge stems from customers’ workload fluctuations [16]. There can be a mismatch in time between the green energy supply and the workload's energy demand; for example, a heavy workload may arrive when the green energy supply is low. One previous solution is to “bank” green energy in batteries or on the grid itself [7] for later use. However, this approach incurs significant energy loss and high additional maintenance cost [7]. Thus, an online matching of workload and green energy is needed for green data centers.

Research on scheduling energy and jobs in an online manner has attracted much attention. Two data center settings have been considered: (1) centralized data centers [13], [14], [17], [5], [20], and (2) geographically distributed data centers [23], [9], [21], [30], [18], [19]. The objectives are usually classified as (a) maximizing green energy consumption [17], [13], [14], [30], [18], [19], [5]; (b) minimizing brown energy consumption or cost [9], [13], [14], [22], [21]; and (c) maximizing profit [12]. In addition, some researchers have incorporated dynamic brown energy prices [13], [14], [26] into their problem models.

Among the research on centralized data centers, Goiri et al. [13] proposed a greedy parallel batch job scheduler for a data center powered by solar energy, with the goal of maximizing green energy consumption. They further integrated green scheduling into Hadoop [14]. Krioukov et al. [17] studied data-intensive applications and proposed a scheduling algorithm that maximizes green energy consumption while satisfying job deadlines. Aksanli et al. [5] developed a green-aware scheduling algorithm for both online service jobs and batch jobs, aiming at improving green energy usage. Liu et al. [22] studied workload and cooling management with the goal of reducing brown energy costs. The algorithms underlying these solutions are known as First-Fit and Best-Fit. For an arriving job, the First-Fit algorithm finds the earliest available time slots that satisfy the job's resource requirements, while the Best-Fit algorithm locates the most cost-efficient time slots for the job. The First-Fit algorithm, in general, ignores the cost differences among scheduling a job in various time intervals. The Best-Fit algorithm picks the best time interval for a job myopically and does not take later job arrivals or energy supplies into account. In contrast to these previous studies, our research seeks an ideal tradeoff between the two algorithms by introducing randomness. We prove the algorithm's theoretical bound in a special setting and demonstrate its performance improvements in various environments.
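The tradeoff between First-Fit and Best-Fit can be illustrated with a toy randomized selector (our own hedged sketch; the paper's actual Random-Fit rule and its analysis appear in the Algorithms section):

```python
import random

def choose_start(feasible_starts, cost, p_first=0.5, rng=random):
    """Toy randomized tradeoff: with probability p_first act like
    First-Fit (earliest feasible start slot); otherwise act like
    Best-Fit (cheapest feasible start slot by brown energy cost).
    Not the paper's exact Random-Fit rule."""
    if not feasible_starts:
        return None  # no feasible interval: reject the job
    if rng.random() < p_first:
        return min(feasible_starts)                      # First-Fit
    return min(feasible_starts, key=lambda s: cost[s])   # Best-Fit

starts = [2, 5, 9]
cost = {2: 0.13, 5: 0.08, 9: 0.13}
choice = choose_start(starts, cost, p_first=0.5)
assert choice in (2, 5)  # either the earliest (2) or the cheapest (5)
```

The intuition is that randomizing over the two deterministic rules prevents an adversary from tailoring the input against either fixed strategy.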

Research on geographically distributed data centers focuses on distributing the workload among data centers in order to consume the free green energy or relatively cheaper brown energy available at other sites. Chen et al. [9] proposed a centralized scheduler that migrates workload across geographical data centers according to the green energy supply at each data center. Lin et al. [21] proposed online algorithms for scheduling workloads across geographical data centers with the goal of minimizing total energy cost. Although the proposed algorithms did reduce energy cost, total energy consumption increased. Liu et al. [23] further studied how geographical load balancing and a proportional brown energy pricing scheme could help encourage the use of green energy and reduce the use of brown energy. Zhang et al. [30] and Le et al. [18], [19] studied scheduling online services across multiple data centers to maximize green energy consumption.

Although geographically distributed data centers have become popular among big companies such as Google and Amazon, small centralized data centers remain important: as reported, numerous small and medium-sized companies are the main contributors to the energy consumed by data centers [4]. On one hand, small data centers owned by small organizations usually have less efficient energy management strategies than those of big companies. On the other hand, small and medium-sized data centers are very numerous; they may range from a few dozen servers housed in a machine room to several hundred servers in a large enterprise installation. Therefore, studying the profit maximization problem for centralized data centers can have a significant impact.

Most prior work focuses on either maximizing green energy consumption or minimizing brown energy consumption/cost, except [12], which studied the net profit maximization problem for centralized data center service providers. Indeed, there is a trade-off between minimizing energy expenditure and maximizing net profit. Ghamkhari and Rad [12] proposed a systematic approach to maximizing a green data center's profit under a stochastic assumption on the workload. The workload they studied is restricted to online service requests with variable arrival rates. In this paper, we make no assumptions about the workload's stochastic properties, and we allow the workload to include batch jobs that request simultaneous execution on multiple nodes. In addition, we incorporate varying brown energy prices into our model.

Section snippets

Algorithms

The offline version of the GDC-RM problem, in which all input information including future released jobs and later generated green energy is known beforehand, is computationally hard, as shown in Appendix A.

In reality, job scheduling in data centers is essentially an online problem. For the problem GDC-RM, we first discuss two widely-used heuristic online algorithms First-Fit and Best-Fit and analyze their limitations. Then we propose a randomized algorithm Random-Fit. We conduct

Performance evaluation

In this section, we evaluate the randomized online algorithm Random-Fit against two deterministic online algorithms, First-Fit and Best-Fit, which have been revised and adopted from previous literature. An offline algorithm is also developed, though its running time becomes prohibitive when the input size is large. The algorithms are implemented under both the job preemption setting and the job non-preemption setting. For ease of presentation in the figures below, we abbreviate the First-Fit algorithm, the

Conclusions

In this paper we study online scheduling of energy and jobs on multiple machines in a green data center, with the objective of maximizing the net profit of service providers. This decision-making problem involves three questions: (1) whether to admit a job, (2) when to schedule this job, and (3) which machines and which type of energy to designate to run it. In our problem setting, costs are time-sensitive and so is the net profit. Previous work employs deterministic approaches only and the

Acknowledgements

This material is based upon work supported by US NSF under Grant Nos. CCF-0915681 and CCF-1216993, and by National Natural Science Foundation of China under Grants 61373053 and 61572226. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NSF.

References (30)

  • H. Wang et al.

    Worst-case performance guarantees of scheduling algorithms maximizing weighted throughput in energy-harvesting networks

    Sustain. Comput.: Inform. Syst.

    (2014)
  • Energy Logic: Reducing Data Center Energy Consumption by Creating Savings that Cascade Across Systems.
  • Grid5000 Experimentation Platform.
  • UMass Amherst Computer Science Weather Station.
  • U.S. Environmental Protection Agency

    Report to Congress on Server and Data Center Energy Efficiency Public Law 109-431

    (2007)
  • B. Aksanli et al.

    Utilizing green energy prediction to schedule mixed batch and service jobs in data centers

  • Amazon

    Amazon EC2 Pricing

    (2013)
  • R. Bianchini

    Leveraging renewable energy in data centers: present and future

  • A. Borodin et al.

    Online Computation and Competitive Analysis

    (1998)
  • C. Chen et al.

    Green-aware workload scheduling in geographically distributed data centers

  • T.H. Cormen et al.

    Introduction to Algorithms

    (2009)
  • M.R. Garey et al.

    Computers and Intractability: A Guide to the Theory of NP-Completeness

    (1979)
  • M. Ghamkhari et al.

    Energy and performance management of green data centers: a profit maximization approach

    IEEE Trans. Smart Grid

    (2013)
  • I. Goiri et al.

    Greenslot: scheduling energy consumption in green datacenters

  • I. Goiri et al.

    Greenhadoop: leveraging green energy in data-processing frameworks

Part of this work has been published in the proceedings of IGSC’15 [29].
