Profile-based power-aware workflow scheduling framework for energy-efficient data centers

https://doi.org/10.1016/j.future.2018.11.010

Highlights

  • The concept of power-aware Application Profiles is highlighted through a motivational case study.

  • A power-aware framework for efficient placement of application workloads in a data center is presented.

  • A scheduling algorithm for Application Profile matching is developed based on criteria including CPU, memory, IO, and power consumption requirements.

  • Results from experimental and simulation studies show the effectiveness of the proposed framework.

Abstract

In the age of big data, software-as-a-service (SaaS) clouds provide heterogeneous and multitenant utilization of underlying virtual environments in data centers. Real-time and parallel deployment of applications with data-intensive workloads of various sizes poses challenges in optimal resource scheduling, power utilization, task completion time, network latency, and so on, degrading the quality of service and affecting the user experience. In this paper, we investigate the role of application profiles in addressing the tradeoff between performance and energy efficiency in small- to medium-scale data centers. A power-aware framework for the efficient placement of application workloads in the data center is proposed. The framework considers various application workflow constraints, such as CPU, memory, network I/O, and power consumption requirements, to develop realistic profiles of application workloads. A system model for efficient workflow assignment in the data center using a novel scheduler algorithm is presented. The performance of the proposed scheduler is validated through simulation studies. We compare the proposed scheduler with two scheduling algorithms: robust time cost (RTC) and heterogeneous earliest finish time (HEFT). Results show that the proposed scheduler is 19% and 38% more energy efficient than RTC and HEFT, respectively, for medium to large workloads.

Introduction

In today’s economy, the demand for everyday Internet usage is skyrocketing. Cloud-computing technology enables software-as-a-service (SaaS) over the Internet using the infrastructure in data centers. The growing demand for services is driving the growth of data centers, which predictably increases energy consumption and, consequently, the operational costs of data centers [1]. A recent report [2] shows that data centers account for 1.1% to 1.5% of the total power utilized worldwide. In the US alone, data centers consumed 91 billion kWh of electricity in 2015, about 1.8% of total US power consumption [3], [4]. In the same year, data centers worldwide consumed 416.2 TWh of electricity. The power consumption of information technology (IT) thus contributes directly to a larger carbon footprint.

Since 2013, various energy-efficiency measures have been considered to reduce the power consumption of data centers. Such measures include the design and manufacturing of energy-efficient servers, optimal placement of servers in data centers, efficient deployment of power distribution units, energy-efficient air management and cooling systems, and effective use of virtualization technologies [3], [5], [6]. Most of these improvements have been adopted by hyper-scale data centers owned by IT giants such as Facebook, Google, and Amazon. In contrast, small- to medium-scale data centers are usually owned by small enterprises, universities, private-sector businesses, and government organizations. These data centers are typically deployed as private cloud infrastructures and provide services to a limited number of clients. According to [7], hyper-scale data centers account for only 5% of worldwide data-center power consumption; the remaining 95% is consumed by small- and medium-scale data centers. The NRDC reported that energy management in small- to medium-scale data centers, with their consistent workloads, offers greater savings potential than in hyper-scale data centers, whose workloads are dynamic. Furthermore, the workflows submitted to these data centers are largely homogeneous: similar kinds of applications are executed by a small group of clients in a multi-tenant environment.

In small- to medium-scale data centers, server consolidation provides a mechanism for efficient server utilization [8], [9]. As a server consolidation technology, virtualization reduces the underutilization of servers by allowing the multitenancy of applications per physical server, thus maximizing the efficient use of space and reducing energy, hardware, operational, and deployment costs. However, energy-aware deployment of tasks with varying workloads in virtualized environments remains challenging. Zheng et al. [10] presented a distributed traffic-flow consolidation algorithm for distributing workloads in the data center. Their algorithm consolidates traffic flows onto a small set of links and switches, shutting off the unused resources. They noted that, despite the added complexity of the decentralized approach, they achieve an energy-performance tradeoff similar to that of centralized approaches. Wu et al. [11] proposed a lightweight virtual machine (VM) migration algorithm that uses a server utilization threshold to determine workload scheduling in a cluster. The use of a single threshold can be problematic, since dynamic workloads alter the utilization of servers over time; therefore, one threshold value may not provide an optimal solution. Wang et al. [12] used integer programming to model the ownership costs of VMs per physical machine (PM). They showed that the complexity of the proposed model has no effect on the performance of the consolidations. Shaw et al. [13] noted that VM consolidation increases the average response time of tasks, negatively affecting the energy-performance tradeoff, and proposed a heuristic for restrictive VM consolidation. Taken together, these results show that deploying an energy-efficient solution is complex and unpredictable and might degrade the performance of the data center.

To address this challenging issue, we take inspiration from the concept of application profiles (APs) presented in [14]. In that work, a certain number of VMs are provisioned based on the size of an application workload and deployed on the PMs in the data center. The new energy-management framework proposed in this paper utilizes realistic profiles of application workloads to achieve a greener and more energy-efficient data center while considering resource utilization and performance constraints. The framework devises a three-layer architecture: (i) the Application Profile layer (APL), (ii) the Virtual Machine layer (VML), and (iii) the Physical Machine layer (PML). The APL maintains APs that contain application details along with the workload, estimated runtime, and resource requirements. The VML considers VM setup parameters, such as the number of CPU cores, memory assignment, and storage allocation; it is also responsible for VM placement, deployment, and migration on PMs. The PML handles on/off operations on PMs, temperature considerations, and dynamic voltage and frequency scaling (DVFS).
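The paper does not give a concrete schema for an AP at this point, but the fields it lists (application details, workload size, estimated runtime, and resource requirements) can be sketched as a simple record. The field names below are illustrative assumptions, not the authors' definitions:

```python
from dataclasses import dataclass


@dataclass
class ApplicationProfile:
    """Hypothetical AP record kept at the Application Profile layer (APL).

    Field names are assumptions; the paper only states that an AP holds
    application details, workload size, estimated runtime, and resource
    requirements (CPU, memory, network I/O, power).
    """
    app_name: str
    workload_size_gb: float   # size of the data-intensive workload
    est_runtime_s: float      # estimated completion time
    cpu_cores: int            # CPU requirement
    memory_gb: float          # memory requirement
    net_io_mbps: float        # network I/O requirement
    power_w: float            # expected power consumption


# Example: a medium-sized workload profile for a Hadoop sentiment-analysis job
profile = ApplicationProfile("sentiment-hadoop", 50.0, 1800.0, 8, 16.0, 200.0, 350.0)
print(profile.power_w)  # → 350.0
```

The scheduler can then compare an incoming workflow's requirements against stored AP records without probing live resource usage.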

This paper also reviews current work on VM placement on PMs in data centers. We focus on small- to medium-scale data centers routinely deployed in small organizations and universities. A common characteristic of these data centers is the low variability and high predictability of application workloads, resulting in a near-constant number of VMs. Because data workloads vary infrequently, the policy of hosting a certain number of VMs per PM is rarely updated, and usually no adjustments are made [15]. A system model for workflow assignment in the data center using a novel scheduler algorithm is presented. The performance of the proposed scheduler is validated through simulation studies. We compare the proposed scheduler with two scheduling algorithms, namely stochastic heterogeneous earliest finish time (HEFT) [16] and robust time cost (RTC) [17]. Results show that the proposed scheduler is 19% and 38% more energy efficient than RTC and HEFT, respectively, for medium to large workloads.

The contributions of this paper are threefold:

  • The concept of power-aware APs is highlighted through a motivational case study. A realistic workload is created using SentiStrength [18] and is processed on a Hadoop cluster using various configurations of VM deployment per PM. Results from the experimental testbed are used to devise a mechanism for defining power-aware APs.

  • A power-aware framework is proposed for the efficient placement of application workloads in a virtualized data center. The framework utilizes the APs to compute the cost of executing a workflow in the data center based on the power consumption requirements. A heuristic-based scheduling algorithm for AP matching is developed based on criteria including CPU, memory, I/O, and power consumption requirements. The runtime complexity of the proposed approach is similar to that of the RTC and HEFT schedulers.

  • Extensive simulation studies are carried out to evaluate the proposed framework. The results from the scheduler are compared to the RTC and HEFT schedulers for nine different scenarios. Results show that the proposed algorithm is more efficient in terms of energy utilization.
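The second contribution names the AP-matching criteria (CPU, memory, I/O, power) but this page does not spell out the matching rule. A minimal sketch, assuming a greedy heuristic that picks the feasible profile with the lowest power cost, might look like this (dictionary keys are hypothetical):

```python
def match_profile(task, profiles):
    """Return the feasible profile with the lowest power cost, or None.

    `task` and each profile are dicts with 'cpu', 'mem', and 'io' keys;
    profiles additionally carry a 'power' key. All names are illustrative,
    not taken from the paper.
    """
    feasible = [
        p for p in profiles
        if p["cpu"] >= task["cpu"]
        and p["mem"] >= task["mem"]
        and p["io"] >= task["io"]
    ]
    # Greedy choice: among profiles that satisfy the resource criteria,
    # minimize the expected power consumption.
    return min(feasible, key=lambda p: p["power"], default=None)


profiles = [
    {"id": "small",  "cpu": 2, "mem": 4,  "io": 100, "power": 120},
    {"id": "medium", "cpu": 4, "mem": 8,  "io": 200, "power": 220},
    {"id": "large",  "cpu": 8, "mem": 16, "io": 400, "power": 400},
]
task = {"cpu": 3, "mem": 6, "io": 150}
best = match_profile(task, profiles)
print(best["id"])  # → medium
```

A linear scan over the stored profiles like this keeps the per-task matching cost low, which is consistent with the claim that the scheduler's runtime complexity is comparable to RTC and HEFT.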

The rest of the paper is organized as follows. Section 2 provides the background and related works. Section 3 details a motivational case study for the power efficiency of a data center, building the case for the proposed framework based on APs. Section 4 presents details for the proposed power-aware framework. Section 5 presents detailed experimental evaluations followed by the conclusions and future directions in Section 6.

Section snippets

Related works

This section presents an overview of related works on energy-efficient workflow scheduling strategies for data centers.

Motivational case study

The VM placement problem is a thoroughly investigated area in cloud computing. Many algorithms have been proposed and developed to optimize the various proposals and techniques [9], [12], [17], [22], [33], [38], [39], [40], [41], [42], [43], [45], [48]. A major facet of research in power-aware placement of VMs is reducing the power consumption of PMs, increasing the efficiency of the data center by tuning parameters such as CPU utilization, memory and I/O utilization, and the

Power-aware workflow scheduling framework

The research problem addressed in this paper focuses on optimizing a data center's energy utilization. The concept of APs is used to place VMs in the cluster while maintaining a healthy tradeoff between task execution times and power consumption. In this section, we detail an energy-management framework utilizing realistic profiles of workflows with various application workloads to achieve a greener and more energy-efficient data center, while considering the utilization of resources and

Evaluation

This section presents a detailed experimental evaluation of the proposed scheduling algorithm. We compare the proposed algorithm with two scheduling algorithms, namely stochastic HEFT [16] and RTC [17]. We modify the HEFT and RTC algorithms to support dynamic workflows, allowing them to schedule all tasks in a workflow immediately as a new workflow arrives. Since neither algorithm considers profiles, we modify the RTC algorithm to include the price factor of a

Conclusions

A significant research problem in cloud computing is finding a tradeoff between power efficiency and high performance. In this paper, we provide a detailed case study using various workloads to highlight inefficient power-aware workflow scheduling in Hadoop. We exploit the concept of building profiles for applications with certain workloads executing in small- to medium-scale data centers. A profile-based energy-efficient framework is proposed with a novel scheduler that

Acknowledgment

This work is partially supported by the Robotics and Internet of Things Lab in the Research and Innovation Center at Prince Sultan University, Saudi Arabia.

Basit Qureshi received his Ph.D. degree in computer science from University of Bradford in the year 2011. Prior to that he received his Master of Science degree in Computer Science from Florida Atlantic University in 2002 and his Bachelor of Science degree in Computer Science from Ohio University, OH USA in 2000. His research interests include Trust, Security and privacy issues in Wireless Networks, Robotics and Smart Cities applications. He is a member of IEEE, IEEE Computer Society, IEEE Communication Society and ACM.

References (56)

  • R. Miller, Data centers efficiency will yield $60 billion in savings, Data Center Frontier, (2016 Jun.). [Online]....
  • M.S. Hossain et al., A belief rule based expert system for datacenter PUE prediction under uncertainty, IEEE Trans. Sustain. Comput. (2017).
  • J. Wan et al., Joint cooling and server control in data centers: a cross-layer framework for holistic energy minimization, IEEE Syst. J. (2017).
  • J. Whitney, P. Delforge, Scaling up energy efficiency across the data center industry: evaluating key drivers and...
  • A. Varasteh et al., Server consolidation techniques in virtualized data centers: a survey, IEEE Syst. J. (2017).
  • S.B. Shaw, J.P. Kumar, A.K. Singh, Energy-performance trade-off through restricted VM consolidation in cloud data...
  • K. Zheng et al., DISCO: distributed traffic flow consolidation for power efficient data center network.
  • X. Wu et al., An energy efficient VM migration algorithm in data centers.
  • B. Wang et al., Mathematical programming for server consolidation in cloud data centers.
  • F. Alharbi, Y.C. Tian, M. Tang, T.K. Sarker, Profile-based static VM placement for energy-efficient data center, in:...
  • B. Qureshi et al., Countering the collusion attack with a multidimensional decentralized trust and reputation model, Springer J. Multimed. Tools Appl. (2013).
  • D. Poola et al., Robust scheduling of scientific workflows with deadline and budget constraints in clouds.
  • M. Thelwall et al., Sentiment strength detection in short informal text, J. Am. Soc. Inf. Sci. Technol. (2010).
  • E. Feller et al., Performance and energy efficiency of big data applications in cloud environments: a Hadoop case study, J. Parallel Distrib. Comput. (2015).
  • Z. Zhou et al., Bilateral electricity trade between smart grids and green datacenters: pricing models and performance evaluation, IEEE J. Sel. Areas Commun. (2016).
  • C. Li et al., Oasis: scaling out datacenter sustainably and economically, IEEE Trans. Parallel Distrib. Syst. (2017).
  • N. Tiwari et al., Identification of critical parameters for MapReduce energy efficiency using statistical design of experiments.
  • F. AlMudarra et al., Issues in adopting agile development principles for mobile cloud computing applications.


