Holistic thermal-aware workload management and infrastructure control for heterogeneous data centers using machine learning
Introduction
Data centers accounted for two percent of power consumption in the United States in 2014, approximately 70 billion kWh [1]. In contrast, data centers consumed 30 billion kWh in 2000 [2]. It has been estimated that from 2015 to 2020, the incoming load to data centers will double [3]. The increasing number of online and mobile applications, public interest in cyber entertainment, and cloud services for both personal and business users play a significant role in this growth [4]. Anticipation of this increase, together with power usage constraints, has led large data center operators and vendors to invest more in the efficient use of power [1].
There are several methods and techniques to reduce power consumption at different levels of a data center. At the device level, some electronic devices support low power states to save energy, if the performance of the device is not impacted [5], [6]. For example, dynamic voltage and frequency scaling (DVFS) is a method that provides different levels of power consumption and performance for processors [7], [8]. At the server level, dynamic suspension of unneeded servers, server consolidation, and the ability to choose different levels of power and performance are vital approaches for energy efficiency. For instance, server consolidation aims to save power by turning unneeded servers off during low workload periods [9], [10], [11]. At the facility level, power efficiency of the cooling system itself is also a significant concern [12], [13], [14].
Different servers and locations in data centers are not cooled equally, resulting in what we call data center thermal heterogeneity. In other words, servers differ in their cooling requirements (server heterogeneity), and locations differ in their cooling cost (cooling heterogeneity). Cooling heterogeneity refers to the fact that not all locations in a data center benefit to the same degree from a particular cooling unit. Related works in the literature have either simplified or ignored the heterogeneity that exists in the data center environment when studying workload assignment or cooling control. We have studied the cost-saving opportunities that exist due to server heterogeneity during workload assignment [15], and also due to cooling heterogeneity [16]; however, no study has considered all aspects of data center thermal heterogeneity to control cooling unit parameters and assign workload.
In this paper, a holistic data center infrastructure control (HDIC) framework is presented. HDIC is a novel method that exploits all aspects of data center thermal heterogeneity and uses them as an opportunity to save power during data center control. The proposed framework employs neural networks to construct thermal models for the data center and individual servers. Server thermal models are used to estimate the core temperature of servers, and a data center thermal model is used to predict the inlet temperatures of servers. These are data-driven models, which is attractive because building accurate physical models for data center thermal dynamics is notoriously difficult.
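To make the notion of a data-driven thermal model concrete, the following is a minimal sketch and not the authors' implementation. It fits a one-hidden-layer neural network by stochastic gradient descent to synthetic samples that map utilization, fan speed, and supply temperature to an inlet temperature. The data generator, its coefficients, the network size, and all names (`make_synthetic_samples`, `TinyThermalNet`) are assumptions made purely for illustration; a real model would be trained on measurements from the data center.

```python
import math
import random


def make_synthetic_samples(n, rng):
    """Hypothetical data, not measurements: the (normalized) inlet temperature
    rises with utilization and supply temperature and falls with fan speed."""
    samples = []
    for _ in range(n):
        util = rng.random()    # server utilization in [0, 1]
        fan = rng.random()     # normalized fan speed in [0, 1]
        supply = rng.random()  # normalized supply temperature in [0, 1]
        inlet = 0.5 * util - 0.3 * fan + 0.6 * supply + 0.1 * util * supply
        samples.append(([util, fan, supply], inlet))
    return samples


class TinyThermalNet:
    """One hidden tanh layer followed by a linear output unit."""

    def __init__(self, n_in, n_hidden, rng):
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x):
        h = self._hidden(x)
        return sum(w * hj for w, hj in zip(self.w2, h)) + self.b2

    def train_step(self, x, y, lr):
        """One gradient step on the squared error 0.5 * (pred - y) ** 2."""
        h = self._hidden(x)
        err = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2 - y
        for j, hj in enumerate(h):
            # Gradient w.r.t. the hidden pre-activation (uses the old w2[j]).
            grad_pre = err * self.w2[j] * (1.0 - hj * hj)
            self.w2[j] -= lr * err * hj
            self.b1[j] -= lr * grad_pre
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * grad_pre * xi
        self.b2 -= lr * err


def mean_squared_error(model, samples):
    return sum((model.predict(x) - y) ** 2 for x, y in samples) / len(samples)


rng = random.Random(0)
train = make_synthetic_samples(200, rng)
model = TinyThermalNet(n_in=3, n_hidden=6, rng=rng)
mse_before = mean_squared_error(model, train)
for _ in range(100):  # epochs
    for x, y in train:
        model.train_step(x, y, lr=0.05)
mse_after = mean_squared_error(model, train)
```

Comparing `mse_before` with `mse_after` shows how much of the synthetic relationship the network has learned. The same pattern would apply to a server thermal model, with core temperature as the target instead of inlet temperature.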
The generated thermal models incorporate both cooling and server heterogeneity. These models can then be used by an optimizer to control the system in a power-efficient manner. We demonstrate that the solutions to the underlying optimization problem lead to considerable power savings while maintaining IT performance. Our contributions in this paper can be summarized as follows:
- We incorporate low-complexity data-driven thermal models to take thermal heterogeneity in data centers into account during workload assignment and cooling control.
- We present an optimization framework that can jointly optimize the assignment of workload and the operational parameters of the cooling unit(s), while respecting the expected performance of IT equipment.
- We show the advantages of exploiting thermal differences between servers and locations in a data center by means of thermal models.
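The benefit of jointly choosing the workload assignment and a cooling parameter can be illustrated with a toy example. The sketch below is not the paper's formulation: it assumes two servers at locations that benefit differently from one fan, a made-up temperature model, a temperature limit, and fan power that grows with the cube of fan speed. An exhaustive grid search stands in for a nonlinear solver. All constants and function names are hypothetical.

```python
T_AMBIENT = 25.0   # assumed baseline temperature (deg C)
T_MAX = 60.0       # assumed temperature limit (deg C)
HEAT_GAIN = 30.0   # assumed temperature rise per unit of utilization
TOTAL_LOAD = 1.0   # total workload to split between the two servers

# Assumed cooling heterogeneity: location A benefits twice as much from the fan.
COOLING_BENEFIT = {"A": 1.0, "B": 0.5}


def server_temp(util, fan, benefit):
    """Toy thermal model: hotter with load, cooler with effective airflow."""
    return T_AMBIENT + HEAT_GAIN * util / (benefit * fan)


def fan_power(fan):
    """Toy cost model: fan power grows with the cube of fan speed."""
    return fan ** 3


def feasible(util_a, util_b, fan):
    return (server_temp(util_a, fan, COOLING_BENEFIT["A"]) <= T_MAX
            and server_temp(util_b, fan, COOLING_BENEFIT["B"]) <= T_MAX)


def cheapest_fan(util_a):
    """Lowest fan setting on a 0.01 grid that keeps both servers within T_MAX."""
    util_b = TOTAL_LOAD - util_a
    for step in range(20, 101):  # fan speed from 0.20 to 1.00
        fan = step / 100
        if feasible(util_a, util_b, fan):
            return fan
    return None  # no feasible fan speed for this split


def joint_search():
    """Search over the workload split and the fan speed together."""
    best = None
    for step in range(0, 101):
        util_a = step / 100
        fan = cheapest_fan(util_a)
        if fan is not None and (best is None or fan_power(fan) < best[2]):
            best = (util_a, fan, fan_power(fan))
    return best


even_split_fan = cheapest_fan(0.5)
best_util_a, best_fan, best_power = joint_search()
```

Under these made-up coefficients, an even split needs a fan setting of 0.86, whereas the joint search places about two thirds of the load on the better-cooled server and needs only 0.58. The numbers carry no meaning beyond the toy model; the point is that the workload split and the cooling setting interact, so optimizing them together can reduce cooling power.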
In the next section, related work is classified and reviewed. In Section 3, the architecture of the system under study is presented, and the models required to formulate the problem are explained in Section 3.2. In Section 4, the methodology for cooling control and workload assignment is discussed and techniques to optimize the data center control parameters are explained. The solution of the developed optimization problems is discussed in Section 5, where HDIC is compared with other representative methods. Finally, concluding remarks are given in Section 6. A summary of the notation used in this paper is listed in Table 1.
Literature review
There is a significant literature on this topic, studying various control methods, workload assignment frameworks, and thermal models for data centers. In this section, a number of previous works related to our contributions are reviewed: data center thermal models, thermal-aware workload assignment frameworks and thermal-aware control methods.
There are various methods of temperature prediction for data centers (data center thermal models). Computational fluid dynamics (CFD) is a traditional
System architecture and models
In this section, the architecture of the data center under study is provided. The steps to acquire data and then to build data center and server thermal models are explained and the power consumption model is formulated.
Thermal-aware cooling control and workload assignment
Exploring data center thermal heterogeneity is possible through thermal models. In this section, two approaches are discussed; they are compared later to demonstrate the efficiency of HDIC.
In the first approach, only cooling heterogeneity is considered, via the data center thermal model. This approach is called cooling heterogeneity-aware infrastructure control (CHIC). The second approach is HDIC, which uses both the data center and server thermal models for control decisions.
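The difference between the two approaches can be sketched as a difference in which predictions constrain a control decision. The code below is one plausible reading under stated assumptions, not the paper's formulation: CHIC is assumed to check predicted inlet temperatures against a fixed inlet limit, while HDIC is assumed to pass the predicted inlet temperatures through per-server models and check the predicted core temperatures. Both thermal models are hypothetical linear stand-ins, and all constants and names are invented for illustration.

```python
SUPPLY_TEMP = 18.0  # assumed supply air temperature (deg C)
INLET_LIMIT = 27.0  # assumed inlet temperature limit used by CHIC (deg C)
CORE_LIMIT = 80.0   # assumed core temperature limit used by HDIC (deg C)

# Two hypothetical servers: the second sits at a warmer location (cooling
# heterogeneity) but heats up less under load (server heterogeneity).
SERVERS = [
    {"location_offset": 0.0, "core_rise": 60.0},
    {"location_offset": 4.0, "core_rise": 40.0},
]


def predict_inlet(server, utils, fan):
    """Stand-in for the data center thermal model (inlet temperature)."""
    recirculation = 1.0 * sum(utils) / len(utils)
    return (SUPPLY_TEMP + server["location_offset"]
            + 8.0 * (1.0 - fan) + recirculation)


def predict_core(server, inlet, util):
    """Stand-in for a server thermal model (core temperature)."""
    return inlet + server["core_rise"] * util


def chic_feasible(utils, fan):
    """Assumed CHIC-style check: only inlet temperatures are constrained."""
    return all(predict_inlet(s, utils, fan) <= INLET_LIMIT for s in SERVERS)


def hdic_feasible(utils, fan):
    """Assumed HDIC-style check: core temperatures are constrained."""
    return all(
        predict_core(s, predict_inlet(s, utils, fan), u) <= CORE_LIMIT
        for s, u in zip(SERVERS, utils)
    )


utils = [0.8, 0.5]  # utilization of each server
fan = 0.3           # normalized fan speed

chic_ok = chic_feasible(utils, fan)
hdic_ok = hdic_feasible(utils, fan)
```

With these invented numbers, the inlet-only check rejects the low fan setting because the second server's inlet is slightly above the inlet limit, while the core-temperature check accepts it because that server stays well below its core limit. This is the kind of extra headroom that a server thermal model could expose to the optimizer.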
Results and comparison
Both optimization problems must be solved by nonlinear solution methods, as both the cost function and the thermal models are nonlinear. We used the MATLAB fmincon tool with the interior-point option to solve these optimization problems. A complete description of the data center configuration is given in Section 3.1. Briefly, for this data center configuration, the decision variables are the utilizations of 40 servers, the speeds of five fans, and one inlet water temperature. Due to the
Conclusion
Considering all aspects of data center thermal heterogeneity for workload assignment and cooling control can result in considerable savings in cooling power consumption. Data center heterogeneity can be captured by means of data center and server thermal models. The data center thermal model predicts the temperature of different locations as a function of IT and cooling parameters. This thermal model is used to indirectly calculate the cost of providing cool air for a specific
CRediT authorship contribution statement
SeyedMorteza MirhoseiniNejad: Conceptualization, Methodology, Software, Investigation, Writing - original draft. Ghada Badawy: Methodology, Writing - review & editing, Supervision, Funding acquisition. Douglas G. Down: Methodology, Writing - review & editing, Supervision, Funding acquisition.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
This research was supported by a Collaborative Research and Development grant CRDPI506142-16 from the Natural Sciences and Engineering Research Council of Canada (NSERC). We would like to acknowledge the useful comments of the anonymous referees.
SeyedMorteza MirhoseiniNejad is a Ph.D. student in Computer Science at McMaster University. He received his M.Sc. degree from Iran University of Science and Technology and his B.Sc. degree from Bahonar University of Kerman, both in Computer Engineering. His research interests are machine learning, optimization, queueing theory, data analysis, resource management, and predictive control and maintenance.
References (40)
- et al. Joint data center cooling and workload management: A thermal-aware approach. Future Gener. Comput. Syst. (2020)
- et al. Real-time temperature predictions in IT server enclosures. Int. J. Heat Mass Transfer (2018)
- et al. Thermosim: Deep learning based framework for modeling and simulation of thermal-aware resource management for cloud computing environments. J. Syst. Softw. (2020)
- et al. A smart coordinated temperature feedback controller for energy-efficient data centers. Future Gener. Comput. Syst. (2019)
- et al. Optimization based resource and cooling management for a high performance computing data center. ISA Trans. (2019)
- et al. Spatio-temporal thermal-aware job scheduling to minimize energy consumption in virtualized heterogeneous data centers. Comput. Netw. (2009)
- et al. Artificial intelligence techniques for sizing photovoltaic systems: A review. Renew. Sustain. Energy Rev. (2009)
- et al. United States Data Center Energy Usage Report. Tech. rep. (2016)
- et al. Report to Congress on Server and Data Center Energy Efficiency Public Law 109-431. Tech. rep. (2007)
- et al. Thermal-aware hybrid workload management in a green datacenter towards renewable energy utilization. Energies (2019)
- Data Center Energy Efficiency Investments: Qualitative Evidence from Focus Groups and Interviews. Tech. rep.
- Using low-power modes for energy conservation in ethernet LANs
- Design of one-transistor SRAM cell for low power consumption
- Dynamic voltage and frequency scaling enhanced task scheduling technologies toward green cloud computing
- Performance-constrained distributed DVS scheduling for scientific applications on power-aware clusters
- Power management of online data-intensive services. ACM SIGARCH Comput. Archit. News
- Dynamic right-sizing for power-proportional data centers. IEEE/ACM Trans. Netw.
- Napsac: Design and implementation of a power-proportional web cluster. SIGCOMM Comput. Commun. Rev.
- Energy-efficient thermal-aware task scheduling for homogeneous high-performance computing data centers: A cyber-physical approach. IEEE Trans. Parallel Distrib. Syst.
- Cool job allocation: Measuring the power savings of placing jobs at cooling-efficient locations in the data center
Dr. Ghada Badawy is an Adjunct Assistant Professor at the Computing and Software department and a Principal Research Engineer at the Computing Infrastructure Research Center (CIRC) at McMaster University. Before joining CIRC she worked at BlackBerry as an Advanced networks connectivity researcher where she has led multiple video over Wi-Fi and peer to peer research projects and authored multiple patents. She has also worked as a Postdoctoral fellow at McMaster University and Ryerson University and as a senior software engineer at IBM. Ghada received her Ph.D. degree in Computer Engineering from McMaster University.
Douglas G. Down received his B.A.Sc. and M.A.Sc. degrees from the University of Toronto (1986 and 1990) and his Ph.D. from the University of Illinois at Urbana-Champaign (1994). His interests lie in performance evaluation and resource allocation in distributed computer systems. He is currently the Academic Director of the Computing Infrastructure Research Centre at McMaster University, Canada.