GPU Energy optimization based on task balance scheduling

doi:10.1016/j.sysarc.2020.101808

Journal of Systems Architecture

Volume 107, August 2020, 101808

Abstract

Graphics processing units (GPUs) can process massive amounts of data efficiently, but the complex computational demands of smart technologies have caused GPUs to consume increasing amounts of power. Moreover, current task scheduling strategies do not consider the energy lost to task migration. To reduce GPU power usage, we proposed a dynamic GPU task balance scheduling method called coefficient of balance and equipment history ratio value (CB-HRV) task scheduling. The CB-HRV method was developed to reduce system energy consumption during task execution by allocating tasks based on workload balance, thereby improving GPU energy use. The CB-HRV algorithm was shown to produce more balanced task assignments, allowing the computing device to be utilized more reasonably and efficiently. To demonstrate the effectiveness of the proposed approach, we compared the energy consumption of the CB-HRV method with that of some common scheduling methods. The results showed that the CB-HRV task scheduling algorithm yielded energy savings of 7.84% to 12.92% over existing methods.

Introduction

Machine learning workloads increase computational complexity and dramatically increase the power consumption of the graphics processing units (GPUs) that run them. A challenge therefore remains in how to decrease the amount of energy consumed by GPUs while optimizing their computing power. The effectiveness of the scheduling scheme is critical in determining GPU power consumption: if the GPU cannot allocate tasks dynamically and rationally, both its utilization and its power efficiency suffer significantly.

To reduce the power consumption of GPUs, scholars have proposed various strategies and models based on either static or dynamic scheduling, as detailed in Section 2 of this study. The efficiency of task scheduling performed by a GPU’s streaming multiprocessors (SMs) has a significant impact on energy consumption. However, current task scheduling strategies rarely consider how evenly tasks are distributed. As a result, some SMs in the GPU are overloaded while others are not saturated, and this imbalance wastes energy and raises the overall power consumption of the GPU.
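To make the notion of imbalance concrete, the following minimal sketch computes a simple indicator for a set of per-SM loads. The metric (maximum load divided by mean load), the function name `imbalance_ratio`, and the sample load values are illustrative assumptions, not definitions or measurements from this study.

```python
from statistics import mean


def imbalance_ratio(sm_loads):
    """Return max load / mean load for a list of per-SM loads.

    A value of 1.0 means every SM carries the same load; larger values
    mean at least one SM is loaded well above the average.
    Hypothetical metric, chosen only for illustration.
    """
    if not sm_loads:
        raise ValueError("sm_loads must not be empty")
    avg = mean(sm_loads)
    if avg == 0:
        return 1.0  # all SMs idle: treat as balanced
    return max(sm_loads) / avg


# Made-up utilization values for four SMs.
balanced = [0.5, 0.5, 0.5, 0.5]
skewed = [0.9, 0.8, 0.2, 0.1]

print(imbalance_ratio(balanced))
print(imbalance_ratio(skewed))
```

Both sample lists have the same mean load, so the difference in the ratio comes only from how unevenly the work is spread across the SMs.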

The imbalance of GPU task scheduling has two main causes. First, task migration often results from limiting energy consumption. Zeng et al. [1] described the task migration phenomenon in a multiprocessor environment similar to that of a GPU. Task migration causes system energy loss and can have a great impact on the stability and robustness of GPUs, yet current task scheduling strategies often cannot reduce it. Second, common task scheduling procedures rely on either SM-based device utilization or task-based features for energy optimization, but not both. For example, the authors of [2] proposed a new algorithm that optimizes task scheduling in a multiprocessor using a dynamic time and local remaining execution time (T-L) plane abstraction. This algorithm considers the processor’s execution capabilities but ignores the characteristics of the tasks themselves and the cooperative relationship between the executing tasks and the SM. Ren et al. [3] proposed a workload-aware harmonic partition scheduling scheme for periodic probabilistic real-time tasks on multiple processors. This scheme sorts tasks based on workload and packs them into processors one by one. Although this solution considers the workload characteristics of executing tasks, it ignores the matching relationship between tasks and SMs in the processor. Because these strategies fail to combine device utilization and task characteristics, they fall short of the energy savings that could otherwise be achieved.

We proposed an approach based on task balancing and dynamic scheduling called the coefficient of balance and history ratio value (CB-HRV) task scheduling strategy. This scheduling strategy was based on the theory of load balancing and an algorithm for task scheduling. First, we analyzed the factors that influenced the amount of energy used for task scheduling in the GPU environment. We then abstracted the task balance impact factor (also known as the coefficient of balance, or simply CB) and the streaming multiprocessor historical utilization values (HRVs) that affected task migration. Finally, we used this information to balance task assignments among the various SMs, which reduced task migration and, in turn, the energy loss in the GPU. The algorithm combined the SM device utilization status and the task characteristics. By rationally assigning sorted tasks to sorted SMs, this method considered both the resource attributes of the SM and the migration characteristics of the tasks with respect to execution in the SM. In this way, the balance-based GPU energy consumption optimization scheduling method was realized, and energy loss was reduced.
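The idea of sorting tasks, sorting SMs, and pairing them can be sketched as follows. This is not the authors' CB-HRV algorithm, whose definitions of CB and HRV are given in the full text. It is a generic greedy heuristic (heaviest task first, least-loaded SM first) substituted for illustration. The `Task` class, the `balance_schedule` function, the treatment of CB as an additive load, and all numeric values are assumptions.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    cb: float  # hypothetical balance coefficient, treated here as task load


def balance_schedule(tasks, sm_hrv):
    """Greedily assign tasks to SMs so that projected loads stay even.

    tasks:  iterable of Task objects
    sm_hrv: dict mapping SM id -> hypothetical historical utilization,
            used as the SM's starting load
    Returns a dict mapping SM id -> list of assigned task names.
    """
    # Min-heap keyed on projected load, so the least-loaded SM pops first.
    heap = [(hrv, sm_id) for sm_id, hrv in sm_hrv.items()]
    heapq.heapify(heap)
    assignment = {sm_id: [] for sm_id in sm_hrv}

    # Place the heaviest tasks first; each goes to the least-loaded SM.
    for task in sorted(tasks, key=lambda t: t.cb, reverse=True):
        load, sm_id = heapq.heappop(heap)
        assignment[sm_id].append(task.name)
        heapq.heappush(heap, (load + task.cb, sm_id))
    return assignment


# Made-up example: four tasks, two SMs with different usage histories.
tasks = [Task("A", 0.5), Task("B", 0.3), Task("C", 0.2), Task("D", 0.1)]
print(balance_schedule(tasks, {0: 0.4, 1: 0.1}))
```

Because every task is placed once on the SM that is least loaded at that moment, no task is moved after placement. This mirrors, in a simplified way, the goal of avoiding migration by balancing assignments up front.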

Our research contributed to the study of GPU energy optimization in the following ways:

(1) We constructed an innovative task-balanced scheduling algorithm named CB-HRV. We used the algorithm to reduce the migration phenomenon in GPUs, thereby reducing the energy loss caused by that phenomenon.
(2) We creatively combined the computing resource attributes (SM historical resource utilization) and task characteristics (CB values) to achieve better energy optimization strategies.
(3) We constructed and implemented the algorithm framework and pseudocode of CB-HRV. The task scheduling strategy was optimized by sorting the tasks and sorting the SMs by utilization to realize the comprehensive energy consumption optimization of the GPU.

The remainder of this study is structured as follows. In Section 2, we provide an overview of the current popular GPU task scheduling methods for energy consumption and analyze the advantages and disadvantages of these scheduling algorithms. In Section 3, we describe the energy consumption model for multiple SMs in a GPU and analyze the balance effect in task scheduling. In Section 4, we describe the proposed CB-HRV approach based on balance scheduling, demonstrate its energy consumption mechanism, and give details of its implementation and pseudocode. In Section 5, we make an empirical comparison between the CB-HRV and three common task allocation scheduling algorithms and show that the results verify the advantages of the proposed method. Our conclusion is given in Section 6.

Section snippets

Related work

Energy optimization through task scheduling in the GPU can be achieved using both hardware and software strategies.

Software-based energy optimization strategies are regarded as effective energy optimization tools because of their low hardware costs and relative ease of implementation. In general, software-based task scheduling strategies are divided into two categories: static scheduling methods, which allocate the required SM resources directly through programming, and dynamic scheduling

SM-Based energy consumption model

The computing component of the GPU is composed of multiple SMs and an L1 cache. SMs are the core components and are composed of several high-speed pipelines that complete task calculations quickly. Because the SMs are responsible for high-speed computing functions, they consume about 40% of the GPU’s power [11].

For our research, we used a GPU power optimization research scenario that was based on multi-SM work scenarios. Fig. 1 shows its architecture. Tasks entering the GPU are assigned to the

Proposed scheduling strategy: Methodology

To optimize the power consumption of the GPU, we proposed a dynamic GPU task balancing strategy called CB-HRV. The strategy was designed to achieve three goals: first, to avoid unnecessary migration of tasks in the SM; second, to use resource attributes and task characteristics comprehensively; and third, to employ the balance strategy to schedule tasks in SMs in a manner that would provide optimal energy consumption in the GPU.

Experiment and simulation

To validate the feasibility and effectiveness of our proposed method, we analyzed and compared the energy consumption performance and execution efficiency of the proposed CB-HRV method with that of three existing scheduling methods: RAD, DFB, and PHB.
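A comparison of this kind is typically summarized as a relative energy saving. The minimal sketch below shows one way such a percentage could be computed, assuming savings are expressed relative to the baseline scheduler's energy. The function name `energy_saving_percent` and the joule values are hypothetical and are not results from this study.

```python
def energy_saving_percent(baseline_joules, proposed_joules):
    """Return the energy saved by the proposed method as a percentage
    of the baseline's energy (assumed definition, for illustration)."""
    if baseline_joules <= 0:
        raise ValueError("baseline_joules must be positive")
    return 100.0 * (baseline_joules - proposed_joules) / baseline_joules


# Made-up measurements: a baseline scheduler versus a balanced scheduler.
print(energy_saving_percent(100.0, 90.0))
```

A negative result would indicate that the proposed method consumed more energy than the baseline.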

Conclusion

We proposed a dynamic scheduling strategy called CB-HRV, which was designed to reduce GPU energy use through an innovative approach to GPU task balancing. First, we analyzed the factors that influenced energy use during task scheduling in the GPU environment, and then we abstracted the CB and SM HRV that affected task migration. We used these data to reduce task migration in scheduling. During task execution, the balance strategy scheduled the tasks in the SM so that GPU energy use was

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant No. 61772352 and by the Science and Technology Planning Project of Sichuan Province under Grant Nos. 2019YFG0400, 2018GZDZX0031, 2018GZDZX0004, 2017GZDZX0003, 2018JY0182, and 19ZDYF1286.

Declaration of Competing Interest

We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and that there is no professional or other personal interest of any nature or kind in any product, service and/or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled "GPU Energy Optimization Based on Task Balance Scheduling".

Acknowledgements

We thank the anonymous editors for their linguistic assistance during the preparation of this manuscript. We thank LetPub (www.letpub.com) for its linguistic assistance during the preparation of this manuscript.

Yanhui Huang received his B.S. and M.S. degrees in Radio Electronics and Computer Science from Sichuan University in 1997 and 2002, respectively. He is currently a lecturer in the School of Computer Science at Sichuan University. His current research interests include embedded real-time systems and green computing.

References (20)

G. Zeng et al., Energy-aware task migration for multiprocessor real-time systems, Future Gener. Comput. Syst. (2016)
N.S. Pham et al., Reduction of task migrations and preemptions in optimal real-time scheduling for multiprocessors by using dynamic T-L plane, J. Syst. Archit. (2017)
A. Wicaksana et al., Prototyping dynamic task migration on heterogeneous reconfigurable systems, Proceedings of RSP ’17 (2017)
J. Leng et al., GPUWattch: enabling energy optimizations in GPGPUs, Proceedings of ISCA ’13 (2013)
J. Ren et al., Workload-aware harmonic partitioned scheduling for fixed-priority probabilistic real-time tasks on multiprocessors, J. Syst. Archit. (2019)
T. Li et al., A power-aware symbiotic scheduling algorithm for concurrent GPU kernels, Proceedings of ICPADS 2015 (2015)
J. Li et al., Low-energy kernel scheduling approach for energy saving, Proceedings of ICESS 2016 (2016)
S. Jin et al., Preemption-aware kernel scheduling for GPUs, Proceedings of ISPA/IUCC 2017 (2017)
R. Mohammadi et al., A dynamic special-purpose scheduler for concurrent kernels on GPU, Proceedings of ICCKE 2016 (2016)
S. Paul et al., Dynamic task mapping and scheduling with temperature-awareness on network-on-chip based multicore systems, J. Syst. Archit. (2019)

There are more references available in the full text version of this article.

Bing Guo received his B.S. degree in Computer Science from the Beijing Institute of Technology, China, and his M.S. and Ph.D. degrees in Computer Science from the University of Electronic Science and Technology of China, in 1991, 1999, and 2002, respectively. He is currently a Professor in the School of Computer Science at Sichuan University, China. His current research interests include embedded real-time systems and green computing.

Yan Shen received her M.S. degree in Mechatronics Engineering and Ph.D. degree in Measuring and Testing Technology and Instruments from University of Electronic Science and Technology of China in 2001 and 2004, respectively. Currently she is a professor in the Control Engineering College, Chengdu University of Information and Technology. Her main research interests include distributed measurement systems, embedded system development, wireless sensor networks, and robotics.
