GPU Energy optimization based on task balance scheduling
Introduction
Machine learning workloads increase computational complexity and dramatically raise the power consumption of the graphics processing units (GPUs) that run them. A challenge remains in how to reduce the energy consumed by GPUs while preserving their computing performance. The effectiveness of the scheduling scheme is critical in determining GPU power consumption: if the GPU cannot allocate tasks dynamically and rationally, both its utilization and its power efficiency suffer significantly.
To reduce the power consumption of GPUs, scholars have proposed various strategies and models based on either static or dynamic scheduling, as detailed in Section 2 of this study. The efficiency of task scheduling across a GPU’s streaming multiprocessors (SMs) has a significant impact on energy consumption. However, current task scheduling strategies rarely consider how evenly tasks are distributed among the SMs. As a result, some SMs in the GPU are overloaded while others are not saturated. This imbalance can lead to energy loss across the GPU.
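The contrast between overloaded and unsaturated SMs can be made concrete with a small measurement sketch. The metric below (coefficient of variation of per-SM load) and the two thresholds are illustrative assumptions. They are not the coefficient of balance (CB) defined later in this study, and the load values are invented for the example.

```python
from statistics import mean, pstdev


def imbalance_coefficient(sm_loads):
    """Coefficient of variation of per-SM load (assumed metric, not the paper's CB).

    Returns 0.0 when every SM carries the same load; larger values mean
    the load is spread less evenly.
    """
    avg = mean(sm_loads)
    if avg == 0:
        return 0.0
    return pstdev(sm_loads) / avg


def classify_sms(sm_loads, saturation=0.9, idle=0.5):
    """Split SM indices into overloaded and unsaturated groups.

    Both thresholds are hypothetical values chosen for illustration.
    """
    overloaded = [i for i, load in enumerate(sm_loads) if load > saturation]
    unsaturated = [i for i, load in enumerate(sm_loads) if load < idle]
    return overloaded, unsaturated


# Invented utilization figures for a four-SM device.
balanced = [0.7, 0.7, 0.7, 0.7]
skewed = [1.0, 1.0, 0.2, 0.2]

balanced_score = imbalance_coefficient(balanced)
skewed_score = imbalance_coefficient(skewed)
overloaded, unsaturated = classify_sms(skewed)
```

With these inputs the balanced device scores zero and the skewed device scores well above zero, with SMs 0 and 1 flagged as overloaded and SMs 2 and 3 as unsaturated.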
The imbalance of GPU task scheduling has two main causes. First, task migration often arises when energy consumption is constrained. Zeng et al. [1] described the task migration phenomenon in a multiprocessor environment similar to that of a GPU. Task migration causes system energy loss and can have a great impact on the stability and robustness of GPUs. Current task scheduling strategies often cannot reduce this migration phenomenon. Second, common task scheduling procedures rely on either SM-based device utilization or task-based features for energy optimization, but not both. For example, the authors of [2] proposed a task scheduling algorithm for multiprocessors based on a dynamic time and local remaining execution time (T-L) plane abstraction. This algorithm considers the processor’s execution capabilities but ignores the characteristics of the tasks themselves and the cooperative relationship between the executing tasks and the SMs. Ren et al. [3] proposed a workload-aware harmonic partition scheduling scheme for periodic probabilistic real-time tasks on multiple processors. This scheme sorts tasks based on workload and packs them into processors one by one. Although this solution considers the workload characteristics of executing tasks, it ignores the matching relationship between tasks and SMs in the processor. Because these strategies fail to combine device utilization with task characteristics, they do not achieve the energy savings that would otherwise be possible.
We proposed an approach based on task balancing and dynamic scheduling called the coefficient of balance and history ratio value (CB-HRV) task scheduling strategy. This scheduling strategy was based on the theory of load balancing and an algorithm for task scheduling. First, we analyzed the factors that influenced the amount of energy used for task scheduling in the GPU environment. Then, we abstracted the task balance impact factor (also known as the coefficient of balance, or simply CB) and the historical utilization values of the streaming multiprocessors (HRVs) that affected task migration. We used this information to balance task assignments among the SMs, which reduced task migration and, in turn, the energy loss in the GPU. The algorithm combined the SM device utilization status and the task characteristics. By rationally assigning sorted tasks to sorted SMs, this method considered both the resource attributes of the SM and the migration characteristics of the tasks with respect to execution in the SM. In this way, the balance-based GPU energy consumption optimization scheduling method was realized, and energy loss was reduced.
Our research contributed to the study of GPU energy optimization in the following ways:
- (1) We constructed an innovative task-balanced scheduling algorithm named CB-HRV. We used the algorithm to reduce the migration phenomenon in GPUs, thereby reducing the energy loss caused by that phenomenon.
- (2) We creatively combined the computing resource attributes (SM historical resource utilization) and task characteristics (CB values) to achieve better energy optimization strategies.
- (3) We constructed and implemented the algorithm framework and pseudocode of CB-HRV. The task scheduling strategy was optimized by sorting the tasks and sorting the SMs by utilization to realize the comprehensive energy consumption optimization of the GPU.
The remainder of this study is structured as follows. In Section 2, we provide an overview of the current popular GPU task scheduling methods for energy consumption and analyze the advantages and disadvantages of these scheduling algorithms. In Section 3, we describe the energy consumption model for multiple SMs in a GPU and analyze the balance effect in task scheduling. In Section 4, we describe the proposed CB-HRV approach based on balance scheduling, demonstrate its energy consumption mechanism, and give details of its implementation and pseudocode. In Section 5, we make an empirical comparison between CB-HRV and three common task allocation scheduling algorithms, and show that the results verify the advantages of the proposed method. Our conclusion is given in Section 6.
Section snippets
Related work
Energy optimization through task scheduling in the GPU can be achieved using both hardware and software strategies.
Software-based energy optimization strategies are regarded as effective energy optimization tools because of their low hardware costs and relative ease of implementation. In general, software-based task scheduling strategies are divided into two categories: static scheduling methods, which allocate the required SM resources directly through programming, and dynamic scheduling
SM-Based energy consumption model
The computing component of the GPU is composed of multiple SMs and an L1 Cache. SMs are the core components and are composed of several high-speed pipelines to complete task calculations quickly. Since this module is responsible for high-speed computing functions, it consumes about 40% of the GPU’s power [11].
For our research, we used a GPU power optimization scenario based on multiple SMs working together. Fig. 1 shows its architecture. Tasks entering the GPU are assigned to the
Proposed scheduling strategy: Methodology
To optimize the power consumption of the GPU, we proposed a dynamic GPU task balancing strategy called CB-HRV. The strategy was designed to achieve three goals: first, to avoid unnecessary migration of tasks in the SM; second, to use resource attributes and task characteristics comprehensively; and third, to employ the balance strategy to schedule tasks in SMs in a manner that would provide optimal energy consumption in the GPU.
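The three goals can be illustrated with a minimal sort-and-match sketch. It is not the CB-HRV algorithm itself, whose definition and pseudocode are given in the full Section 4 and are not reproduced here. The `Task` and `SM` types, the `cost` field, the greedy rule, and the choice to place high-CB tasks first on the least-utilized SM are all assumptions made for illustration. The CB and HRV values are treated as inputs computed elsewhere.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    cb: float    # balance coefficient, assumed to be computed elsewhere
    cost: float  # hypothetical estimate of the load the task adds to an SM


@dataclass
class SM:
    sm_id: int
    hrv: float   # historical utilization value, assumed to lie in [0, 1]
    assigned: list = field(default_factory=list)


def schedule(tasks, sms):
    """Greedy sketch: place tasks in descending CB order on the least-loaded SM.

    The load estimate of an SM starts at its HRV and grows by the cost of
    each task placed on it. Every task is placed exactly once and is never
    moved afterwards, so this sketch performs no migration.
    """
    load = {sm.sm_id: sm.hrv for sm in sms}
    by_id = {sm.sm_id: sm for sm in sms}
    for task in sorted(tasks, key=lambda t: t.cb, reverse=True):
        # Ties on load are broken by the lower SM id to keep the result deterministic.
        target = min(load, key=lambda sm_id: (load[sm_id], sm_id))
        by_id[target].assigned.append(task.name)
        load[target] += task.cost
    return {sm.sm_id: list(sm.assigned) for sm in sms}


# Invented inputs for illustration only.
tasks = [Task("a", cb=0.9, cost=0.3), Task("b", cb=0.5, cost=0.2), Task("c", cb=0.1, cost=0.1)]
sms = [SM(0, hrv=0.6), SM(1, hrv=0.2)]
placement = schedule(tasks, sms)
```

Here the lightly used SM 1 receives tasks `a` and `b` until its load estimate passes that of SM 0, after which task `c` goes to SM 0. A real scheduler would also need to respect SM resource limits, which this sketch ignores.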
Experiment and simulation
To validate the feasibility and effectiveness of our proposed method, we analyzed and compared the energy consumption performance and execution efficiency of the proposed CB-HRV method with that of three existing scheduling methods: RAD, DFB, and PHB.
Conclusion
We proposed a dynamic scheduling strategy called CB-HRV, which was designed to reduce GPU energy use through an innovative approach to GPU task balancing. First, we analyzed the factors that influenced energy use during task scheduling in the GPU environment, and then we abstracted the CB and SM HRV that affected task migration. We used these data to reduce task migration in scheduling. During task execution, the balance strategy scheduled the tasks in the SM so that GPU energy use was
Funding
This work was supported in part by the National Natural Science Foundation of China under Grant No. 61772352 and by the Science and Technology Planning Project of Sichuan Province under Grant Nos. 2019YFG0400, 2018GZDZX0031, 2018GZDZX0004, 2017GZDZX0003, 2018JY0182, and 19ZDYF1286.
Declaration of Competing Interest
We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work. We have no professional or other personal interest of any nature or kind in any product, service, or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled “GPU Energy Optimization Based on Task Balance Scheduling”.
Acknowledgements
We thank the anonymous editors for their linguistic assistance during the preparation of this manuscript. We thank LetPub (www.letpub.com) for its linguistic assistance during the preparation of this manuscript.
References (20)
- et al. Energy-aware task migration for multiprocessor real-time systems. Future Gener. Comput. Syst. (2016)
- et al. Reduction of task migrations and preemptions in optimal real-time scheduling for multiprocessors by using dynamic T-L plane. J. Syst. Archit. (2017)
- et al. Prototyping dynamic task migration on heterogeneous reconfigurable systems. Proceedings of RSP ’17 (2017)
- et al. GPUWattch: enabling energy optimizations in GPGPUs. Proceedings of ISCA ’13 (2013)
- et al. Workload-aware harmonic partitioned scheduling for fixed-priority probabilistic real-time tasks on multiprocessors. J. Syst. Archit. (2019)
- et al. A power-aware symbiotic scheduling algorithm for concurrent GPU kernels. Proceedings of ICPADS 2015 (2015)
- et al. Low-energy kernel scheduling approach for energy saving. Proceedings of ICESS 2016 (2016)
- et al. Preemption-aware kernel scheduling for GPUs. Proceedings of ISPA/IUCC 2017 (2017)
- et al. A dynamic special-purpose scheduler for concurrent kernels on GPU. Proceedings of ICCKE 2016 (2016)
- et al. Dynamic task mapping and scheduling with temperature-awareness on network-on-chip based multicore systems. J. Syst. Archit. (2019)
Yanhui Huang received his B.S. and M.S. degrees in Radio Electronics and Computer Science from Sichuan University in 1997 and 2002, respectively. Currently, he is a lecturer in the School of Computer Science at Sichuan University. His current research interests include embedded real-time systems and green computing.
Bing Guo received his B.S. degree in Computer Science from the Beijing Institute of Technology, China, and his M.S. and Ph.D. degrees in Computer Science from the University of Electronic Science and Technology of China, China, in 1991, 1999, and 2002, respectively. He is currently a Professor in the School of Computer Science at Sichuan University, China. His current research interests include embedded real-time systems and green computing.
Yan Shen received her M.S. degree in Mechatronics Engineering and Ph.D. degree in Measuring and Testing Technology and Instruments from University of Electronic Science and Technology of China in 2001 and 2004, respectively. Currently she is a professor in the Control Engineering College, Chengdu University of Information and Technology. Her main research interests include distributed measurement systems, embedded system development, wireless sensor networks, and robotics.