Three-phase time-aware energy minimization with DVFS and unrolling for Chip Multiprocessors
Introduction
Computer architecture has evolved from single-core to multi-core designs. Energy dissipation in a Chip Multiprocessor (CMP) system is becoming a major concern, especially for handset and embedded devices built on CMP systems. These devices are usually battery powered and physically small. Higher power consumption produces more heat, and accumulated heat degrades device reliability. Moreover, extending battery lifetime requires that the power consumption of a CMP system be kept low. Low-power design is therefore essential for battery-powered CMP systems. In addition to dynamic power management (DPM) [1], another effective mechanism for reducing power consumption is Dynamic Voltage and Frequency Scaling (DVFS) [2], [3], which achieves power savings by dynamically adjusting the supply voltage and operating frequency [4] of the processing elements/cores, subject to the data dependencies and timing constraints of the system. For convenience, core, processor, and processing element (PE) are used interchangeably in this paper.
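To illustrate why lowering voltage and frequency together saves energy, the following minimal sketch uses the standard CMOS dynamic-power approximation P = C·V²·f, under which a task of N cycles dissipates roughly E = C·V²·N. The effective capacitance, the cycle count, and the two voltage/frequency pairs are hypothetical values chosen for illustration only; they are not taken from this paper.

```python
# Illustrative only: standard CMOS dynamic-power approximation P = C_eff * V^2 * f.
# Every numeric value below is hypothetical and not taken from the paper.

def task_time_s(cycles: float, freq_hz: float) -> float:
    """Execution time of a task that needs `cycles` clock cycles."""
    return cycles / freq_hz

def task_energy_j(cycles: float, volt: float, c_eff: float) -> float:
    """Dynamic energy: E = P * t = (C V^2 f) * (cycles / f) = C V^2 cycles."""
    return c_eff * volt ** 2 * cycles

C_EFF = 1e-9          # effective switched capacitance in farads (assumed)
CYCLES = 1_000_000    # cycles needed by the task (assumed)
fast = (1.8, 1.0e9)   # (supply voltage in V, frequency in Hz), assumed
slow = (0.9, 0.5e9)   # half the voltage and half the frequency, assumed

t_fast = task_time_s(CYCLES, fast[1])
t_slow = task_time_s(CYCLES, slow[1])
e_fast = task_energy_j(CYCLES, fast[0], C_EFF)
e_slow = task_energy_j(CYCLES, slow[0], C_EFF)

print(f"time ratio slow/fast:   {t_slow / t_fast:.2f}")
print(f"energy ratio slow/fast: {e_slow / e_fast:.2f}")
```

Under these assumptions, the slower setting doubles the execution time while cutting the dynamic energy to a quarter, which is why slack in the schedule can be traded for energy.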
Early DVFS research [5] focused on a uniprocessor system executing independent tasks. To handle applications on multiprocessor systems, several DVFS algorithms [6], [7], [8], [9], [10], [11] have been proposed to deal with task stretching and scheduling under data dependency and timing constraints. DVFS algorithms for a CMP system are usually developed based on the task graph. A heuristic DVFS algorithm that searches for the critical path in a task graph was presented in [6]. It uniformly stretches the execution time of each task on the critical path until no task on that path can be stretched further. The timing constraints of the task graph are then updated and a new critical path is searched for. If no task on the new critical path can be stretched further, the algorithm terminates; otherwise, the same procedure is applied to the new critical path. However, the algorithm is suboptimal for applications in which different tasks have different stretching ability, or which are executed on heterogeneous distributed architectures where different processors have different voltage scaling characteristics. Luo et al. [7] and Schmitz et al. [12] proposed two DVFS algorithms to overcome these drawbacks. These algorithms use the concept of energy gradient to distribute slack among tasks: the slack of a task is always allocated to the processor that has the largest energy gradient among all processors in the system. Both works assume that the supply voltage of the PE can be adjusted continuously. Zhang et al. [8] formulated the DVFS problem as an Integer Linear Programming (ILP) problem, which is efficiently solved by approximation. However, the power profile of the PE (the available voltage levels and the corresponding operating frequencies) is not considered in that work.
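The idea of gradient-guided slack distribution can be sketched as follows. This is a simplified illustration and not the algorithm of [7] or [12]: it assumes a small set of discrete voltage/frequency levels rather than a continuous supply voltage, it assumes that all tasks run back to back on one PE under a single shared deadline, and the `Task` fields, the `LEVELS` table, and the per-task capacitance factor `cap` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical (supply voltage in V, frequency in MHz) pairs, fastest first.
LEVELS = [(1.8, 1000.0), (1.5, 800.0), (1.2, 600.0), (0.9, 400.0)]

@dataclass
class Task:
    name: str
    cycles: float  # clock cycles required (assumed)
    cap: float     # relative switched capacitance per cycle (assumed)

def exec_time_us(task: Task, level: int) -> float:
    # cycles / MHz gives microseconds
    return task.cycles / LEVELS[level][1]

def energy(task: Task, level: int) -> float:
    # Arbitrary units, proportional to C * V^2 * cycles
    return task.cap * LEVELS[level][0] ** 2 * task.cycles

def distribute_slack(tasks: list[Task], deadline_us: float) -> dict[str, int]:
    """Greedily lower the level of the task with the largest energy gradient."""
    level = {t.name: 0 for t in tasks}
    slack = deadline_us - sum(exec_time_us(t, 0) for t in tasks)
    if slack < 0:
        raise ValueError("deadline cannot be met even at the fastest level")
    while True:
        best = None  # (gradient, task, extra time)
        for t in tasks:
            cur = level[t.name]
            if cur + 1 >= len(LEVELS):
                continue  # already at the slowest level
            dt = exec_time_us(t, cur + 1) - exec_time_us(t, cur)
            if dt > slack:
                continue  # this step down would violate the deadline
            grad = (energy(t, cur) - energy(t, cur + 1)) / dt
            if best is None or grad > best[0]:
                best = (grad, t, dt)
        if best is None:
            return level
        _, chosen, dt = best
        level[chosen.name] += 1
        slack -= dt

tasks = [Task("A", 100_000, 1.0), Task("B", 200_000, 2.0)]
assignment = distribute_slack(tasks, deadline_us=450.0)
print(assignment)
```

Each iteration spends part of the remaining slack on the single step that saves the most energy per microsecond of added execution time, and the loop stops when no further step fits.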
In addition, these algorithms share a common limitation: they only look at one period (T) of the task graph, i.e., a directed acyclic graph (DAG). In a real-life application, the deadline of a task may be greater than the period T of the DAG. Consider a sensing and data processing application implemented on a CMP system that consists of a sensor and 2 PEs. The sensor acquires data and feeds them into PE1 for task1; PE1’s output feeds PE2 for task2. If the deadlines of the tasks are 3 and 4 ms, respectively, and the sensor sampling interval is 1 ms, the whole task graph must be finished in 1 ms, assuming no delayed processing. Hence, T is 1 ms, and the deadlines of both task1 (3 ms) and task2 (4 ms) are larger than the task graph period (1 ms).
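The arithmetic of this example can be checked with a few lines of Python. The numbers are those of the sensing example above (a 1 ms period and deadlines of 3 ms and 4 ms); the helper name `periods_spanned` is hypothetical.

```python
import math

def periods_spanned(deadline_ms: float, period_ms: float) -> int:
    """Number of task-graph periods that elapse before a task's deadline."""
    return math.ceil(deadline_ms / period_ms)

PERIOD_MS = 1.0                                # sensor sampling interval
deadlines_ms = {"task1": 3.0, "task2": 4.0}    # deadlines from the example

spans = {name: periods_spanned(d, PERIOD_MS) for name, d in deadlines_ms.items()}
extra_ms = {name: d - PERIOD_MS for name, d in deadlines_ms.items()}

print(spans)     # periods covered by each deadline
print(extra_ms)  # time between the end of one period and each deadline
```

Here the deadlines of task1 and task2 cover 3 and 4 periods, respectively, so an analysis confined to a single period leaves 2 ms and 3 ms of each deadline unused.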
The authors of [13] proposed a DVFS algorithm based on task graph unrolling. They formulated the DVFS problem as an ILP problem and then solved the ILP to obtain the optimal scheme for stretching task execution times, lowering the operating frequencies of processors, and reducing energy dissipation. However, the ILP-based approach suffers from high computational complexity. Furthermore, the power profile of a processor is not considered in their formulation. The authors of [14] used nonlinear programming and mixed integer linear programming to minimize energy consumption. They focused only on the voltage selection problem, and the approach is very complicated. In [15], the authors used a hardware-controlled energy management approach, DVFS, with an earliest deadline first method to reduce energy consumption. However, their approach can be systematically improved.
DVFS has proved to be a powerful approach to reducing energy consumption [16], [17], [18], [3]. Effective exploitation of task slack is the key to reducing the power of a system. Unrolling task graphs clusters task slacks together, which provides a good opportunity to exploit DVFS to reduce energy consumption. In this paper, we propose a three-phase discrete DVFS algorithm that considers the power profiles of PEs and uses an unrolled task graph for task scheduling. We first present a new discrete DVFS algorithm that takes the power profiles of PEs into account. We then propose a three-phase DVFS algorithm that achieves better energy savings by clustering task slacks via task graph unrolling. In the first phase, a task-scheduling heuristic assigns tasks to PEs. In the second phase, the proposed discrete DVFS algorithm is applied to the given task graph for only one period [19]. In the last phase, the task graphs resulting from the first two phases are unrolled and chained together to obtain a new task graph. In the new task graph, new task slacks are generated, so the discrete DVFS algorithm can be applied again to further reduce energy consumption.
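A minimal sketch of task graph unrolling is given below. It is an illustration under stated assumptions rather than the procedure used in this paper: copy k of a task is released at k·T with deadline k·T plus its relative deadline, precedence edges are repeated within each copy, and consecutive copies are chained by an edge from each task instance to its next instance. The chaining rule, the `Instance` type, and the function name `unroll` are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instance:
    task: str
    copy: int        # index of the period this instance belongs to
    release: float   # earliest start time
    deadline: float  # absolute deadline

def unroll(deadlines, edges, period, copies):
    """Replicate a periodic task graph `copies` times.

    deadlines: task name -> deadline relative to the start of its period
    edges:     list of (predecessor, successor) task names
    """
    nodes = {}
    for k in range(copies):
        for name, d in deadlines.items():
            nodes[(name, k)] = Instance(name, k, k * period, k * period + d)
    new_edges = []
    for k in range(copies):
        # Precedence constraints inside copy k
        for u, v in edges:
            new_edges.append(((u, k), (v, k)))
        # Assumed chaining rule: instance k of a task precedes instance k + 1
        if k + 1 < copies:
            for name in deadlines:
                new_edges.append(((name, k), (name, k + 1)))
    return nodes, new_edges

# Values from the sensing example: T = 1 ms, deadlines of 3 ms and 4 ms.
nodes, unrolled_edges = unroll(
    deadlines={"task1": 3.0, "task2": 4.0},
    edges=[("task1", "task2")],
    period=1.0,
    copies=3,
)
print(len(nodes), len(unrolled_edges))
print(nodes[("task2", 2)])
```

Because the absolute deadlines of later copies extend well beyond the end of their own period, the unrolled graph exposes slack that a one-period analysis cannot see.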
Experimental results show that the proposed algorithm reduces the energy dissipation by up to 25% on average, compared to the existing approaches. In addition to achieving more energy savings, our proposed algorithm also reduces the number of idle intervals of the PEs.
The rest of this paper is organized as follows. In Section 2, a motivational example demonstrates that the overall energy dissipation can be further reduced by applying DVFS to an unrolled task graph. Section 3 formally formulates the problem. In Section 4, the proposed algorithm is explained in detail. Sections 5 and 6 give the experimental results and conclusions, respectively.
A motivational example
In this section, we use a simple example to show how combining DVFS with task graph unrolling saves energy. We consider a DVFS-enabled processor similar to Intel’s XScale processor [20]. Fig. 1 shows a task graph with 5 tasks and their data dependencies. The period of the task graph is 2.6 ms. There are two identical processing elements in the system: PE0 and PE1. The deadline, the worst-case execution time (WCET), and the corresponding energy dissipation of each task are given in Table 1.
Problem formulation and assumptions
The application tasks and their precedence constraints are usually modeled as a directed acyclic graph (DAG), i.e., the task graph. We are given a DAG G = (V, E), where node v_i ∈ V denotes a task and edge e_ij ∈ E denotes a precedence constraint and data dependency between tasks v_i and v_j. Each task v_i is associated with a deadline d_i, by which the task must finish its execution. d_i can be larger than T, the period of the task graph.
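As a minimal sketch of this model, a task graph can be stored as a mapping from each task to its predecessors, together with per-task execution times and deadlines. The example below computes the earliest finish time of each task in topological order and the resulting slack against its deadline. It assumes that every task can start as soon as its predecessors finish (that is, it ignores contention for PEs), and the task names and all numbers are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def earliest_finish(wcet, preds):
    """Earliest finish time of each task, ignoring PE contention (assumed)."""
    finish = {}
    for v in TopologicalSorter(preds).static_order():
        # A task may start once all of its predecessors have finished.
        start = max((finish[u] for u in preds.get(v, ())), default=0.0)
        finish[v] = start + wcet[v]
    return finish

# Hypothetical task graph: a -> b, a -> c, b -> d, c -> d
preds = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
wcet = {"a": 1.0, "b": 2.0, "c": 1.0, "d": 1.0}      # execution times in ms
deadline = {"a": 2.0, "b": 4.0, "c": 4.0, "d": 6.0}  # deadlines in ms

finish = earliest_finish(wcet, preds)
slack = {v: deadline[v] - finish[v] for v in finish}
print(finish)
print(slack)
```

The per-task slack computed this way is the time budget that a DVFS algorithm can convert into lower voltage and frequency settings.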
In this paper, we assume that the target processor has homogeneous
Proposed DVFS algorithm
In this section, we propose a three-phase algorithm that saves more energy by unrolling the task graph. In the first phase, a task-scheduling heuristic assigns tasks to PEs. In the second phase, the proposed discrete DVFS algorithm is applied to the given task graph for only one period. In the last phase, the task graphs resulting from the first two phases are unrolled and chained together to obtain a new task graph. In the new task graph, new task slacks are
Experimental results
In this section, we show that the proposed three-phase DVFS algorithm achieves a larger energy reduction than an approach in which the DVFS technique is limited to one period of the task graph. In our experimental setup, we consider a DVFS-enabled processor similar to Intel’s XScale processor [20]. There are two PEs in the system. The PEs have identical voltage and frequency levels, as shown in Table 2.
Thirteen task graphs are generated using TGFF [26], as shown in Fig. 6. The
Conclusion
In this paper, we proposed a three-phase DVFS algorithm for a CMP system. The algorithm targets applications in which the deadline of a task is larger than one period of the application’s task graph. In the first phase, a task-scheduling heuristic assigns tasks to PEs. In the second phase, the proposed DVFS algorithm is applied to one period of the task graph. Since the deadline of a task is larger than one period of the task graph in these applications, we unroll
Acknowledgements
This work was supported in part by NSF CNS-1249223; NSFC 61071061 and NSFC 61170077; SZ-HK Innovation Circle Proj. ZYB200907060012A; NSF GD:10351806001000000; and S & T Proj. of SZ JC200903120046A.
References (26)
- Q. Qiu, S. Liu, Q. Wu, Task merging for dynamic power management of cyclic applications in real-time multi-processor...
- M. Weiser, B. Welch, A. Demers, S. Shenker, Scheduling for reduced CPU energy, in: USENIX Symposium on Operating...
- S. Liu, Q. Wu, Q. Qiu, An adaptive scheduling and voltage/frequency selection algorithm for real-time energy harvesting...
- et al., A dynamic voltage scaled microprocessor system, IEEE Journal of Solid-State Circuits (2000)
- F. Yao, A. Demers, S. Shenker, A scheduling model for reduced CPU energy, in: IEEE Symposium on Foundations of Comp....
- J. Luo, N.K. Jha, Static and dynamic variable voltage scheduling algorithms for real-time heterogeneous distributed...
- M.T. Schmitz, B.M. Al-Hashimi, Considering power variations of DVS processing elements for energy minimization in...
- Y. Zhang, X. Hu, D.Z. Chen, Task scheduling and voltage selection for energy minimization, in: Proc. of Design...
- et al., Dynamic and leakage energy minimization with soft real-time loop scheduling and voltage assignment, IEEE Transactions on Very Large Scale Integration (VLSI) Systems (2010)
- et al., Cost minimization while satisfying hard/soft timing constraints for heterogeneous embedded systems, ACM Transactions on Design Automation of Electronic Systems (TODAES) (2009)
- EAD and PEBD: two energy-aware duplication scheduling algorithms for parallel tasks on homogeneous clusters, IEEE Transactions on Computers
Meikang Qiu received the B.E. and M.E. degrees from Shanghai Jiao Tong University, China. He received the M.S. and Ph.D. degrees in Computer Science from the University of Texas at Dallas in 2003 and 2007, respectively. He has worked at the Chinese Helicopter R&D Institute and IBM. Currently, he is an assistant professor of ECE at the University of Kentucky. He is an IEEE Senior Member and has published more than 140 papers, including 50+ journal papers. He is the recipient of the ACM Transactions on Design Automation of Electronic Systems (TODAES) 2011 Best Paper Award. He also received four other Best Paper Awards (IEEE ICESS’12, IEEE/ACM GreenCom’10, IEEE CSE’10, and IEEE EUC’09) and one Best Paper Nomination. His paper about cloud computing was ranked as the most downloaded paper of JPDC in 2012. He also holds 2 patents and has published 3 books. His research has been supported by NSF, ONR, and Air Force. He has also been awarded Naval Summer Faculty 2012 and SFFP Air Force summer faculty 2009. He has served in various chair roles and as a TPC member for many international conferences. He served as the Program Chair of IEEE EmbeddCom’09 and EM-Com’09. His research interests include embedded systems, computer security, and wireless sensor networks.
Zhong Ming is a professor at the College of Computer and Software Engineering of Shenzhen University. He is a council member and senior member of the China Computer Federation. His major research interests are software engineering and embedded systems. He has led two projects of the National Natural Science Foundation and two projects of the Natural Science Foundation of Guangdong province, China.
Jiayin Li received the B.E. and M.E. degrees from Huazhong University of Science and Technology (HUST), China, in 2002 and 2006, respectively. He obtained the Ph.D. degree from the Department of Electrical and Computer Engineering (ECE), University of Kentucky, in May 2012. His research interests include software/hardware co-design for embedded systems and high performance computing.
Shaobo Liu received the B.S. degree in material science and engineering from Wuhan University of Technology, Wuhan, China, in 2001, the M.S. degree in electrical engineering from Zhejiang University, Hangzhou, China, in 2004, and the Ph.D. degree in electrical and computer engineering from State University of New York, Binghamton, in 2010. He is currently with Marvell Semiconductor, Inc., Marlborough, MA. His research interests include power/thermal analysis and optimization, leakage estimation and minimization, energy harvesting system design, and energy aware computing.
Bin Wang obtained his B.S. from Zhejiang University in 1992, M.S. from the University of Louisville in 1994, and Ph.D. from Ohio State University in 2000. He is a professor in the Computer Science department at Wright State University. He received the US Department of Energy Early Career Award in 2003. His research interests include wireless sensor networks, communication, and network security.
Zhonghai Lu received the B.Sc. degree in Radio & Electronics from Beijing Normal University, Beijing, China, in 1989, the M.Sc. degree in System-on-Chip Design and the Ph.D. degree in Electronic and Computer Systems Design from KTH Royal Institute of Technology, Stockholm, Sweden, in 2002 and 2007, respectively. From 1989 to 2000, he was an Engineer in the area of electronic and embedded systems. He is currently an Associate Professor with the Department of Electronic Systems, School for Information and Communication Technology, KTH. His research interests include computer and communication system architectures, cyber-physical systems, and performance analysis. He has published over 100 papers in these areas.