PEPS: predictive energy-efficient parallel scheduler for multi-core processors

Maghsoud, Zeinab; Noori, Hamid; Pour Mozaffari, Saadat

doi:10.1007/s11227-020-03562-x

Published: 02 January 2021

The Journal of Supercomputing, Volume 77, pages 6566–6585 (2021)

Abstract

In multi-core processors, energy efficiency and performance are essential considerations. Energy-saving techniques usually cost performance, and vice versa. The energy-delay product (EDP) is therefore widely used as a metric that captures the trade-off between energy saving and performance. This paper presents a technique for performing work-stealing scheduling in the operating system kernel without requiring any modification to the user-space program. At runtime, the proposed scheduler uses predictive models to determine, for any running program, the optimal configuration (number of active cores and processor clock frequency) that minimizes EDP. Because EDP is a long-term metric, PEPS uses instructions per watt (IPW) within each time interval to select the best configuration: using performance and power prediction models, it finds the most energy-efficient configuration for the next interval. Because workloads behave differently at runtime, and programs with different degrees of parallelism respond differently to a given configuration, the proposed method uses performance counters to characterize the workload. Compared with the Linux scheduler, the proposed algorithm achieves up to 25% energy savings at the cost of a 7% performance loss. It also improves EDP by 19% while reducing temperature by 24%.
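To make the two metrics in the abstract concrete, here is a minimal Python sketch. EDP is energy multiplied by execution time. For the per-interval decision, we assume IPW is computed as predicted instruction throughput (instructions per second) divided by predicted power (watts). The helpers `ipw` and `select_config`, the candidate core counts and frequencies, and the closed-form predictor functions are hypothetical illustrations. They are not the authors' models, which are driven by performance-counter measurements.

```python
from itertools import product
from typing import Callable, Iterable, Tuple

# A configuration is (number of active cores, clock frequency in GHz).
Config = Tuple[int, float]


def edp(energy_joules: float, delay_seconds: float) -> float:
    """Energy-delay product: energy multiplied by execution time (J*s)."""
    return energy_joules * delay_seconds


def ipw(instr_per_sec: float, power_watts: float) -> float:
    """Instructions per watt, assumed here to be throughput divided by power."""
    return instr_per_sec / power_watts


def select_config(
    cores: Iterable[int],
    freqs_ghz: Iterable[float],
    predict_ips: Callable[[int, float], float],
    predict_power: Callable[[int, float], float],
) -> Config:
    """Return the (cores, frequency) pair with the highest predicted IPW
    for the next interval. The predictors are supplied by the caller."""
    candidates = product(cores, freqs_ghz)
    return max(candidates, key=lambda c: ipw(predict_ips(*c), predict_power(*c)))


if __name__ == "__main__":
    # Hypothetical toy models: throughput scales sublinearly with cores,
    # and dynamic power grows with cores and with frequency squared.
    toy_ips = lambda n, f: 1e9 * f * n ** 0.7
    toy_power = lambda n, f: 5 + 2 * n * f ** 2

    best = select_config([1, 2, 4], [1.0, 2.0, 3.0], toy_ips, toy_power)
    print(best)  # prints (4, 1.0)
    print(edp(10.0, 2.0))  # prints 20.0
```

The exhaustive search over a small grid keeps the sketch simple. A kernel scheduler would re-evaluate this choice at the end of each interval using fresh counter readings.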

Similar content being viewed by others

Balancing Tracking Granularity and Parallelism in Many-Task Systems: The Horizons Approach

Article Open access 06 April 2024

Peter Thoman & Philip Salzmann

Analyzing the impact of various parameters on job scheduling in the Google cluster dataset

Article 29 March 2024

Danyal Shahmirzadi, Navid Khaledian & Amir Masoud Rahmani

Task scheduling algorithms for energy optimization in cloud environment: a comprehensive review

Article 05 January 2022

R. Ghafari, F. Hassani Kabutarkhani & N. Mansouri


Author information

Authors and Affiliations

Amirkabir University of Technology, 4413-15875, Tehran, Iran
Zeinab Maghsoud & Saadat Pour Mozaffari
Computer Engineering Department, Faculty of Engineering, Ferdowsi University of Mashhad, 9177948974, Mashhad, Iran
Hamid Noori

Corresponding author

Correspondence to Hamid Noori.

About this article

Cite this article

Maghsoud, Z., Noori, H. & Pour Mozaffari, S. PEPS: predictive energy-efficient parallel scheduler for multi-core processors. J Supercomput 77, 6566–6585 (2021). https://doi.org/10.1007/s11227-020-03562-x

Accepted: 07 December 2020
Published: 02 January 2021
Issue Date: July 2021
DOI: https://doi.org/10.1007/s11227-020-03562-x