
PEPS: predictive energy-efficient parallel scheduler for multi-core processors

The Journal of Supercomputing

Abstract

In multi-core processors, energy efficiency and performance are essential concerns. Energy-saving techniques usually cost performance, and vice versa. The energy-delay product (EDP) is therefore widely used as a metric that balances energy saving against performance. This paper presents a technique that performs work-stealing scheduling in the operating system kernel without requiring any modification to user-space programs. The proposed scheduler uses predictive models to determine, at runtime and for any running program, the optimal configuration: the number of active cores and the processor clock frequency that achieve the minimum EDP. Because EDP is a long-term metric, PEPS uses instructions per watt (IPW) within each runtime time frame to determine the best configuration. Using performance and power prediction models, PEPS finds the most energy-efficient configuration for the next time interval. Because workloads exhibit different behaviors at runtime, and programs with different degrees of parallelism behave in different ways, the proposed method uses performance counters to characterize the workload. Compared with the Linux scheduler, the proposed algorithm achieves up to 25% energy savings at the cost of a 7% performance loss. Moreover, it reduces temperature by 24% and improves EDP by 19%.
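The per-interval decision described above (predict throughput and power for each candidate core count and frequency, then keep the most energy-efficient one) can be sketched as follows. This is a minimal illustration under stated assumptions, not the PEPS implementation: IPW is taken here as instruction throughput divided by average power, EDP as energy multiplied by execution time, and the predictors `toy_ips` and `toy_power`, the `parallelism` feature, and the candidate core/frequency grid are hypothetical placeholders standing in for the paper's counter-based performance and power models.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Iterable


@dataclass(frozen=True)
class Config:
    cores: int       # number of active cores
    freq_ghz: float  # processor clock frequency in GHz


def edp(energy_j: float, time_s: float) -> float:
    """Energy-delay product: energy (J) times execution time (s)."""
    return energy_j * time_s


def ipw(ips: float, power_w: float) -> float:
    """Assumed IPW definition: instruction throughput (instr/s) per watt."""
    return ips / power_w


# A predictor maps (workload features, candidate config) to a predicted value.
Predictor = Callable[[dict, Config], float]


def choose_config(features: dict, candidates: Iterable[Config],
                  predict_ips: Predictor, predict_power: Predictor) -> Config:
    """Return the candidate with the highest predicted IPW for the next interval."""
    return max(candidates,
               key=lambda c: ipw(predict_ips(features, c), predict_power(features, c)))


# Hypothetical toy models, NOT the paper's fitted predictors.
def toy_ips(features: dict, c: Config) -> float:
    # Throughput grows with frequency and with cores, capped at the program's parallelism.
    return 1e9 * c.freq_ghz * min(c.cores, features["parallelism"])


def toy_power(features: dict, c: Config) -> float:
    # Constant static power plus a per-core dynamic term that grows with frequency squared.
    return 5.0 + 2.0 * c.cores * c.freq_ghz ** 2


configs = [Config(n, f) for n, f in product([1, 2, 4], [1.2, 2.0, 2.8])]
best = choose_config({"parallelism": 2}, configs, toy_ips, toy_power)
print(best)  # Config(cores=2, freq_ghz=1.2)
print(edp(energy_j=100.0, time_s=2.0))  # 200.0
```

With these toy models, the two-core, lowest-frequency configuration wins because cores beyond the program's parallelism add power without adding throughput. In PEPS the workload features come from hardware performance counters sampled each interval; here they are a hand-written dictionary.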


Figures: Fig. 1 to Fig. 10



Author information

Correspondence to Hamid Noori.


About this article


Cite this article

Maghsoud, Z., Noori, H. & Pour Mozaffari, S. PEPS: predictive energy-efficient parallel scheduler for multi-core processors. J Supercomput 77, 6566–6585 (2021). https://doi.org/10.1007/s11227-020-03562-x


  • DOI: https://doi.org/10.1007/s11227-020-03562-x
