Abstract
The electricity demand of an exascale High-Performance Computing (HPC) supercomputer rivals that of tens of thousands of people. Given the substantial energy footprint of exascale HPC systems and the increasing strain on power grids due to climate-related events, electricity providers are starting to impose power caps on their users during critical periods. In this context, it becomes crucial to implement strategies that manage the power consumption of supercomputers while ensuring their uninterrupted operation.
This paper investigates the proposition that HPC users can willingly sacrifice some processing performance to contribute to a global energy-saving initiative. With the objective of offering an efficient energy-saving strategy by involving users, we introduce a user-assisted supercomputer power-capping methodology. In this approach, users have the option to voluntarily permit their applications to operate in a power-capped mode, denoted as ’Eco-Mode’, as necessary.
Leveraging HPC simulations, along with energy traces and application metadata derived from a recent Top500 HPC supercomputer, we conducted an experimental campaign to quantify the effects of Eco-Mode on energy conservation and on user experience. Specifically, our study aimed to demonstrate that, with a sufficient number of users choosing Eco-Mode, the supercomputer maintains good performance within the specified power cap. Furthermore, we sought to determine the optimal conditions regarding the number of users embracing Eco-Mode and the magnitude of power capping required for applications (i.e., the intensity of Eco-Mode).
Our findings indicate that decreasing the speed of jobs can significantly decrease the number of jobs that must be killed. Moreover, as the adoption of Eco-Mode increases among users, the likelihood of any given job being killed also decreases.
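As a rough illustration of the trade-off summarized above, the following toy sketch counts how many jobs would have to be killed to fit under a power cap, depending on how many jobs opt in to Eco-Mode and how strongly their power draw is reduced. It is not the methodology or the Batsim simulation used in the paper: the function `jobs_killed`, its parameters, the kill-the-largest-first policy, and all numbers are assumptions made purely for illustration.

```python
def jobs_killed(job_powers, power_cap, eco_fraction, eco_reduction):
    """Toy model: number of jobs killed to bring total power under power_cap.

    job_powers    -- power draw of each running job (arbitrary units, hypothetical)
    power_cap     -- maximum total power allowed during the critical period
    eco_fraction  -- share of jobs whose users opted in to Eco-Mode (0.0 to 1.0)
    eco_reduction -- relative power reduction applied to Eco-Mode jobs (0.0 to 1.0)
    """
    n_eco = round(len(job_powers) * eco_fraction)
    # Assumption: the first n_eco jobs are the ones running in Eco-Mode.
    powers = [
        p * (1.0 - eco_reduction) if i < n_eco else p
        for i, p in enumerate(job_powers)
    ]
    total = sum(powers)
    killed = 0
    # Assumption: the most power-hungry jobs are killed first until the cap holds.
    for p in sorted(powers, reverse=True):
        if total <= power_cap:
            break
        total -= p
        killed += 1
    return killed


if __name__ == "__main__":
    jobs = [100.0] * 10  # ten identical hypothetical jobs
    cap = 730.0
    for fraction, reduction in [(0.0, 0.0), (0.5, 0.3), (1.0, 0.2), (1.0, 0.3)]:
        print(fraction, reduction, jobs_killed(jobs, cap, fraction, reduction))
```

In this toy setting, both a higher adoption rate and a stronger power reduction lower the number of killed jobs, which mirrors the qualitative trend reported in the abstract without reproducing any of its quantitative results.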
Acknowledgements
This work was supported by the European projects REGALE (H2020-JTI-EuroHPC-2019-1 agreement n. 956560) and LIGHTAIDGE (HORIZON-MSCA-2022-PF-01 agreement n. 101107953). We also thank Francesco Antici for curating and sharing the Marconi100 dataset.
Author information
Contributions
The authors are listed in alphabetical order. All authors participated in the discussions and elaboration of this work. Danilo contributed to the data processing. Luc implemented the methods in the Batsim simulator, conducted the experimental protocol, and provided the experimental results. Pierre-François helped with the implementation and debugging of the methods and the Batsim simulator. All authors participated in the analysis and interpretation of the results. Danilo was the main writer of Sects. 1, 2, 3, and 4. Luc was the main writer of Sects. 5 and 6. Finally, all authors reviewed the final manuscript.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Angelelli, L., Carastan-Santos, D., Dutot, PF. (2025). Run Your HPC Jobs in Eco-Mode: Revealing the Potential of User-Assisted Power Capping in Supercomputing Systems. In: Klusáček, D., Corbalán, J., Rodrigo, G.P. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2024. Lecture Notes in Computer Science, vol 14591. Springer, Cham. https://doi.org/10.1007/978-3-031-74430-3_10
DOI: https://doi.org/10.1007/978-3-031-74430-3_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-74429-7
Online ISBN: 978-3-031-74430-3
eBook Packages: Computer Science, Computer Science (R0)