Towards a Predictive Energy Model for HPC Runtime Systems Using Supervised Learning

Ozer, Gence; Garg, Sarthak; Davoudi, Neda; Poerwawinata, Gabrielle; Maiterth, Matthias; Netti, Alessio; Tafani, Daniele

doi:10.1007/978-3-030-48340-1_48

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11997)

Included in the following conference series:

European Conference on Parallel Processing

Abstract

High-Performance Computing systems collect vast amounts of operational data through monitoring frameworks, often augmented with additional information from schedulers and runtime systems. This data can be put to use for operational purposes, rather than serving only as a data pool for post-mortem analysis. This work focuses on deriving a model with supervised learning that enables optimal selection of CPU frequency during the execution of a job, with the objective of minimizing the energy consumption of an HPC system. Our model is trained using sensor data and performance metrics collected with two distinct open-source frameworks for monitoring and runtime optimization. Our results show good prediction of CPU power draw and number of instructions retired under realistic dynamic runtime settings within a relatively low error margin.
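
As a rough illustration of how predictions of CPU power draw and instructions retired could drive frequency selection, the following minimal Python sketch picks the candidate frequency with the lowest predicted energy per instruction. The selection criterion, the function name `pick_frequency`, and the numbers in the usage example are assumptions made for illustration only; they are not taken from the paper, and the trained predictor itself is not shown.

```python
def pick_frequency(predictions):
    """Return the candidate CPU frequency with the lowest predicted
    energy per instruction.

    `predictions` maps a frequency in GHz to a tuple of
    (predicted power draw in watts, predicted instructions retired per second).
    In practice these values would come from a trained model; here they are
    supplied directly (hypothetical input format).
    """
    def energy_per_instruction(freq):
        power_watts, instructions_per_second = predictions[freq]
        # watts / (instructions per second) = joules per instruction
        return power_watts / instructions_per_second

    return min(predictions, key=energy_per_instruction)


# Made-up example values, not measurements from the paper.
example_predictions = {
    1.2: (60.0, 1.0e9),
    2.0: (90.0, 1.8e9),
    2.6: (140.0, 2.2e9),
}
best = pick_frequency(example_predictions)
print(best)
```

With these example values the middle frequency is selected, because the highest frequency draws disproportionately more power for the additional instructions it retires.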

Acknowledgements.

This work originated from the TUM Data Innovation Lab, and was further supported by Intel Deutschland GmbH and LRZ.

Author information

Authors and Affiliations

Technische Universität München, Boltzmannstr. 3, 85748, Garching, Germany
Gence Ozer, Sarthak Garg, Neda Davoudi, Gabrielle Poerwawinata & Alessio Netti
Intel Deutschland GmbH, Dornacher Str. 1, 85622, Feldkirchen, Germany
Matthias Maiterth
Leibniz-Rechenzentrum, Boltzmannstr. 1, 85748, Garching, Germany
Alessio Netti & Daniele Tafani

Corresponding author

Correspondence to Daniele Tafani.

Editor information

Editors and Affiliations

Gesellschaft für Wissenschaftliche Datenverarbeitung mbH, Göttingen, Germany
Ulrich Schwardmann
Gesellschaft für Wissenschaftliche Datenverarbeitung mbH, Göttingen, Germany
Christian Boehme
CiTIUS, Santiago de Compostela, Spain
Dora B. Heras
University of Rome "Tor Vergata", Rome, Italy
Valeria Cardellini
Inria Bordeaux Sud-Ouest, Talence, France
Emmanuel Jeannot
Engineering Sardegna, Cagliari, Italy
Antonio Salis
University of Turin, Torino, Italy
Claudio Schifanella
University College Dublin, Dublin, Ireland
Ravi Reddy Manumachu
DLR-AS, Göttingen, Germany
Dieter Schwamborn
University of Pisa, Pisa, Italy
Laura Ricci
Ajou University, Suwon, Korea (Republic of)
Oh Sangyoon
RRZE Friedrich-Alexander-Universität, Erlangen, Germany
Thomas Gruber
ICAR-CNR, Napoli, Italy
Laura Antonelli
Tennessee Technological University, Cookeville, TN, USA
Stephen L. Scott

About this paper

Cite this paper

Ozer, G. et al. (2020). Towards a Predictive Energy Model for HPC Runtime Systems Using Supervised Learning. In: Schwardmann, U., et al. Euro-Par 2019: Parallel Processing Workshops. Euro-Par 2019. Lecture Notes in Computer Science, vol 11997. Springer, Cham. https://doi.org/10.1007/978-3-030-48340-1_48

DOI: https://doi.org/10.1007/978-3-030-48340-1_48
Published: 29 May 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-48339-5
Online ISBN: 978-3-030-48340-1
eBook Packages: Computer Science, Computer Science (R0)
