Towards a Predictive Energy Model for HPC Runtime Systems Using Supervised Learning

  • Conference paper
Euro-Par 2019: Parallel Processing Workshops (Euro-Par 2019)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11997)

Abstract

High-Performance Computing (HPC) systems collect vast amounts of operational data through monitoring frameworks, often augmented with additional information from schedulers and runtime systems. This data can be put to operational use rather than serving only as a pool for post-mortem analysis. This work focuses on deriving a supervised learning model that enables optimal selection of the CPU frequency during the execution of a job, with the objective of minimizing the energy consumption of an HPC system. Our model is trained on sensor data and performance metrics collected with two distinct open-source frameworks for monitoring and runtime optimization. Our results show good prediction of CPU power draw and the number of instructions retired in realistic dynamic runtime settings, within a relatively low error margin.

Notes

  1. http://man7.org/linux/man-pages/man5/proc.5.html

  2. https://asc.llnl.gov/coral-2-benchmarks/

  3. https://www.lrz.de/services/compute/linux-cluster/coolmuc3/

Acknowledgements.

This work originated from the TUM Data Innovation Lab, and was further supported by Intel Deutschland GmbH and LRZ.

Author information

Corresponding author

Correspondence to Daniele Tafani.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Ozer, G. et al. (2020). Towards a Predictive Energy Model for HPC Runtime Systems Using Supervised Learning. In: Schwardmann, U., et al. Euro-Par 2019: Parallel Processing Workshops. Euro-Par 2019. Lecture Notes in Computer Science, vol 11997. Springer, Cham. https://doi.org/10.1007/978-3-030-48340-1_48

  • DOI: https://doi.org/10.1007/978-3-030-48340-1_48

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-48339-5

  • Online ISBN: 978-3-030-48340-1

  • eBook Packages: Computer Science, Computer Science (R0)
