ABSTRACT
Power is a limiting factor for supercomputers, constraining both their scale and their operation. Characterizing the power signatures of different application types can enable data centers to operate efficiently, even when power constrained. This paper investigates power profiles of diverse scientific applications, spanning both traditional simulations and modern machine learning (ML) workloads, running on the Perlmutter supercomputer at the National Energy Research Scientific Computing Center (NERSC). Our findings indicate that traditional simulations typically consume more power on average than ML workloads. Furthermore, ML applications exhibit periodic power fluctuations attributed to epoch transitions during training. Finally, we discuss the potential implications of these insights for automatic demand response (ADR) and considerations for designing future systems.
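The epoch-periodic power fluctuations described above can be detected offline from a sampled power trace, for example via autocorrelation. The sketch below is illustrative only, not part of the paper's methodology: `dominant_period` is a hypothetical helper, and the trace is synthetic, assuming node power in watts sampled at a fixed interval.

```python
import numpy as np

def dominant_period(power_w, dt_s):
    """Estimate the dominant fluctuation period (seconds) of a power trace
    (watts) sampled every dt_s seconds, via autocorrelation of the
    mean-detrended signal. Returns None if no periodic peak is found."""
    x = np.asarray(power_w, dtype=float)
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[x.size - 1:]  # lags 0 .. N-1
    # The first local maximum after the zero-lag peak marks the period.
    for lag in range(1, ac.size - 1):
        if ac[lag] >= ac[lag - 1] and ac[lag] >= ac[lag + 1] and ac[lag] > 0:
            return lag * dt_s
    return None

# Synthetic trace: square wave alternating every 30 samples (period 60 s at
# 1 Hz sampling), mimicking epoch-transition dips around a steady baseline.
t = np.arange(600)
trace = 400 + 50 * ((t // 30) % 2)
print(dominant_period(trace, 1.0))  # → 60.0
```

In practice the trace would come from node-level telemetry rather than synthetic data; the autocorrelation approach is robust to the exact waveform shape as long as the fluctuations repeat at a roughly fixed interval.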
Index Terms: Comparing Power Signatures of HPC Workloads: Machine Learning vs Simulation