DOI: 10.1145/3624062.3624262

An End-to-End HPC Framework for Dynamic Power Objectives

Published: 12 November 2023

Abstract

High-Performance Computing (HPC) centers consume large amounts of power, and their demand continues to grow through the Exascale era. This work establishes the need for a multi-tiered, feedback-driven power management framework that follows dynamic power objectives while maximizing job performance. Such a framework must respond to both external factors (e.g., power constraints) and internal factors (e.g., performance variation). We present a practical implementation of this framework on a real-world cluster and complement it with simulations of larger data centers. The framework accurately tracks a moving power target for demand response while reacting to incomplete or inaccurate prior knowledge about job power and performance properties. We demonstrate that online performance feedback from a job runtime enables a cluster power management policy to recover most of the performance degradation introduced by job-type misclassification.
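To make the idea of feedback-driven budgeting concrete, the following is a minimal sketch and not the paper's implementation. It assumes a cluster-level policy that splits a power target across running jobs in proportion to a per-job sensitivity weight, and a job runtime that reports observed slowdown so the weight can be corrected when the prior job-type classification was wrong. The names `Job`, `allocate_caps`, and `update_sensitivity`, the proportional-sharing rule, and the smoothing factor `alpha` are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    min_cap_w: float    # lowest power cap the job may receive (watts)
    max_cap_w: float    # highest power the job can usefully draw (watts)
    sensitivity: float  # hypothetical weight: how much the job suffers under a cap


def allocate_caps(jobs, target_w):
    """Split a cluster power target across jobs (illustrative policy only).

    Each job starts at its minimum cap. The remaining budget is shared in
    proportion to sensitivity, clamped at each job's maximum cap; budget a
    saturated job cannot use is redistributed to the others.
    """
    caps = {j.name: j.min_cap_w for j in jobs}
    remaining = target_w - sum(caps.values())
    active = [j for j in jobs if j.max_cap_w > j.min_cap_w]
    while remaining > 1e-9 and active:
        total_weight = sum(j.sensitivity for j in active)
        if total_weight <= 0:
            break
        granted = 0.0
        still_active = []
        for j in active:
            share = remaining * j.sensitivity / total_weight
            room = j.max_cap_w - caps[j.name]
            give = min(share, room)
            caps[j.name] += give
            granted += give
            if room - give > 1e-9:
                still_active.append(j)
        remaining -= granted
        if len(still_active) == len(active):
            break  # nobody saturated, so the whole budget was handed out
        active = still_active
    return caps


def update_sensitivity(job, observed_slowdown, alpha=0.5):
    """Blend the prior weight with slowdown reported by the job runtime."""
    job.sensitivity = (1 - alpha) * job.sensitivity + alpha * observed_slowdown
    return job.sensitivity


if __name__ == "__main__":
    a = Job("a", 100.0, 200.0, sensitivity=1.0)
    b = Job("b", 100.0, 200.0, sensitivity=3.0)
    print(allocate_caps([a, b], target_w=300.0))
    # Job "a" was misclassified as insensitive; runtime feedback raises its weight.
    update_sensitivity(a, observed_slowdown=3.0)
    print(allocate_caps([a, b], target_w=300.0))
```

Calling `allocate_caps` again whenever the target moves or a weight changes gives a simple control loop: the external factor enters through `target_w`, and the internal factor enters through `update_sensitivity`.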

Supplemental Material

MP4 File: Recording of the "An End-to-End HPC Framework for Dynamic Power Objectives" presentation at the 1st International Workshop on the Environmental Sustainability of High-Performance Software (SHiPS 2023).

Cited By

  • (2024) Impact of Data Centers on Power Consumption, Climate Change, and Sustainability. Computational Intelligence for Green Cloud Computing and Digital Waste Management, pp. 60-83. DOI: 10.4018/979-8-3693-1552-1.ch004. Online publication date: 27-Feb-2024

Published In

SC-W '23: Proceedings of the SC '23 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis
November 2023
2180 pages
ISBN:9798400707858
DOI:10.1145/3624062
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data centers
  2. demand response
  3. high performance computing
  4. power management
  5. quality of service

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SC-W 2023
