DOI: 10.1145/3545008.3545047

Penelope: Peer-to-peer Power Management

Published: 13 January 2023

ABSTRACT

Large-scale distributed computing setups rely on power management systems to enforce tight power budgets. Existing systems use a central authority that redistributes excess power to power-hungry nodes. This central authority, however, is both a single point of failure and a critical bottleneck, especially at large scale. To address these limitations, we propose Penelope, a distributed power management system that shifts power through peer-to-peer transactions, ensuring that it remains robust in faulty environments and at large scale. We implement Penelope and compare its performance to that of SLURM, a centralized power manager, under a variety of power budgets. We find that under normal conditions SLURM and Penelope achieve almost equivalent performance; in faulty environments, however, Penelope achieves 8–15% mean application performance gains over SLURM. At large scale and with increasing message frequency, Penelope maintains its performance, in contrast to centralized approaches, which degrade and become unusable.
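The abstract describes the mechanism only at a high level: nodes shift portions of their power budget to one another through pairwise transactions instead of routing every request through a central authority. The sketch below illustrates that general idea. It is a minimal, hypothetical illustration, not Penelope's actual protocol or implementation: the Node class, the transfer_power helper, the 5 W slack margin, and the omission of messaging and hardware cap reprogramming (e.g., via RAPL) are assumptions made here for exposition.

# Illustrative Python sketch of a pairwise power-budget transfer between two
# nodes. All names, values, and simplifications here are hypothetical.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    power_cap_w: float   # current per-node power cap (e.g., enforced by a RAPL limit)
    power_used_w: float  # recently measured power draw

    def surplus(self, slack_w: float = 5.0) -> float:
        # Watts this node could give away while keeping some slack for itself.
        return max(0.0, self.power_cap_w - self.power_used_w - slack_w)

def transfer_power(donor: Node, recipient: Node, request_w: float) -> float:
    # Move up to request_w watts of budget from donor to recipient. A real
    # system would exchange messages and reprogram each node's hardware cap;
    # here only the bookkeeping is shown, so the total budget is conserved
    # without a central coordinator.
    granted = min(request_w, donor.surplus())
    donor.power_cap_w -= granted
    recipient.power_cap_w += granted
    return granted

# A power-hungry node asks an underutilized peer for 10 W of budget.
idle = Node("node-a", power_cap_w=100.0, power_used_w=60.0)
busy = Node("node-b", power_cap_w=100.0, power_used_w=99.0)
granted = transfer_power(idle, busy, request_w=10.0)
print(f"node-b received {granted:.1f} W; caps are now {idle.power_cap_w:.0f} W and {busy.power_cap_w:.0f} W")

The property the sketch is meant to highlight is that caps change only in matched donor/recipient pairs, so the cluster-wide budget stays constant without any node holding a global view of the system.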


Published in

      ICPP '22: Proceedings of the 51st International Conference on Parallel Processing
      August 2022
      976 pages
ISBN: 9781450397339
DOI: 10.1145/3545008

      Copyright © 2022 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

Overall Acceptance Rate: 91 of 313 submissions, 29%
