ABSTRACT
Large-scale distributed computing systems rely on power management to enforce tight power budgets. Existing systems use a central authority that redistributes excess power to power-hungry nodes. This central authority, however, is both a single point of failure and a critical bottleneck, especially at large scale. To address these limitations we propose Penelope, a distributed power management system that shifts power through peer-to-peer transactions, keeping it robust under faults and at large scale. We implement Penelope and compare its performance to SLURM, a centralized power manager, under a variety of power budgets. Under normal conditions, SLURM and Penelope achieve nearly equivalent performance; in faulty environments, however, Penelope achieves 8–15% mean application performance gains over SLURM. At large scale and with increasing message frequency, Penelope maintains its performance, whereas centralized approaches degrade and become unusable.
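To make the peer-to-peer idea concrete, the following is a minimal illustrative sketch, not the paper's actual protocol or implementation: a power-hungry node asks a peer for watts directly, and the peer grants only its slack, so the cluster-wide budget is conserved without any central authority. All names here (`Node`, `slack`, `request_power`) are hypothetical.

```python
class Node:
    """Hypothetical node holding a share of the cluster power budget."""

    def __init__(self, name, cap, demand):
        self.name = name
        self.cap = cap        # watts currently allotted to this node
        self.demand = demand  # watts the node's workload wants

    def slack(self):
        # Watts this node can spare without slowing its own workload.
        return max(0, self.cap - self.demand)

    def request_power(self, peer, amount):
        """Peer-to-peer transaction: the peer grants up to its slack.

        The total allotment (self.cap + peer.cap) is unchanged, so the
        global budget is respected with no coordinator involved.
        """
        grant = min(amount, peer.slack())
        peer.cap -= grant
        self.cap += grant
        return grant


# Example: node `a` is power-hungry, node `b` has slack.
a = Node("a", cap=100, demand=140)
b = Node("b", cap=100, demand=60)
granted = a.request_power(b, a.demand - a.cap)
```

Because each transfer is conserved locally, any subset of concurrent pairwise trades still sums to the original budget, which is what lets such a scheme sidestep the central bottleneck the abstract describes.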
Index Terms
- Penelope: Peer-to-peer Power Management