research-article

Finding the limits of power-constrained application performance

Authors:

Peter E. Bailey,

Aniruddha Marathe,

David K. Lowenthal,

Barry Rountree,

Martin Schulz

SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

Article No.: 79, Pages 1 - 12

https://doi.org/10.1145/2807591.2807637

Published: 15 November 2015

Abstract

As we approach exascale systems, power is turning from an optimization goal to a critical operating constraint. With power bounds imposed by both stakeholders and the limitations of existing infrastructure, we need to develop new techniques that work with limited power to extract maximum performance. In this paper, we explore this area and provide an approach to find the theoretical upper bound of computational performance on a per-application basis in hybrid MPI + OpenMP applications.

We use a linear programming (LP) formulation to optimize application schedules under various power constraints, where a schedule consists of a DVFS state and number of OpenMP threads for each section of computation between consecutive MPI calls. We also provide a more flexible mixed integer-linear (ILP) formulation and show that the resulting schedules closely match schedules from the LP formulation. Across four applications, we use our LP-derived upper bounds to show that current approaches trail optimal, power-constrained performance by up to 41.1%. This demonstrates the untapped potential of current systems, and our LP formulation provides future optimization approaches with a quantitative optimization target.

Published In

SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

November 2015

985 pages

ISBN:9781450337236

DOI:10.1145/2807591

General Chair: Jackie Kern, University of Illinois at Urbana-Champaign, Urbana, Illinois

Program Chair: Jeffrey S. Vetter, Oak Ridge National Laboratory and Georgia Institute of Technology, Oak Ridge, Tennessee

Copyright © 2015 ACM.

© 2015 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

DOE ASCR
DOE LLNL

Conference

SC15

Sponsor:

SIGHPC
SIGARCH
IEEE-CS

SC15: The International Conference for High Performance Computing, Networking, Storage and Analysis

November 15 - 20, 2015

Austin, Texas

Acceptance Rates

SC '15 Paper Acceptance Rate 79 of 358 submissions, 22%;

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
