Abstract
Recent High-Performance Computing (HPC) systems face important challenges, such as massive power consumption, while their system resources remain significantly under-utilized. Given current power consumption trends, future systems will be deployed in an over-provisioned manner, with more resources installed than can be powered simultaneously. In such a scenario, maximizing resource utilization and energy efficiency while respecting a given power constraint is pivotal. Driven by this observation, in this position paper we first highlight recent trends in resource management techniques, with a particular focus on malleability support (i.e., dynamically scaling the resource allocations/requirements of a job), co-scheduling (i.e., co-locating multiple jobs within a node), and power management. Second, we consider combining these techniques, assess their relationships and synergies, and discuss the functionality required of each software component in future over-provisioned and power-constrained HPC systems. Third, we briefly introduce our ongoing efforts to integrate software tools, which will ultimately lead to the convergence of malleability and power management, as designed in the HPC PowerStack initiative.
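To make the over-provisioning scenario concrete, the following minimal Python sketch counts how many nodes a fixed power budget can keep active at a given per-node power level, and shows one hypothetical way a malleable job could be shrunk to fit the remaining budget. The function names, node counts, and wattages are illustrative assumptions, not values or interfaces taken from the paper or from any PowerStack component.

```python
import math


def powerable_nodes(budget_w: float, per_node_w: float, installed: int) -> int:
    """Number of nodes that can run simultaneously if each draws per_node_w watts."""
    # In an over-provisioned system, installed nodes exceed what the budget can power at full draw.
    return min(installed, math.floor(budget_w / per_node_w))


def resize_malleable_job(requested: int, min_nodes: int,
                         free_budget_w: float, per_node_w: float) -> int:
    """Hypothetical policy: shrink a malleable job so it fits the free power budget.

    Returns the granted node count, or 0 if not even min_nodes can be powered.
    """
    fits = math.floor(free_budget_w / per_node_w)
    granted = min(requested, fits)
    return granted if granted >= min_nodes else 0


if __name__ == "__main__":
    # Illustrative numbers only: 1000 installed nodes under a 300 kW system budget.
    print(powerable_nodes(300_000, 500, 1000))  # uncapped at 500 W/node -> 600
    print(powerable_nodes(300_000, 300, 1000))  # capped at 300 W/node -> 1000
    print(resize_malleable_job(256, 64, 40_000, 250))  # -> 160
```

Capping every node lets the whole installed machine run, but at lower per-node power; whether that yields more throughput than running fewer nodes at full power may depend on the workload, which is the kind of trade-off that combining malleability with power management aims to exploit.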
Acknowledgements
We would like to express our sincere gratitude to the anonymous reviewers for their constructive suggestions. This work has received funding from the European Commission's EuroHPC and H2020 programmes under grant agreements no. 955606 and no. 956560.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Arima, E., Comprés, A.I., Schulz, M. (2022). On the Convergence of Malleability and the HPC PowerStack: Exploiting Dynamism in Over-Provisioned and Power-Constrained HPC Systems. In: Anzt, H., Bienz, A., Luszczek, P., Baboulin, M. (eds) High Performance Computing. ISC High Performance 2022 International Workshops. ISC High Performance 2022. Lecture Notes in Computer Science, vol 13387. Springer, Cham. https://doi.org/10.1007/978-3-031-23220-6_14
DOI: https://doi.org/10.1007/978-3-031-23220-6_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-23219-0
Online ISBN: 978-3-031-23220-6
eBook Packages: Computer Science, Computer Science (R0)