Abstract:
High-Performance Computing (HPC) is indispensable in the current technological era. To sustain the development of innovative ideas and technologies, supercomputers must continue to grow in size. Consequently, the interconnection network that connects the computing resources becomes increasingly important. The communication demands of today's workloads can cause bottlenecks in the supercomputer's network, with the result that only 3% of its available computing power is utilized. This inefficiency presents an opportunity to improve system efficiency through network-oriented resource management. In this paper, we outline the limitations of state-of-the-art resource management strategies and identify the challenges of addressing underutilization in large-scale HPC systems. We argue that flexibility and direct control over resources are the fundamental principles for overcoming these challenges. To this end, we present our research strategy, which focuses on designing closed-loop resource allocation strategies. In contrast to existing strategies, our approach adapts to the dynamic behavior of the system through elastic allocation, ensuring optimal resource allocation.
Date of Conference: 24-28 June 2024
Date Added to IEEE Xplore: 10 July 2024