Application of Comprehensive Data Analysis for Interactive, Hierarchical Views of HPC Workloads | IEEE Conference Publication | IEEE Xplore

Application of Comprehensive Data Analysis for Interactive, Hierarchical Views of HPC Workloads


Abstract:

Alongside advances in computer-related technologies, High Performance Computing (HPC) systems continue to grow in both complexity and scale. As the compute capabilities and space efficiency of these machines improve, their complexity increases as well, which leads to higher acquisition costs, more node failures, and higher operational costs. To address these concerns, there have been attempts to improve cost efficiency through data analysis and data monitoring. Facilities use data analysis to understand the causes of degraded performance, the causes of failure, and the requirements for future acquisitions; this information is often obtained through ad-hoc programs. Data monitoring, in turn, is used by HPC facility managers to detect node failures in real time and decrease downtime, thereby minimizing the impact of failures on operational costs. In this paper we present an application to ingest, store, analyze, and display this data at HPC scale. Our approach brings monitoring, alerting, ad-hoc analysis, and exploratory analysis into a single integrated solution. Beyond supporting existing diagnostic data, the analysis pipeline makes it simple to link diagnostic data across multiple data sources. With this linking capability and the features available in the data analysis software stack, the user can create interactive, hierarchical views of diagnostic data.
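As a rough illustration of what linking diagnostic data across sources and building a hierarchical view could look like, the sketch below joins job records with node sensor samples and rolls them up as rack, then node, then job. It is not the paper's pipeline: the record layouts, field names (`job_id`, `node`, `rack`, `temp_c`), sample values, and both helper functions are assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical sample records; the field names are assumptions, not from the paper.
JOBS = [
    {"job_id": "j1", "node": "n01", "start": 0, "end": 100},
    {"job_id": "j2", "node": "n02", "start": 50, "end": 150},
]
NODE_METRICS = [
    {"node": "n01", "rack": "r1", "time": 10, "temp_c": 60.0},
    {"node": "n01", "rack": "r1", "time": 120, "temp_c": 45.0},  # no job running
    {"node": "n02", "rack": "r1", "time": 60, "temp_c": 70.0},
    {"node": "n02", "rack": "r1", "time": 140, "temp_c": 80.0},
]


def link_metrics_to_jobs(jobs, metrics):
    """Tag each metric sample with the job running on that node at that time.

    Samples that fall outside every job's time window are dropped.
    """
    linked = []
    for sample in metrics:
        for job in jobs:
            same_node = job["node"] == sample["node"]
            in_window = job["start"] <= sample["time"] <= job["end"]
            if same_node and in_window:
                linked.append({**sample, "job_id": job["job_id"]})
    return linked


def hierarchical_view(linked):
    """Nest linked samples as rack -> node -> job -> mean temperature."""
    grouped = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for row in linked:
        grouped[row["rack"]][row["node"]][row["job_id"]].append(row["temp_c"])
    return {
        rack: {
            node: {job_id: sum(vals) / len(vals) for job_id, vals in by_job.items()}
            for node, by_job in by_node.items()
        }
        for rack, by_node in grouped.items()
    }


VIEW = hierarchical_view(link_metrics_to_jobs(JOBS, NODE_METRICS))
```

The nested dictionary in `VIEW` can be drilled into level by level, which is the property an interactive front end would rely on; a production system would replace the in-memory lists with queries against its data store.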
Date of Conference: 10-13 December 2018
Date Added to IEEE Xplore: 24 January 2019
Conference Location: Seattle, WA, USA
