Leveraging Comprehensive Data Analysis to Inform Parallel HPC Workloads | IEEE Conference Publication | IEEE Xplore

Abstract:

Alongside advances in computer-related technologies, High Performance Computing (HPC) systems continue to grow in both complexity and scale. As the complexity of hardware components increases, so too does the complexity of the software that leverages those resources. It is often difficult or impossible to know whether a distributed application is performing as intended without applying a specific profiling application. Furthermore, errors and adverse performance can go unnoticed until the application is executed at scale. To provide a general mechanism that addresses these types of errors, we implement a data analysis pipeline that ingests, stores, and indexes data. This data can then be analyzed and displayed to users. To this end, we have developed multiple hierarchical views that enable workload analysis at varying levels of granularity. The overall approach applies existing tools and leverages monitoring, alerting, ad-hoc analysis, and exploratory analysis to inform parallel HPC workloads.
Date of Conference: 09-12 December 2019
Date Added to IEEE Xplore: 24 February 2020
Conference Location: Los Angeles, CA, USA
