Memory Usage Prediction of HPC Workloads Using Feature Engineering and Machine Learning

Published: 27 February 2023


In High Performance Computing (HPC) systems, numerous applications of varying scale and domain are scheduled to run concurrently and share the available CPU and memory capacity. Applications whose run-time memory usage is not known a priori are commonly allocated significantly more memory than they actually need, which leads to poor resource utilization and degrades the performance of the overall system. In this paper, we share our experience of performing user analysis and prediction over a large-scale resource utilization dataset to tightly estimate the memory requirements of a wide variety of applications on the Titan supercomputer. By coupling our engineered features with random forest and XGBoost supervised machine learning techniques, our models predict the correct class of memory usage for 89% and 90% of the validation data, respectively. Furthermore, more than 98% of users have an average prediction accuracy of 95% or better within a one-class tolerance of the actual memory usage.
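The two figures of merit quoted above (exact-class accuracy and accuracy within a one-class tolerance, averaged per user) can be illustrated with a minimal sketch. It assumes that memory-usage classes are encoded as ordered integers, so that "within one class" means the predicted and actual labels differ by at most 1; the abstract does not specify the encoding. The function names, user names, and toy labels below are hypothetical and are not taken from the paper or its dataset.

```python
from collections import defaultdict


def exact_accuracy(actual, predicted):
    """Fraction of jobs whose predicted class equals the actual class."""
    hits = sum(a == p for a, p in zip(actual, predicted))
    return hits / len(actual)


def tolerance_accuracy(actual, predicted, tolerance=1):
    """Fraction of jobs predicted within `tolerance` classes of the actual class.

    Assumes classes are ordered integers (an assumption of this sketch).
    """
    hits = sum(abs(a - p) <= tolerance for a, p in zip(actual, predicted))
    return hits / len(actual)


def per_user_tolerance_accuracy(users, actual, predicted, tolerance=1):
    """Average tolerance accuracy computed separately for each user."""
    grouped = defaultdict(list)
    for user, a, p in zip(users, actual, predicted):
        grouped[user].append(abs(a - p) <= tolerance)
    return {user: sum(hits) / len(hits) for user, hits in grouped.items()}


# Hypothetical toy data: one entry per job.
users = ["alice", "alice", "bob", "bob", "bob"]
actual = [0, 2, 1, 3, 4]      # true memory-usage class of each job
predicted = [0, 3, 1, 1, 4]   # class predicted by some model

overall_exact = exact_accuracy(actual, predicted)
overall_tolerant = tolerance_accuracy(actual, predicted)
by_user = per_user_tolerance_accuracy(users, actual, predicted)
```

In this toy example, three of the five jobs are predicted exactly and four of the five fall within one class, so the tolerance metric is the more forgiving of the two. Grouping by user shows how a single badly predicted job lowers one user's average without affecting another's.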


        1. High Performance Computing
        2. Memory Allocation Prediction
        3. Random Forest
        4. Supervised Learning
        5. Workload Dataset
        6. XGBoost


        HPC ASIA 2023

