
Long-Term Analysis for Job Characteristics on the Supercomputer

Published: 23 December 2021

Abstract

A deep understanding of job characteristics and their impact on a high-performance computing (HPC) system is one of the most critical steps in efficiently planning the system's design, development, and optimization. However, frequent and regular characterization studies are lacking on many HPC systems, so studies by system researchers may drift out of step with the actual system features and application characteristics, ultimately causing the proposed strategies to fail. This paper tries to bridge that gap by performing a long-term analysis of job characteristics on a petascale ARM supercomputer. The analysis yields many meaningful findings and insights, which we believe can benefit hardware-application co-design and improve both performance and the experience of job submitters on the HPC system.
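To make concrete what a long-term job-characteristics analysis typically involves, the minimal sketch below (ours, not the authors' pipeline, which is not published here) aggregates scheduler accounting records into monthly medians of queue wait, runtime, and job size plus total node-hours. The input path, column names, and the assumption that records were exported from a Slurm-style accounting log (e.g., `sacct --parsable2`) are illustrative assumptions only.

```python
# Illustrative sketch only; not the paper's analysis code.
# Assumes job records exported from a scheduler accounting log (for example
# `sacct --parsable2` on a Slurm system) into a "|"-separated file whose
# columns include Submit, Start, End, and NNodes (hypothetical layout).
import pandas as pd

def load_jobs(path: str) -> pd.DataFrame:
    df = pd.read_csv(path, sep="|")
    for col in ("Submit", "Start", "End"):
        df[col] = pd.to_datetime(df[col], errors="coerce")
    # Drop entries that never started or whose end time is unknown.
    return df.dropna(subset=["Submit", "Start", "End"])

def job_characteristics(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    out["wait_h"] = (out["Start"] - out["Submit"]).dt.total_seconds() / 3600
    out["run_h"] = (out["End"] - out["Start"]).dt.total_seconds() / 3600
    out["node_hours"] = out["run_h"] * out["NNodes"]
    return out

if __name__ == "__main__":
    jobs = job_characteristics(load_jobs("jobs.psv"))
    # Long-term trend: per-month medians of wait, runtime, and job size,
    # plus total node-hours consumed in each month.
    monthly = jobs.groupby(jobs["Submit"].dt.to_period("M")).agg(
        median_wait_h=("wait_h", "median"),
        median_run_h=("run_h", "median"),
        median_nodes=("NNodes", "median"),
        node_hours=("node_hours", "sum"),
    )
    print(monthly)
```

A study of this kind would extend such aggregates with per-user, per-application, and failure-state breakdowns, but the monthly rollup above captures the basic shape of a long-term characterization.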


Cited By

  • (2024) Toward Sustainable HPC: In-Production Deployment of Incentive-Based Power Efficiency Mechanism on the Fugaku Supercomputer. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, pp. 1-16. DOI: 10.1109/SC41406.2024.00030. Online publication date: 17-Nov-2024.

Published In

HPCCT '21: Proceedings of the 2021 5th High Performance Computing and Cluster Technologies Conference
July 2021
58 pages
ISBN:9781450390132
DOI:10.1145/3497737

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Findings and insights
  2. Job characteristics
  3. Supercomputer

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

HPCCT 2021

