
Long-Term Analysis for Job Characteristics on the Supercomputer

Published: 23 December 2021

Abstract

A deep understanding of job characteristics and their impact on a high-performance computing (HPC) system is one of the most critical steps in efficiently planning the system's design, development, and optimization. However, frequent and regular characterization studies are lacking on many HPC systems, so studies by system researchers may drift out of step with the actual system features and application characteristics, ultimately causing the proposed strategies to fail. This paper tries to bridge that gap by performing a long-term analysis of job characteristics on a petascale ARM supercomputer. The analysis yields many meaningful findings and insights, which we believe can benefit hardware-application co-design and improve both performance and the experience of job submitters on the HPC system.
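To make concrete what a long-term job-characteristics analysis typically involves, the minimal sketch below (ours, not the authors' pipeline, which is not published here) aggregates scheduler accounting records into monthly medians of queue wait, runtime, and job size plus total node-hours. The input path, column names, and the assumption that records were exported from a Slurm-style accounting log (e.g., `sacct --parsable2`) are illustrative assumptions only.

```python
# Illustrative sketch only; not the paper's analysis code.
# Assumes job records exported from a scheduler accounting log (for example
# `sacct --parsable2` on a Slurm system) into a "|"-separated file whose
# columns include Submit, Start, End, and NNodes (hypothetical layout).
import pandas as pd

def load_jobs(path: str) -> pd.DataFrame:
    df = pd.read_csv(path, sep="|")
    for col in ("Submit", "Start", "End"):
        df[col] = pd.to_datetime(df[col], errors="coerce")
    # Drop entries that never started or whose end time is unknown.
    return df.dropna(subset=["Submit", "Start", "End"])

def job_characteristics(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    out["wait_h"] = (out["Start"] - out["Submit"]).dt.total_seconds() / 3600
    out["run_h"] = (out["End"] - out["Start"]).dt.total_seconds() / 3600
    out["node_hours"] = out["run_h"] * out["NNodes"]
    return out

if __name__ == "__main__":
    jobs = job_characteristics(load_jobs("jobs.psv"))
    # Long-term trend: per-month medians of wait, runtime, and job size,
    # plus total node-hours consumed in each month.
    monthly = jobs.groupby(jobs["Submit"].dt.to_period("M")).agg(
        median_wait_h=("wait_h", "median"),
        median_run_h=("run_h", "median"),
        median_nodes=("NNodes", "median"),
        node_hours=("node_hours", "sum"),
    )
    print(monthly)
```

A study of this kind would extend such aggregates with per-user, per-application, and failure-state breakdowns, but the monthly rollup above captures the basic shape of a long-term characterization.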


Cited By

  • (2024) Toward Sustainable HPC: In-Production Deployment of Incentive-Based Power Efficiency Mechanism on the Fugaku Supercomputer. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, pp. 1-16. DOI: 10.1109/SC41406.2024.00030. Online publication date: 17-Nov-2024.

Published In

HPCCT '21: Proceedings of the 2021 5th High Performance Computing and Cluster Technologies Conference
July 2021
58 pages
ISBN:9781450390132
DOI:10.1145/3497737

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Findings and insights
  2. Job characteristics
  3. Supercomputer

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

HPCCT 2021

