DOI: 10.1145/2907294.2907314
Short paper

Consecutive Job Submission Behavior at Mira Supercomputer

Published: 31 May 2016

Abstract

Understanding user behavior is crucial for evaluating scheduling and allocation performance in HPC environments. This paper aims to better understand how users dynamically react to different levels of system performance by analyzing recorded data for delays in subsequent job submissions. To this end, we characterize a workload trace covering one year of job submissions from the Mira supercomputer at the Argonne Leadership Computing Facility (ALCF). We perform an in-depth analysis of correlations between job characteristics, system performance metrics, and subsequent user behavior. The results show that user behavior is significantly influenced by long waiting times, and that more complex jobs (in terms of number of nodes and CPU hours) lead to longer delays in subsequent job submissions.

    Information & Contributors

    Information

    Published In

    HPDC '16: Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing
    May 2016
    302 pages
    ISBN:9781450343145
    DOI:10.1145/2907294
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. performance modeling
    2. user behavior
    3. workload analysis

    Qualifiers

    • Short-paper

    Funding Sources

    • DOE
    • German Research Foundation

    Conference

    HPDC'16
    Acceptance Rates

    HPDC '16 Paper Acceptance Rate 20 of 129 submissions, 16%;
    Overall Acceptance Rate 166 of 966 submissions, 17%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 65
    • Downloads (Last 6 weeks): 8
    Reflects downloads up to 05 Mar 2025

    Cited By

    • (2024) HighP5: Programming using Partitioned Parallel Processing Spaces. Journal of the Brazilian Computer Society, 30:1 (653-687). DOI: 10.5753/jbcs.2024.4345. Online publication date: 17-Dec-2024
    • (2024) Improving batch schedulers with node stealing for failed jobs. Concurrency and Computation: Practice and Experience, 36:12. DOI: 10.1002/cpe.8043. Online publication date: 16-Feb-2024
    • (2023) Investigating HPC Job Resource Requests and Job Efficiency Reporting. 2023 22nd International Symposium on Parallel and Distributed Computing (ISPDC) (61-68). DOI: 10.1109/ISPDC59212.2023.00024. Online publication date: Jul-2023
    • (2022) AI-Enabling Workloads on Large-Scale GPU-Accelerated System: Characterization, Opportunities, and Implications. 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (1224-1237). DOI: 10.1109/HPCA53966.2022.00093. Online publication date: Apr-2022
    • (2022) Pseudonymization at Scale: OLCF’s Summit Usage Data Case Study. 2022 IEEE International Conference on Big Data (Big Data) (3432-3440). DOI: 10.1109/BigData55660.2022.10020380. Online publication date: 17-Dec-2022
    • (2021) User-level Workload Analysis for Supercomputers. Proceedings of the 2021 4th International Conference on Software Engineering and Information Management (68-73). DOI: 10.1145/3451471.3451483. Online publication date: 16-Jan-2021
    • (2020) Job characteristics on large-scale systems. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (1-17). DOI: 10.5555/3433701.3433812. Online publication date: 9-Nov-2020
    • (2020) Job Characteristics on Large-Scale Systems: Long-Term Analysis, Quantification, and Implications. SC20: International Conference for High Performance Computing, Networking, Storage and Analysis (1-17). DOI: 10.1109/SC41405.2020.00088. Online publication date: Nov-2020
    • (2019) WatCache. The Journal of Supercomputing, 75:2 (554-586). DOI: 10.1007/s11227-017-2167-7. Online publication date: 1-Feb-2019
    • (2018) Rethinking Node Allocation Strategy for Data-intensive Applications in Consideration of Spatially Bursty I/O. Proceedings of the 2018 International Conference on Supercomputing (12-21). DOI: 10.1145/3205289.3205305. Online publication date: 12-Jun-2018
