ABSTRACT
Accurate analysis of HPC storage system designs is contingent on the use of I/O workloads that are truly representative of expected use. However, I/O analyses are generally bound to specific workload modeling techniques such as synthetic benchmarks or trace replay mechanisms, despite the fact that no single workload modeling technique is appropriate for all use cases. In this work, we present the design of IOWA, a novel I/O workload abstraction that allows arbitrary workload consumer components to obtain I/O workloads from a range of diverse input sources. Thus, researchers can choose specific I/O workload generators based on the resources they have available and the type of evaluation they wish to perform. As part of this research, we also outline the design of three distinct workload generation methods, based on I/O traces, synthetic I/O kernels, and I/O characterizations. We analyze and contrast each of these workload generation techniques in the context of storage system simulation models as well as production storage system measurements. We found that each generator mechanism offers varying levels of accuracy, flexibility, and breadth of use that should be considered before performing I/O analyses. We also recommend a set of best practices for HPC I/O workload modeling based on challenges that we encountered while performing our evaluation.
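The abstraction described above — arbitrary workload consumers pulling I/O operations from interchangeable generators — can be sketched as a common interface with one implementation per generation method. This is a minimal illustration, not IOWA's actual API: the class names, the `IOOp` event schema, and the counter-sampling scheme in `CharacterizationGenerator` are all hypothetical.

```python
import random
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterator, List, Tuple

# One abstract I/O event; fields are illustrative, not IOWA's real schema.
@dataclass
class IOOp:
    op: str       # "read" or "write"
    offset: int   # byte offset within the file
    size: int     # request size in bytes

class WorkloadGenerator(ABC):
    """Common interface consumed by simulators, replay tools, or analyses."""
    @abstractmethod
    def operations(self) -> Iterator[IOOp]: ...

class TraceReplayGenerator(WorkloadGenerator):
    """Replays a recorded trace verbatim: highest fidelity, least flexible."""
    def __init__(self, trace: List[Tuple[str, int, int]]):
        self.trace = trace
    def operations(self):
        for op, offset, size in self.trace:
            yield IOOp(op, offset, size)

class SyntheticKernelGenerator(WorkloadGenerator):
    """Emits a parameterized pattern, e.g. sequential accesses of fixed size."""
    def __init__(self, op: str, count: int, size: int):
        self.op, self.count, self.size = op, count, size
    def operations(self):
        for i in range(self.count):
            yield IOOp(self.op, i * self.size, self.size)

class CharacterizationGenerator(WorkloadGenerator):
    """Samples operations from summary statistics (e.g. Darshan-style counters)."""
    def __init__(self, count: int, read_fraction: float,
                 sizes: List[int], seed: int = 0):
        self.count, self.read_fraction, self.sizes = count, read_fraction, sizes
        self.rng = random.Random(seed)
    def operations(self):
        offset = 0
        for _ in range(self.count):
            op = "read" if self.rng.random() < self.read_fraction else "write"
            size = self.rng.choice(self.sizes)
            yield IOOp(op, offset, size)
            offset += size

def total_bytes(gen: WorkloadGenerator) -> int:
    """A trivial consumer: any analysis runs against the same interface."""
    return sum(op.size for op in gen.operations())
```

Because every consumer sees only `WorkloadGenerator`, swapping a trace replay for a synthetic kernel or a characterization-driven stream requires no change on the consumer side — which is the point of the abstraction: the choice of generator becomes a function of available resources and the accuracy/flexibility trade-off the evaluation demands.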
Index Terms
- Techniques for modeling large-scale HPC I/O workloads