DOI: 10.1145/2832087.2832091
research-article

Techniques for modeling large-scale HPC I/O workloads

Published: 15 November 2015

ABSTRACT

Accurate analysis of HPC storage system designs is contingent on the use of I/O workloads that are truly representative of expected use. However, I/O analyses are generally bound to specific workload modeling techniques such as synthetic benchmarks or trace replay mechanisms, despite the fact that no single workload modeling technique is appropriate for all use cases. In this work, we present the design of IOWA, a novel I/O workload abstraction that allows arbitrary workload consumer components to obtain I/O workloads from a range of diverse input sources. Thus, researchers can choose specific I/O workload generators based on the resources they have available and the type of evaluation they wish to perform. As part of this research, we also outline the design of three distinct workload generation methods, based on I/O traces, synthetic I/O kernels, and I/O characterizations. We analyze and contrast each of these workload generation techniques in the context of storage system simulation models as well as production storage system measurements. We found that each generator mechanism offers varying levels of accuracy, flexibility, and breadth of use that should be considered before performing I/O analyses. We also recommend a set of best practices for HPC I/O workload modeling based on challenges that we encountered while performing our evaluation.
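
As a rough illustration of the idea in the abstract, the sketch below shows how one workload consumer could pull I/O operations from interchangeable generators. It is not the IOWA API. Every name in it (`IOOp`, `WorkloadGenerator`, `TraceGenerator`, `SyntheticKernelGenerator`, `CharacterizationGenerator`, `total_bytes_moved`) is hypothetical, and the three generators are deliberately simplistic stand-ins for the trace-based, synthetic-kernel, and characterization-based methods the paper outlines.

```python
from dataclasses import dataclass
from typing import Iterator, Mapping, Protocol, Sequence, Tuple


@dataclass(frozen=True)
class IOOp:
    """One I/O operation issued by one rank (hypothetical record layout)."""
    kind: str    # "open", "read", "write", or "close"
    offset: int  # byte offset within the file
    size: int    # bytes transferred (0 for open/close)


class WorkloadGenerator(Protocol):
    """Common interface: any source that can yield the operations of a rank."""
    def ops(self, rank: int) -> Iterator[IOOp]: ...


class TraceGenerator:
    """Replays operations recorded per rank (assumed in-memory trace format)."""
    def __init__(self, records: Mapping[int, Sequence[Tuple[str, int, int]]]):
        self.records = records

    def ops(self, rank: int) -> Iterator[IOOp]:
        for kind, offset, size in self.records.get(rank, ()):
            yield IOOp(kind, offset, size)


class SyntheticKernelGenerator:
    """Each rank writes `count` blocks of `block` bytes into its own region."""
    def __init__(self, block: int, count: int):
        self.block, self.count = block, count

    def ops(self, rank: int) -> Iterator[IOOp]:
        base = rank * self.block * self.count
        yield IOOp("open", 0, 0)
        for i in range(self.count):
            yield IOOp("write", base + i * self.block, self.block)
        yield IOOp("close", 0, 0)


class CharacterizationGenerator:
    """Rebuilds sequential writes from two aggregate counters per rank."""
    def __init__(self, bytes_written: int, write_count: int):
        self.bytes_written, self.write_count = bytes_written, write_count

    def ops(self, rank: int) -> Iterator[IOOp]:
        size, extra = divmod(self.bytes_written, self.write_count)
        offset = 0
        for i in range(self.write_count):
            n = size + (1 if i < extra else 0)  # spread the remainder evenly
            yield IOOp("write", offset, n)
            offset += n


def total_bytes_moved(gen: WorkloadGenerator, ranks: int) -> int:
    """A trivial consumer: it only sees IOOp records, never their source."""
    return sum(op.size for r in range(ranks) for op in gen.ops(r)
               if op.kind in ("read", "write"))


if __name__ == "__main__":
    sources = {
        "trace": TraceGenerator({0: [("write", 0, 8), ("read", 0, 8)]}),
        "synthetic": SyntheticKernelGenerator(block=4, count=3),
        "characterization": CharacterizationGenerator(10, 3),
    }
    for name, gen in sources.items():
        print(name, total_bytes_moved(gen, ranks=2))
```

The consumer depends only on the shared interface, so a simulator or replay tool written against it could switch input sources without modification. The characterization-based generator also hints at the accuracy trade-off the abstract mentions: it preserves aggregate volume but has to invent offsets and ordering.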


    Published in

      PMBS '15: Proceedings of the 6th International Workshop on Performance Modeling, Benchmarking, and Simulation of High Performance Computing Systems
      November 2015
      105 pages
      ISBN: 9781450340090
      DOI: 10.1145/2832087

      Copyright © 2015 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      PMBS '15 paper acceptance rate: 9 of 22 submissions, 41%
      Overall acceptance rate: 9 of 22 submissions, 41%
