DOI: 10.1145/3573900.3593636
extended-abstract

A Study of Simulating Heterogeneous Workloads on Large-scale Interconnect Network

Published: 21 June 2023

ABSTRACT

With the rapid growth of machine learning applications, the workloads of future HPC systems are anticipated to be a mix of scientific simulation, big data analytics, and machine learning applications. Simulation is a valuable research vehicle for understanding the performance implications of co-running scientific applications with big data and machine learning workloads on large-scale systems. In this work, we propose a scalable workload manager that provides an automatic framework to facilitate hybrid workload simulation. We investigate various hybrid workloads and explore a range of application-system configurations to gain a deeper understanding of the performance implications of diverse workload mixes on current and future supercomputers.


      Published in

        SIGSIM-PADS '23: Proceedings of the 2023 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation
        June 2023
        173 pages
        ISBN: 9798400700309
        DOI: 10.1145/3573900

        Copyright © 2023 Owner/Author

        Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Qualifiers

        • extended-abstract
        • Research
        • Refereed limited

        Acceptance Rates

        Overall Acceptance Rate: 398 of 779 submissions, 51%