extended-abstract

A Study of Simulating Heterogeneous Workloads on Large-scale Interconnect Network

Author:
Xin Wang

Illinois Institute of Technology, USA

ORCID: 0000-0002-3692-2483

SIGSIM-PADS '23: Proceedings of the 2023 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, June 2023, Pages 58–59. https://doi.org/10.1145/3573900.3593636

Published: 21 June 2023

ABSTRACT

With the rapid growth of machine learning applications, the workloads of future HPC systems are anticipated to be a mix of scientific simulation, big data analytics, and machine learning. Simulation is an effective research vehicle for understanding the performance implications of co-running scientific applications with big data and machine learning workloads on large-scale systems. In this work, we propose a scalable workload manager that provides an automated framework for hybrid workload simulation. We investigate a range of hybrid workloads and explore multiple application-system configurations to gain a deeper understanding of the performance implications of a diverse workload mix on current and future supercomputers.

Index Terms

A Study of Simulating Heterogeneous Workloads on Large-scale Interconnect Network
1. Computing methodologies
  1. Modeling and simulation
    1. Simulation evaluation
    2. Simulation types and techniques
      1. Discrete-event simulation

Published in
SIGSIM-PADS '23: Proceedings of the 2023 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation
June 2023
173 pages
ISBN: 9798400700309
DOI: 10.1145/3573900
Editors: Margaret Loper, Dong (Kevin) Jin, Christopher D. Carothers
Copyright © 2023 Owner/Author
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher
Association for Computing Machinery
New York, NY, United States
Qualifiers
- extended-abstract
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 398 of 779 submissions, 51%