Modeling and Simulation of Extreme-Scale Fat-Tree Networks for HPC Systems and Data Centers

Published: 06 July 2017

Abstract

As parallel and distributed systems evolve toward extreme scale, they pose many new challenges for designing and deploying next-generation interconnection networks: high-performance computing systems involve millions of cores and billion-way parallelism, and high-capacity storage systems require efficient access to petabytes or exabytes of data. Fat-tree networks have been widely used in both data centers and high-performance computing (HPC) systems in past decades and are promising candidates for next-generation extreme-scale networks. In this article, we present FatTreeSim, a simulation framework that supports modeling and simulation of extreme-scale fat-tree networks. Its goals are to understand the design constraints of next-generation HPC and distributed systems and to aid the design and performance optimization of the applications running on them. We have systematically evaluated FatTreeSim on Emulab and Blue Gene/Q and analyzed its scalability and fidelity under various network configurations. On the Blue Gene/Q system Mira, FatTreeSim achieves a peak performance of 305 million events per second using 16,384 cores. Finally, we have applied FatTreeSim to simulate several large-scale Hadoop YARN applications to demonstrate its usability.

Published In

ACM Transactions on Modeling and Computer Simulation, Volume 27, Issue 2
Special Issue on PADS 2015
April 2017
203 pages
ISSN:1049-3301
EISSN:1558-1195
DOI:10.1145/3015562
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 July 2017
Accepted: 01 August 2016
Revised: 01 May 2016
Received: 01 November 2015
Published in TOMACS Volume 27, Issue 2

Author Tags

  1. Fat-tree network
  2. distributed system
  3. high-performance computing
  4. parallel discrete event simulation

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Maryland Procurement Office
  • Air Force Office of Scientific Research (AFOSR)
  • Office of Science of the U.S. Department of Energy

Cited By

  • (2024) An Evaluation of the Effect of Network Cost Optimization for Leadership Class Supercomputers. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, 1-16. DOI: 10.1109/SC41406.2024.00037. Online publication date: 17-Nov-2024
  • (2023) BurstBalancer: Do Less, Better Balance for Large-Scale Data Center Traffic. IEEE Transactions on Parallel and Distributed Systems 35:6, 932-949. DOI: 10.1109/TPDS.2023.3295454. Online publication date: 14-Jul-2023
  • (2023) SimMSG: Simulating Transportation of MPI Messages in High Performance Computing Systems. 2023 IEEE International Conference on High Performance Computing & Communications, Data Science & Systems, Smart City & Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys), 319-328. DOI: 10.1109/HPCC-DSS-SmartCity-DependSys60770.2023.00051. Online publication date: 17-Dec-2023
  • (2018) Large Scale Data Centers Simulation Based on Baseline Test Model. 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 57-68. DOI: 10.1109/IPDPSW.2018.00018. Online publication date: May-2018
  • (2017) Guest Editorial for the TOMACS Special Issue on the Principles of Advanced Discrete Simulation (PADS). ACM Transactions on Modeling and Computer Simulation 27:2, 1-3. DOI: 10.1145/3084543. Online publication date: 6-Jul-2017
