DOI: 10.1145/2769458.2769474
research-article

FatTreeSim: Modeling Large-scale Fat-Tree Networks for HPC Systems and Data Centers Using Parallel and Discrete Event Simulation

Published: 10 June 2015

Abstract

Fat-tree topologies have been widely adopted as the communication network in data centers over the past decade. High-performance computing (HPC) system designers are now considering fat-trees as the interconnection network for next-generation supercomputers. For extreme-scale computing systems such as data centers and supercomputers, performance depends heavily on the interconnection network. In this paper, we present FatTreeSim, a toolkit based on parallel discrete event simulation (PDES) that consists of a highly scalable fat-tree network model. Its goals are to better understand the design constraints of fat-tree networking architectures in data centers and HPC systems and to evaluate the applications running on top of the network. FatTreeSim is designed to model and simulate large-scale fat-tree networks of up to millions of nodes with protocol-level fidelity. We have conducted extensive experiments to validate and demonstrate the accuracy, scalability, and usability of FatTreeSim. On Mira, the Blue Gene/Q system at the Argonne Leadership Computing Facility, FatTreeSim achieves a peak event rate of 305 million events per second (305 M/s) for a 524,288-node fat-tree model with a total of 567 billion committed events. Strong-scaling experiments using up to 32,768 cores show near-linear scalability. Compared with a small-scale physical system in Emulab, FatTreeSim models the latency of the same fat-tree network with an error rate below 10% in most cases. Finally, we demonstrate FatTreeSim's usability through a case study in which FatTreeSim serves as the network module of the YARNsim system; the error rates for all test cases are below 13.7%.
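The headline numbers in the abstract can be sanity-checked with a little arithmetic. The sketch below assumes, without support from the abstract, the classic three-level k-ary fat-tree of Al-Fares et al., in which identical k-port switches form k pods. Under that assumption k = 128 yields exactly 524,288 hosts, the model size quoted above, although the paper may construct its network differently. The sketch also divides the 567 billion committed events by the 305 M/s peak rate to obtain a lower bound on run time, and defines relative error in the usual way, which may not match the paper's exact definition of "error rate". All function names are hypothetical.

```python
"""Back-of-the-envelope checks for figures quoted in the FatTreeSim abstract.

Assumption (not from the paper): a three-level k-ary fat-tree as described
by Al-Fares et al., built entirely from k-port switches.
"""


def kary_fat_tree_size(k: int) -> dict:
    """Return host and switch counts for a three-level k-ary fat-tree."""
    if k <= 0 or k % 2:
        raise ValueError("k must be a positive, even switch port count")
    half = k // 2
    return {
        "pods": k,
        # k pods * (k/2 edge switches per pod) * (k/2 hosts per edge switch)
        "hosts": k ** 3 // 4,
        "edge_switches": k * half,         # k/2 edge switches in each pod
        "aggregation_switches": k * half,  # k/2 aggregation switches in each pod
        "core_switches": half * half,      # (k/2)^2 core switches
    }


def min_wall_clock_seconds(committed_events: float, peak_rate: float) -> float:
    """Lower bound on run time if every event ran at the peak event rate."""
    return committed_events / peak_rate


def relative_error(simulated: float, measured: float) -> float:
    """Relative error of a simulated value against a physical measurement."""
    return abs(simulated - measured) / abs(measured)


if __name__ == "__main__":
    size = kary_fat_tree_size(128)
    print(size["hosts"])  # 524288, the model size quoted in the abstract
    seconds = min_wall_clock_seconds(567e9, 305e6)
    print(f"{seconds:.0f} s")  # 1859 s, roughly 31 minutes
```

Because 305 M/s is a peak rather than an average rate, and because an optimistic simulator (as the phrase "committed events" suggests) also spends time on events that are later rolled back, the actual wall-clock time would be longer than this bound.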


      Published In

      SIGSIM PADS '15: Proceedings of the 3rd ACM SIGSIM Conference on Principles of Advanced Discrete Simulation
      June 2015
      300 pages
      ISBN: 9781450335836
      DOI: 10.1145/2769458
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. blue gene/q
      2. datacenter interconnection network
      3. fat-tree networks
      4. parallel discrete event simulation
      5. supercomputer interconnection networks

      Conference

      SIGSIM-PADS '15

      Acceptance Rates

      SIGSIM PADS '15 Paper Acceptance Rate 35 of 60 submissions, 58%;
      Overall Acceptance Rate 398 of 779 submissions, 51%

      Article Metrics

      • Downloads (last 12 months): 6
      • Downloads (last 6 weeks): 0
      Reflects downloads up to 07 Mar 2025

      Cited By

      • (2024) BurstBalancer: Do Less, Better Balance for Large-Scale Data Center Traffic. IEEE Transactions on Parallel and Distributed Systems, 35(6): 932-949. DOI: 10.1109/TPDS.2023.3295454. Online publication date: Jun-2024.
      • (2023) SimMSG: Simulating Transportation of MPI Messages in High Performance Computing Systems. 2023 IEEE International Conference on High Performance Computing & Communications, Data Science & Systems, Smart City & Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys), pp. 319-328. DOI: 10.1109/HPCC-DSS-SmartCity-DependSys60770.2023.00051. Online publication date: 17-Dec-2023.
      • (2019) HyperX topology. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-23. DOI: 10.1145/3295500.3356140. Online publication date: 17-Nov-2019.
      • (2018) Modeling Large-Scale Slim Fly Networks Using Parallel Discrete-Event Simulation. ACM Transactions on Modeling and Computer Simulation, 28(4): 1-25. DOI: 10.1145/3203406. Online publication date: 30-Aug-2018.
      • (2018) Exascale Interconnect Topology Characterization and Parameter Exploration. 2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), pp. 810-819. DOI: 10.1109/HPCC/SmartCity/DSS.2018.00136. Online publication date: Jun-2018.
      • (2017) A brief history of HPC simulation and future challenges. Proceedings of the 2017 Winter Simulation Conference, pp. 1-12. DOI: 10.5555/3242181.3242210. Online publication date: 3-Dec-2017.
      • (2017) Guest Editorial for the TOMACS Special Issue on the Principles of Advanced Discrete Simulation (PADS). ACM Transactions on Modeling and Computer Simulation, 27(2): 1-3. DOI: 10.1145/3084543. Online publication date: 6-Jul-2017.
      • (2017) A brief history of HPC simulation and future challenges. 2017 Winter Simulation Conference (WSC), pp. 419-430. DOI: 10.1109/WSC.2017.8247804. Online publication date: Dec-2017.
      • (2016) An Integrated Interconnection Network Model for Large-Scale Performance Prediction. Proceedings of the 2016 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, pp. 177-187. DOI: 10.1145/2901378.2901396. Online publication date: 15-May-2016.
      • (2016) Modeling a Million-Node Slim Fly Network Using Parallel Discrete-Event Simulation. Proceedings of the 2016 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, pp. 189-199. DOI: 10.1145/2901378.2901389. Online publication date: 15-May-2016.