Abstract
Various topologies have been proposed for high-performance computing (HPC), e.g., fat-tree and Torus. Compared with the conventional fat-tree topology, Torus performs much better when applied in HPC. Unfortunately, because of its wraparound links, the Torus topology is naturally prone to deadlock inside the network. Researchers solve this problem by means of virtual channels, but this approach also restricts the routing of messages. In this paper, we propose a deadlock-free topology for HPC, called Mesh-of-Torus, which combines the good characteristics of the Mesh and Torus topologies. Compared with Mesh, Mesh-of-Torus has a shorter network diameter. Furthermore, because virtual channels incur complicated internal arbitration and scheduling mechanisms, we propose a corresponding set of port assignment rules. Deadlock avoidance is achieved when a dimension-order routing algorithm and our port assignment rules are applied to Mesh-of-Torus. Finally, simulations and mathematical analysis show that Mesh-of-Torus outperforms Mesh in terms of average end-to-end latency and network load distribution.
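To make the terms above concrete, the following minimal sketch illustrates dimension-order routing and network diameter on a plain k x k Mesh and a plain k x k Torus. It does not model Mesh-of-Torus itself or the port assignment rules, since the abstract does not specify their construction. The function names `dor_path` and `diameter` are hypothetical and chosen only for illustration.

```python
from itertools import product

def dor_path(src, dst, k, wraparound):
    """Route from src to dst on a k x k grid, correcting X fully, then Y.

    wraparound=False models a plain Mesh; wraparound=True models a plain
    Torus, where each dimension is a ring and the shorter direction is taken.
    (Illustrative only; this is not the Mesh-of-Torus topology.)
    """
    path = [tuple(src)]
    cur = list(src)
    for dim in range(2):  # dimension 0 (X) first, then dimension 1 (Y)
        while cur[dim] != dst[dim]:
            diff = dst[dim] - cur[dim]
            if wraparound:
                forward = diff % k  # hops needed in the +1 direction
                step = 1 if forward <= k - forward else -1
                cur[dim] = (cur[dim] + step) % k
            else:
                cur[dim] += 1 if diff > 0 else -1
            path.append(tuple(cur))
    return path

def diameter(k, wraparound):
    """Longest dimension-order route (in hops) over all node pairs."""
    nodes = list(product(range(k), repeat=2))
    return max(len(dor_path(s, d, k, wraparound)) - 1
               for s in nodes for d in nodes)

if __name__ == "__main__":
    k = 4
    print("mesh diameter :", diameter(k, wraparound=False))
    print("torus diameter:", diameter(k, wraparound=True))
```

For these two baseline topologies the routes produced are minimal, so the longest route equals the diameter: 2(k - 1) hops for the Mesh and 2⌊k/2⌋ hops for the Torus. The wraparound links that shorten the Torus routes are the same links that the abstract identifies as the source of deadlock.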
Acknowledgements
This work was supported by the National Science Foundation of China under Grants 61634004 and 61472300, the Fundamental Research Funds for the Central Universities Grant Nos. JB170107 and JB180309, and the key research and development plan of Shaanxi province No. 2017ZDCXL-GY-05-01.
Cite this article
Xie, P., Gu, H., Wang, K. et al. Mesh-of-Torus: a new topology for server-centric data center networks. J Supercomput 75, 255–271 (2019). https://doi.org/10.1007/s11227-018-2610-4
DOI: https://doi.org/10.1007/s11227-018-2610-4