Abstract
Fully adaptive routing algorithms are widely used in modern commercial supercomputers because of their high path diversity. However, they are prone to deadlock, especially when wormhole switching with non-atomic virtual channel (VC) allocation is employed. Non-atomic VC allocation means that a VC can be reallocated as soon as the tail flit of the previous packet arrives. Duato’s theory gives a general methodology for designing deadlock-free fully adaptive routing: it divides VCs into escape and adaptive ones and prohibits packets from using adaptive VCs after they have used escape VCs. However, this prohibition usually causes adaptivity loss and performance degradation. To address these issues, we extend Duato’s theory and propose conditional forwarding flow control (CFFC): a packet residing in an escape VC and requesting an adaptive VC can be forwarded if the requested adaptive VC has enough free buffers to hold the whole packet. By allowing packets to regain adaptivity, CFFC enables the design of fully adaptive routing algorithms with high routing adaptivity. By supporting non-atomic VC allocation, CFFC maintains efficient VC utilization. We prove that CFFC does not introduce deadlock if the routing algorithm is deadlock-free under non-atomic VC allocation, i.e., the routing subfunction applied in escape VCs is connected and deadlock-free. Simulation results show that the proposed method achieves higher VC utilization and performs 14.8% better on average than existing fully adaptive routing algorithms.
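The forwarding condition stated in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `VirtualChannel` class, its fields, and the `may_forward` function are hypothetical names introduced here, and the rule used for all other VC transitions (at least one free flit slot) is a simplifying assumption standing in for ordinary non-atomic allocation.

```python
from dataclasses import dataclass


@dataclass
class VirtualChannel:
    """Hypothetical model of one VC buffer, measured in flits."""
    is_escape: bool    # True for an escape VC, False for an adaptive VC
    capacity: int      # total buffer depth in flits
    occupied: int = 0  # flits currently held in the buffer

    @property
    def free_slots(self) -> int:
        return self.capacity - self.occupied


def may_forward(current: VirtualChannel,
                requested: VirtualChannel,
                packet_len: int) -> bool:
    """Decide whether a packet of packet_len flits may move to `requested`.

    Escape -> adaptive: allowed only if the adaptive VC can hold the
    whole packet (the condition described in the abstract).
    Any other transition: assumed here to need just one free flit slot.
    """
    if current.is_escape and not requested.is_escape:
        return requested.free_slots >= packet_len
    return requested.free_slots >= 1
```

For example, under these assumptions a 4-flit packet waiting in an escape VC is held back from an adaptive VC with only 3 free slots, while a 3-flit packet is allowed through.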









Notes
This statement is proved on page 1229 of [20].
References
Dally WJ, Towles B (2004) Principles and practices of interconnection networks. Morgan Kaufmann, San Francisco
Duato J, Yalamanchili S, Ni L (1997) Interconnection networks: an engineering approach. IEEE Press, New York
Bhandarkar SM, Arabnia HR (1995) The REFINE multiprocessor: theoretical properties and algorithms. Elsevier Parallel Comput 21(11):1783–1806
Bhandarkar SM, Arabnia HR (1995) The Hough transform on a reconfigurable multi-ring network. J Parallel Distrib Comput 24(1):107–114
Scott SL, Thorson GM (1996) The Cray T3E network: adaptive routing in a high performance 3D torus. In: Proceedings of high-performance interconnects symposium, hot interconnects IV, Stanford University
Abts D (2011) The Cray XT4 and SeaStar 3-D torus interconnect. Encycl Parallel Comput. doi:10.1007/978-0-387-09766-4_22
Xie M, Lu YT, Wang KF, Liu L, Cao HJ, Yang XJ (2012) Tianhe-1A interconnect and message-passing services. IEEE Micro 32(1):8–20
Fu B, Han YH, Li HW (2011) An abacus turn model for time/space-efficient reconfigurable routing. In: Proceedings of international symposium on computer architecture, ISCA 2011. June 2011
Chen LZ, Pinkston TM (2013) Worm-bubble flow control. In: Proceedings of the 19th IEEE international symposium on high-performance computer architecture, HPCA 2013. Feb 2013
Ma S, Wang ZY, Jerger NE (2014) Leaving one slot empty: flit bubble flow control for torus cache-coherent NoCs. IEEE Trans Comput. doi:10.1109/TC.2013.2295523
Kim H, Kim G, Yeo H, Maeng S, Kim J (2014) Transportation-network inspired network-on-chip. In: Proceedings of 2014 IEEE international conference on high performance architectures, HPCA 2014, pp 332–343
Ajima YS, Sumimoto S, Shimizu T (2009) Tofu: a 6-D mesh/torus interconnect for exascale computers. Computer 42(11):36–40
Adiga NR, Blumrich MA et al (2005) Blue Gene/L torus interconnection network. IBM J Res Dev 49(2):265–276
Wang Y, Zhang M, Fu Q, Pang Z (2012) Adaptive bubble scheme with minimal buffers in torus networks. In: Proceedings of international conference on high performance computing and communication, pp 914–919. June 2012
Puente V, Izu C, Beivide R, Gregorio JA, Vallejo F, Prellezo JM (2001) The adaptive bubble router. J Parallel Distrib Comput 61(9):1180–1208
Chen LZ, Wang RS, Pinkston TM (2011) Critical bubble scheme: an efficient implementation of globally aware network flow control. In: Proceedings of the 2011 IEEE international parallel distributed processing symposium, IPDPS 2011, pp 592–603. May 2011
Duato J (1993) A new theory of deadlock-free adaptive routing in wormhole networks. IEEE Trans Parallel Distrib Syst 4(12):1320–1331
Ma S, Wang Z, Jerger NE, Shen L, Xiao N (2014) Novel flow control for fully adaptive routing in cache-coherent NoCs. IEEE Trans Parallel Distrib Syst 25(9):2397–2407
Duato J, Pinkston TM (2001) A general theory for deadlock-free adaptive routing using a mixed set of resources. IEEE Trans Parallel Distrib Syst 12(12):1219–1235
Hu J, Marculescu R (2004) DyAD: smart routing for networks-on-chip. In: Proceedings of the design automation conference, DAC 2004, pp 260–263
Li M, Zeng Q, Jone WB (2006) DyXY: a proximity congestion-aware deadlock-free dynamic routing method for network on chip. In: Proceedings of the design automation conference, DAC 2006, pp 849–852
Xiang D (2011) Deadlock-free adaptive routing in meshes with fault-tolerance ability based on channel overlapping. IEEE Trans Dependable Secure Comput 8(1):74–88
Glass CJ, Ni L (1994) The turn model for adaptive routing. J ACM 41(5):874–902
Chiu GM (2000) The odd-even turn model for adaptive routing. IEEE Trans Parallel Distrib Syst 11(7):729–738
Dally WJ, Seitz CL (1987) Deadlock-free message routing in multiprocessor interconnection networks. IEEE Trans Comput 36(5):547–553
Xu Y, Zhao B, Zhang YT, Yang J (2010) Simple virtual channel allocation for high throughput and high frequency on-chip routers. In: Proceedings of 2010 IEEE international conference on high performance architectures, HPCA 2010, pp 1–11
Yu ZG, Xiang D, Wang XY (2013) VCBR: virtual channel balanced routing in torus networks. In: Proceedings of 2013 IEEE international conference on high performance computing and communications, HPCC 2013, pp 1359–1365
Arabnia HR, Oliver MA (1987) A transputer network for the arbitrary rotation of digitised images. Comput J 30(5):425–433
Gu HX, Liu Z, Wang K (2006) Distribute adaptive routing in torus networks. J Xidian Univ (Science) 33(3):352–358
Liu R, Gu HX, Yu X, Nian X (2013) Distributed flow scheduling in energy-aware datacenter networks. IEEE Commun Lett 17(4):801–804
Valafar H, Arabnia HR, Williams G (2004) Distributed global optimization and its development on the multiring network. Neural Parallel Sci Comput 12(4):465–490
Luo W, Xiang D (2012) An efficient adaptive deadlock-free routing algorithm for torus networks. IEEE Trans Parallel Distrib Syst 23(5):800–808
Jesshope CR, Miller PR, Yantchev JT (1989) High performance communication processor networks. In: Proceedings of the 16th international symposium on computer architecture, ISCA 1989, pp 150–157
Arabnia HR (1996) Distributed stereocorrelation algorithm. Int J Comput Commun (Elsevier Science) 1996:707–712
Verbeek F, Schmaltz J (2011) A comment on a necessary and sufficient condition for deadlock-free adaptive routing in wormhole networks. IEEE Trans Parallel Distrib Syst 22(10):1775–1776
Verbeek F, Schmaltz J (2011) On necessary and sufficient conditions for deadlock-free routing in wormhole networks. IEEE Trans Parallel Distrib Syst 22(10):2022–2032
Schwiebert L, Jayasimha DN (1996) A necessary and sufficient condition for deadlock-free wormhole routing. J Parallel Distrib Comput 32(1):103–117
Alonso MG, Xiang D, Flich J, Yu ZG, Duato J (2014) Achieving balanced buffer utilization with a proper co-design of flow control and routing algorithm. In: Proceedings of the 8th IEEE/ACM international symposium on networks-on-chip, NOCS 2014, pp 25–32
Dally WJ (1992) Virtual-channel flow control. IEEE Trans Parallel Distrib Syst 3(3):194–205
Towles B, Grossman JP, Greskamp B, Shaw DE (2014) Unifying on-chip and inter-node switching within the Anton 2 network. In: Proceedings of the 41st annual international symposium on computer architecture, ISCA 2014. IEEE Press, Piscataway, pp 1–12
Arabnia HR (1990) A parallel algorithm for the arbitrary rotation of digitized images using process-and-data-decomposition approach. J Parallel Distrib Comput 10(2):188–193
Yu ZG, Xiang D, Wang XY (2015) Balancing virtual channel utilization for deadlock-free routing in torus networks. J Supercomput 71(8):3094–3115
Chen K, Gu HX, Yang YT, Fan D (2014) A novel two-layer passive optical interconnection network for on-chip communication. J Lightwave Technol 32(9):1770–1776
Jiang N, Becker DU, Michelogiannakis G, Balfour J, Towles B, Kim J, Dally WJ (2013) A detailed and flexible cycle-accurate network-on-chip simulator. In: Proceedings of the 2013 IEEE international symposium on performance analysis of systems and software, ISPASS 2013, pp 88–96
Singh A, Dally WJ, Gupta AK (2003) GOAL: a load-balanced adaptive routing algorithm for torus networks. In: Proceedings of international symposium on computer architecture, ISCA 2003, pp 194–205
Arif WM, Arabnia HR (2003) Parallel edge-region-based segmentation algorithm targeted at reconfigurable multi-ring network. J Supercomput 25(1):43–63
Andujar-Munoz FJ, Villar-Ortiz JA, Sanchez JL, Alfaro FJ, Duato J (2014) Building 3D torus using low-profile expansion cards. IEEE Trans Comput. doi:10.1109/TC.2013.155
Acknowledgments
We sincerely thank the anonymous reviewers for their helpful comments and suggestions. This work is supported in part by the National Science Foundation of China under grants 61171121, 61402086, and 61572279, by the Scientific Research Foundation of Liaoning Provincial Education Department (No. L2015165), and by the DUFE Excellent Talents Project (No. DUFE2015R06).
Cite this article
Yu, Z., Wang, X. & Shen, K. Conditional forwarding: simple flow control to increase adaptivity for fully adaptive routing algorithms. J Supercomput 72, 639–653 (2016). https://doi.org/10.1007/s11227-015-1597-3
DOI: https://doi.org/10.1007/s11227-015-1597-3