Conditional forwarding: simple flow control to increase adaptivity for fully adaptive routing algorithms

The Journal of Supercomputing

Abstract

Fully adaptive routing algorithms are widely used in modern commercial supercomputers because of their high path diversity. However, they are prone to deadlock, especially when wormhole switching with non-atomic virtual channel (VC) allocation is employed. Non-atomic VC allocation means that a VC can be reallocated as soon as the tail flit of the previous packet arrives. Duato’s theory gives a general methodology for designing deadlock-free fully adaptive routing by dividing VCs into escape and adaptive ones and prohibiting packets from using adaptive VCs after they have used escape VCs. However, this prohibition usually causes adaptivity loss and performance degradation. To address these issues, we extend Duato’s theory and propose conditional forwarding flow control (CFFC): a packet residing in an escape VC and requesting an adaptive VC can be forwarded if the requested adaptive VC has enough free buffers to hold the whole packet. By allowing packets to regain adaptivity, CFFC enables the design of fully adaptive routing algorithms with high routing adaptivity. By supporting non-atomic VC allocation, CFFC maintains efficient VC utilization. We prove that CFFC does not introduce deadlock if the routing algorithm is deadlock-free under non-atomic VC allocation, i.e., the routing subfunction applied in escape VCs is connected and deadlock-free. Simulation results show that the proposed method achieves higher VC utilization and performs 14.8 % better on average than existing fully adaptive routing algorithms.
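
The forwarding condition stated in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `VirtualChannel` class, its fields, and the `may_forward` function are hypothetical names introduced here, and the rule applied to every transition other than escape-to-adaptive (at least one free flit slot) is an assumption standing in for ordinary credit-based flow control.

```python
from dataclasses import dataclass


@dataclass
class VirtualChannel:
    """Hypothetical model of a VC buffer; field names are illustrative only."""
    is_escape: bool     # True for an escape VC, False for an adaptive VC
    capacity: int       # buffer depth in flits
    occupied: int = 0   # flits currently stored

    @property
    def free_slots(self) -> int:
        return self.capacity - self.occupied


def may_forward(current: VirtualChannel,
                requested: VirtualChannel,
                packet_len: int) -> bool:
    """Sketch of the CFFC check described in the abstract.

    A packet sitting in an escape VC may move to an adaptive VC only if
    that adaptive VC can hold the whole packet.
    """
    if current.is_escape and not requested.is_escape:
        # Escape -> adaptive: the whole packet must fit in the free buffers.
        return requested.free_slots >= packet_len
    # Assumption: every other transition needs only one free flit slot,
    # as in ordinary credit-based wormhole flow control.
    return requested.free_slots >= 1
```

For example, with an escape VC and an adaptive VC of depth 8 holding 4 flits, a 4-flit packet passes the check while a 5-flit packet has to wait or continue in an escape VC.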

Notes

  1. This statement is proved on page 1229 of [20].

References

  1. Dally WJ, Towles B (2004) Principles and practices of interconnection networks. Morgan Kaufmann, San Francisco

  2. Duato J, Yalamanchili S, Ni L (1997) Interconnection networks: an engineering approach. IEEE Press, New York

  3. Bhandarkar SM, Arabnia HR (1995) The REFINE multiprocessor: theoretical properties and algorithms. Elsevier Parallel Comput 21(11):1783–1806

  4. Bhandarkar SM, Arabnia HR (1995) The Hough transform on a reconfigurable multi-ring network. J Parallel Distrib Comput 24(1):107–114

  5. http://www.top500.org/

  6. Scott SL, Thorson GM (1996) The Cray T3E network: adaptive routing in a high performance 3D torus. In: Proceedings of high-performance interconnects symposium, hot interconnects IV, Stanford University

  7. Abts D (2011) The Cray XT4 and SeaStar 3-D torus interconnect. Encycl Parallel Comput. doi:10.1007/978-0-387-09766-4_22

  8. Xie M, Lu YT, Wang KF, Liu L, Cao HJ, Yang XJ (2012) Tianhe-1A interconnect and message-passing services. IEEE Micro 32(1):8–20

  9. Fu B, Han YH, Li HW (2011) An abacus turn model for time/space-efficient reconfigurable routing. In: Proceedings of international symposium on computer architecture, ISCA 2011. June 2011

  10. Chen LZ, Pinkston TM (2013) Worm-bubble flow control. In: Proceedings of the 19th IEEE international symposium on high-performance computer architecture, HPCA 2013. Feb 2013

  11. Ma S, Wang ZY, Jerger NE (2014) Leaving one slot empty: flit bubble flow control for torus cache-coherent NoCs. IEEE Trans Comput. doi:10.1109/TC.2013.2295523

  12. Kim H, Kim G, Yeo H, Maeng S, Kim J (2014) Transportation-network inspired network-on-chip. In: Proceedings of 2014 IEEE international conference on high performance architectures, HPCA 2014, pp 332–343

  13. Ajima YS, Sumimoto S, Shimizu T (2009) Tofu: a 6-D mesh/torus interconnect for exascale computers. Computer 42(11):36–40

  14. Adiga NR, Blumrich MA et al (2005) Blue Gene/L torus interconnection network. IBM J Res Dev 49(2):265–276

  15. Wang Y, Zhang M, Fu Q, Pang Z (2012) Adaptive bubble scheme with minimal buffers in torus networks. In: Proceedings of international conference on high performance computing and communication, pp 914–919. June 2012

  16. Puente V, Izu C, Beivide R, Gregorio JA, Vallejo F, Prellezo JM (2001) The adaptive bubble router. J Parallel Distrib Comput 61(9):1180–1208

  17. Chen LZ, Wang RS, Pinkston TM (2011) Critical bubble scheme: an efficient implementation of globally aware network flow control. In: Proceedings of the 2011 IEEE international parallel distributed processing symposium, IPDPS 2011, pp 592–603. May 2011

  18. Duato J (1993) A new theory of deadlock-free adaptive routing in wormhole networks. IEEE Trans Parallel Distrib Syst 4(12):1320–1331

  19. Ma S, Wang Z, Jerger NE, Shen L, Xiao N (2014) Novel flow control for fully adaptive routing in cache-coherent NoCs. IEEE Trans Parallel Distrib Syst 25(9):2397–2407

  20. Duato J, Pinkston TM (2001) A general theory for deadlock-free adaptive routing using a mixed set of resources. IEEE Trans Parallel Distrib Syst 12(12):1219–1235

  21. Hu J, Marculescu R (2004) DyAD: smart routing for networks-on-chip. In: Proceedings of the design automation conference, DAC 2004, pp 260–263

  22. Li M, Zeng Q, Jone WB (2006) DyXY: a proximity congestion-aware deadlock-free dynamic routing method for network on chip. In: Proceedings of the design automation conference, DAC 2006, pp 849–852

  23. Xiang D (2011) Deadlock-free adaptive routing in meshes with fault-tolerance ability based on channel overlapping. IEEE Trans Dependable Secure Comput 8(1):74–88

  24. Glass CJ, Ni L (1994) The turn model for adaptive routing. J ACM 41(5):874–902

  25. Chiu GM (2000) The odd-even turn model for adaptive routing. IEEE Trans Parallel Distrib Syst 11(7):729–738

  26. Dally WJ, Seitz CL (1987) Deadlock-free message routing in multiprocessor interconnection networks. IEEE Trans Comput 36(5):547–553

  27. Xu Y, Zhao B, Zhang YT, Yang J (2010) Simple virtual channel allocation for high throughput and high frequency on-chip routers. In: Proceedings of 2010 IEEE international conference on high performance architectures, HPCA 2010, pp 1–11

  28. Yu ZG, Xiang D, Wang XY (2013) VCBR: virtual channel balanced routing in torus networks. In: Proceedings of 2013 IEEE international conference on high performance computing and communications, HPCC 2013, pp 1359–1365

  29. Arabnia HR, Oliver MA (1987) A transputer network for the arbitrary rotation of digitised images. Comput J 30(5):425–433

  30. Gu HX, Liu Z, Wang K (2006) Distribute adaptive routing in torus networks. J Xidian Univ (Science) 33(3):352–358

  31. Liu R, Gu HX, Yu X, Nian X (2013) Distributed flow scheduling in energy-aware datacenter networks. IEEE Commun Lett 17(4):801–804

  32. Valafar H, Arabnia HR, Williams G (2004) Distributed global optimization and its development on the multiring network. Neural Parallel Sci Comput 12(4):465–490

  33. Luo W, Xiang D (2012) An efficient adaptive deadlock-free routing algorithm for torus networks. IEEE Trans Parallel Distrib Syst 23(5):800–808

  34. Jesshope CR, Miller PR, Yantchev JT (1989) High performance communication processor networks. In: Proceedings of the 16th international symposium on computer architecture, ISCA 1989, pp 150–157

  35. Arabnia HR (1996) Distributed stereocorrelation algorithm. Int J Comput Commun (Elsevier Science) 1996:707–712

  36. Verbeek F, Schmaltz J (2011) A comment on a necessary and sufficient condition for deadlock-free adaptive routing in wormhole networks. IEEE Trans Parallel Distrib Syst 22(10):1775–1776

  37. Verbeek F, Schmaltz J (2011) On necessary and sufficient conditions for deadlock-free routing in wormhole networks. IEEE Trans Parallel Distrib Syst 22(10):2022–2032

  38. Schwiebert L, Jayasimha DN (1996) A necessary and sufficient condition for deadlock-free wormhole routing. J Parallel Distrib Comput 32(1):103–117

  39. Alonso MG, Xiang D, Flich J, Yu ZG, Duato J (2014) Achieving balanced buffer utilization with a proper co-design of flow control and routing algorithm. In: Proceedings of the 8th IEEE/ACM international symposium on networks-on-chip, NOCS 2014, pp 25–32

  40. Dally WJ (1992) Virtual-channel flow control. IEEE Trans Parallel Distrib Syst 3(3):194–205

  41. Towles B, Grossman JP, Greskamp B, Shaw DE (2014) Unifying on-chip and inter-node switching within the Anton 2 network. In: Proceedings of the 41st annual international symposium on computer architecture, ISCA 2014. IEEE Press, Piscataway, pp 1–12

  42. Arabnia HR (1990) A parallel algorithm for the arbitrary rotation of digitized images using process-and-data-decomposition approach. J Parallel Distrib Comput 10(2):188–193

  43. Yu ZG, Xiang D, Wang XY (2015) Balancing virtual channel utilization for deadlock-free routing in torus networks. J Supercomput 71(8):3094–3115

  44. Chen K, Gu HX, Yang YT, Fan D (2014) A novel two-layer passive optical interconnection network for on-chip communication. J Lightwave Technol 32(9):1770–1776

  45. Jiang N, Becker DU, Michelogiannakis G, Balfour J, Towles B, Kim J, Dally WJ (2013) A detailed and flexible cycle-accurate network-on-chip simulator. In: Proceedings of the 2013 IEEE international symposium on performance analysis of systems and software, ISPASS 2013, pp 88–96

  46. Singh A, Dally WJ, Gupta AK (2003) GOAL: a load-balanced adaptive routing algorithm for torus network. In: Proceedings of international symposium on computer architecture, ISCA 2003, pp 194–205

  47. Arif WM, Arabnia HR (2003) Parallel edge-region-based segmentation algorithm targeted at reconfigurable multi-ring network. J Supercomput 25(1):43–63

  48. Andujar-Munoz FJ, Villar-Ortiz JA, Sanchez JL, Alfaro FJ, Duato J (2014) Building 3D torus using low-profile expansion cards. IEEE Trans Comput. doi:10.1109/TC.2013.155

Acknowledgments

We sincerely thank the anonymous reviewers for their helpful comments and suggestions. This work was supported in part by the National Science Foundation of China under grants 61171121, 61402086, and 61572279, the Scientific Research Foundation of Liaoning Provincial Education Department (No. L2015165), and the DUFE Excellent Talents Project (No. DUFE2015R06).

Author information

Corresponding author

Correspondence to Zhigang Yu.

About this article

Cite this article

Yu, Z., Wang, X. & Shen, K. Conditional forwarding: simple flow control to increase adaptivity for fully adaptive routing algorithms. J Supercomput 72, 639–653 (2016). https://doi.org/10.1007/s11227-015-1597-3

  • DOI: https://doi.org/10.1007/s11227-015-1597-3
