ABSTRACT
Silicon devices are becoming less reliable as technology moves to smaller feature sizes. As a result, digital systems are increasingly likely to experience permanent failures during their lifetime. To overcome this problem, networks-on-chip (NoCs) should be designed not only to fulfill performance requirements, but also to be robust to many fault occurrences. This paper proposes a fault- and application-aware routing framework called FATE, which leverages the diversity of communication patterns in applications to reduce congestion during execution on highly faulty NoCs. To this end, FATE estimates the routing demands of applications and balances traffic load among the available resources. We propose a set of novel route-enabling rules that greatly reduce the search for deadlock-free, maximally-connected routes on any faulty 2D mesh topology by pruning, early on, routing configuration options that would eventually lead to unviable solutions. Our experimental results show a 33% improvement in average saturation throughput for synthetic traffic patterns and a 59% improvement in average packet latency for SPLASH-2 benchmarks over state-of-the-art fault-tolerant solutions. The FATE approach is also beneficial in the complete absence of faults, where it outperforms prior fully-adaptive routing techniques, improving saturation throughput by up to 33%.
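The deadlock-freedom requirement mentioned above can be illustrated with the classic channel-dependency-graph test: a set of permitted turns on a mesh is deadlock-free if the graph of channel-to-channel dependencies has no cycle. The sketch below is a generic illustration of that test on a faulty 2D mesh, not FATE's route-enabling rules, which the abstract does not spell out. All names (`mesh_channels`, `is_deadlock_free`, `WEST_FIRST`) and the modeling of a fault as a link disabled in both directions are assumptions made for this example.

```python
from itertools import product

# Unit direction vectors on a 2D mesh (y grows northward; a convention assumed here).
DIRS = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}
OPPOSITE = {"E": "W", "W": "E", "N": "S", "S": "N"}

# Turns prohibited by the west-first turn model: no turn into the west direction.
WEST_FIRST = {("N", "W"), ("S", "W")}


def mesh_channels(width, height, faulty_links=frozenset()):
    """Return directed channels (src, dst, direction), skipping faulty links.

    A faulty link is an unordered pair frozenset({node_a, node_b}); it disables
    both directions (an assumption of this sketch).
    """
    channels = []
    for x, y in product(range(width), range(height)):
        for d, (dx, dy) in DIRS.items():
            nx, ny = x + dx, y + dy
            in_mesh = 0 <= nx < width and 0 <= ny < height
            if in_mesh and frozenset({(x, y), (nx, ny)}) not in faulty_links:
                channels.append(((x, y), (nx, ny), d))
    return channels


def is_deadlock_free(channels, forbidden_turns):
    """Return True if the channel dependency graph is acyclic.

    Channel c1 depends on c2 when c1 ends where c2 starts, the turn
    (c1 direction, c2 direction) is permitted, and c2 is not a U-turn.
    """
    outgoing = {}
    for c in channels:
        outgoing.setdefault(c[0], []).append(c)
    deps = {
        c1: [c2 for c2 in outgoing.get(c1[1], [])
             if c2[2] != OPPOSITE[c1[2]]
             and (c1[2], c2[2]) not in forbidden_turns]
        for c1 in channels
    }
    # Iterative three-color depth-first search for cycle detection.
    WHITE, GREY, BLACK = 0, 1, 2
    color = {c: WHITE for c in channels}
    for root in channels:
        if color[root] != WHITE:
            continue
        color[root] = GREY
        stack = [(root, iter(deps[root]))]
        while stack:
            node, successors = stack[-1]
            nxt = next(successors, None)
            if nxt is None:
                color[node] = BLACK
                stack.pop()
            elif color[nxt] == GREY:
                return False  # back edge: a dependency cycle exists
            elif color[nxt] == WHITE:
                color[nxt] = GREY
                stack.append((nxt, iter(deps[nxt])))
    return True


if __name__ == "__main__":
    fault = frozenset({frozenset({(0, 0), (1, 0)})})
    chans = mesh_channels(3, 3, fault)
    print(is_deadlock_free(chans, set()))       # all turns permitted
    print(is_deadlock_free(chans, WEST_FIRST))  # west-first restrictions
```

A check like this only establishes deadlock freedom for a given set of turn restrictions. It says nothing about whether every source can still reach every destination once links have failed, which is the connectivity half of the problem the abstract describes.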
Index Terms
- Highly Fault-tolerant NoC Routing with Application-aware Congestion Management