Abstract
Network-on-Chip (NoC), with its excellent scalability and high bandwidth, is widely considered the most promising communication architecture for complex integrated systems. However, NoC reliability is becoming increasingly challenging as semiconductor feature sizes shrink and integration density increases. Moreover, a single node failure in an NoC may destroy network connectivity and corrupt the entire system. Introducing redundancy is an effective way to construct a resilient communication path. However, prior redundancy-based work either achieves limited reliability with coarse-grained protection or incurs even larger hardware overhead with fine-grained protection. In this paper, we observe that data path components such as links, buffers, and crossbars in an NoC can be divided into multiple identical parallel slices, which can be used as inherent redundancy to enhance reliability. As long as one fault-free slice remains available, the proposed salvaging scheme, named RevivePath, can keep the overall data path functional. Furthermore, RevivePath uses direct redundancy to protect control path components such as the switch arbiter and routing computation, providing a complete fault-tolerant scheme for the whole router. Experimental results show that it achieves high reliability with graceful performance degradation even under high fault rates.
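The slice-salvaging idea can be illustrated with a small model. The sketch below is only an assumption about how such a scheme might behave, not the RevivePath implementation: it supposes that a flit is split into one chunk per slice and that, when some slices are faulty, the chunks are time-multiplexed over the remaining fault-free slices. The names `salvage_transfer`, `chunks`, and `slice_ok` are hypothetical.

```python
def salvage_transfer(chunks, slice_ok):
    """Send one flit over a sliced data path using only fault-free slices.

    chunks   -- the flit split into one chunk per slice (assumed layout)
    slice_ok -- per-slice health flags, True if the slice is fault-free
    Returns (received_chunks, schedule), where schedule[c] maps
    slice index -> chunk index carried in cycle c.
    """
    if len(chunks) != len(slice_ok):
        raise ValueError("expected exactly one chunk per slice")
    healthy = [i for i, ok in enumerate(slice_ok) if ok]
    if not healthy:
        # With no fault-free slice left, the data path cannot be salvaged.
        raise RuntimeError("no fault-free slice available")

    received = [None] * len(chunks)
    schedule = []
    pending = list(range(len(chunks)))
    while pending:
        # In each cycle, every healthy slice carries at most one pending chunk.
        batch, pending = pending[:len(healthy)], pending[len(healthy):]
        cycle = {}
        for slice_id, chunk_id in zip(healthy, batch):
            received[chunk_id] = chunks[chunk_id]
            cycle[slice_id] = chunk_id
        schedule.append(cycle)
    return received, schedule


flit = [0xA, 0xB, 0xC, 0xD]
data, plan = salvage_transfer(flit, [True, False, True, False])
print(data == flit, len(plan))  # flit intact, sent over two cycles
```

In this model the flit always arrives intact while at least one slice works, and the cost of each lost slice shows up as extra transfer cycles, which is one way to read the graceful performance degradation described above.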
Cite this article
Han, YH., Liu, C., Lu, H. et al. RevivePath: Resilient Network-on-Chip Design Through Data Path Salvaging of Router. J. Comput. Sci. Technol. 28, 1045–1053 (2013). https://doi.org/10.1007/s11390-013-1396-3
DOI: https://doi.org/10.1007/s11390-013-1396-3