research-article

RQNoC: A Resilient Quality-of-Service Network-on-Chip with Service Redirection

Published: 17 February 2016

Abstract

In this article, we describe RQNoC, a service-oriented Network-on-Chip (NoC) that is resilient to permanent faults. We characterize network resources by the particular service they support and, when they are faulty, bypass them so that the affected traffic class is redirected. We propose two alternatives for service redirection, each with its own advantages and disadvantages. The first, Service Detour, bypasses faulty network parts through longer alternative paths that use only resources of the same service, keeping traffic classes isolated. The second, Service Merge, uses resources of other services to provide shorter paths, at the cost of allowing traffic classes to interfere with each other. The remaining network resources, which are common to all services, employ additional fault-tolerance mechanisms: links tolerate faults using spare wires combined with a flit-shifting mechanism, and the router control logic is protected with Triple Modular Redundancy (TMR). The proposed RQNoC designs are implemented in 65nm technology and evaluated in terms of performance, area, power consumption, and fault tolerance. Service Detour requires 9% more area and consumes 7.3% more power than a baseline network that is not fault tolerant. Its packet latency and throughput are close to fault-free performance at low fault densities, but fault tolerance and performance drop substantially for 8 or more network faults. Service Merge requires 22% more area and 27% more power than the baseline and has a 9% slower clock. Compared to a fault-free network, a Service Merge RQNoC with up to 32 faults has 1.5× to 2.4× higher packet latency and throughput reduced to 70% or 50%. However, it delivers substantially better fault tolerance, maintaining a mean network connectivity above 90% even with 32 network faults, versus 41% for a Service Detour network. Combining Service Merge and Service Detour further improves fault tolerance, sustaining a higher number of network faults with reduced packet latency.
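To make the difference between the two redirection policies concrete, the Python sketch that follows models them on a toy network. It is an illustration under stated assumptions, not the RQNoC hardware or the authors' routing algorithm: the network is a hypothetical k×k mesh in which every directed link carries one channel per service, and a permanent fault disables a single (link, service) channel. Under the "detour" policy a packet may use only its own service's channels, so it takes a longer path or becomes unreachable; under the "merge" policy it may borrow a healthy channel of another service on the same hop. The helpers `tmr_vote` (a bitwise 2-of-3 majority, the standard TMR voter) and `shift_map` (a simple model of spare wires with flit shifting, in which data bits shift past faulty wires onto the next healthy ones) are likewise illustrative. All function names, policy strings, and the fault model are assumptions.

```python
from collections import deque
from itertools import product

# Toy model (assumption, not the RQNoC RTL): a k x k mesh where every directed
# link ((x, y), (x', y')) carries one channel per service (traffic class).
# A permanent fault disables a single (link, service) channel.


def tmr_vote(a, b, c):
    """Bitwise 2-of-3 majority vote over three redundant copies of a control word."""
    return (a & b) | (a & c) | (b & c)


def shift_map(width, spares, faulty_wires):
    """Assign `width` data bits to the lowest-indexed healthy wires of a link with
    `width + spares` wires, shifting past faulty ones. Returns the wire index used
    for each data bit, or None when the faults exceed the available spares."""
    healthy = [w for w in range(width + spares) if w not in faulty_wires]
    return healthy[:width] if len(healthy) >= width else None


def neighbors(node, k):
    """Yield the mesh neighbors of `node` that lie inside the k x k grid."""
    x, y = node
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < k and 0 <= ny < k:
            yield (nx, ny)


def hop_usable(link, service, faults, services, policy):
    if (link, service) not in faults:
        return True  # the service's own channel on this hop is healthy
    if policy == "merge":
        # Service Merge (modeled): borrow a healthy channel of another service.
        return any((link, s) not in faults for s in services if s != service)
    return False  # Service Detour (modeled): stay on the service's own channels


def hops(src, dst, k, service, faults, services, policy):
    """Shortest hop count from src to dst (breadth-first search), or None."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for nxt in neighbors(node, k):
            if nxt not in dist and hop_usable((node, nxt), service, faults, services, policy):
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None


def connectivity(k, service, faults, services, policy):
    """Fraction of ordered (src, dst) node pairs, src != dst, that stay reachable."""
    nodes = list(product(range(k), repeat=2))
    pairs = [(s, d) for s in nodes for d in nodes if s != d]
    reachable = sum(hops(s, d, k, service, faults, services, policy) is not None
                    for s, d in pairs)
    return reachable / len(pairs)


if __name__ == "__main__":
    services = ("A", "B")
    # Break service A's channels in both directions of the (0,0)-(1,0) link.
    faults = {(((0, 0), (1, 0)), "A"), (((1, 0), (0, 0)), "A")}
    for policy in ("detour", "merge"):
        print(policy, hops((0, 0), (1, 0), 3, "A", faults, services, policy))
```

In this 3×3 example, service A can no longer use the direct (0,0)→(1,0) hop: the detour policy needs 3 hops, while the merge policy borrows service B's channel and needs only 1. If all of service A's channels into and out of the corner (0,0) are faulty, its detour connectivity falls to 56 of 72 ordered node pairs, whereas merge keeps every pair reachable. This reflects, in miniature, the isolation-versus-connectivity trade-off the abstract describes; the sketch ignores latency, congestion, deadlock freedom, and interference between classes.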



        • Published in

          ACM Transactions on Embedded Computing Systems, Volume 15, Issue 2
          Special Issue on Innovative Design, Special Issue on MEMOCODE 2014 and Special Issue on M2M/IOT
          May 2016
          421 pages
          ISSN: 1539-9087
          EISSN: 1558-3465
          DOI: 10.1145/2888407

          Copyright © 2016 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 17 February 2016
          • Accepted: 1 November 2015
          • Revised: 1 June 2015
          • Received: 1 December 2014


          Qualifiers

          • Research
          • Refereed
