DOI: 10.1145/1755888.1755902
research-article

Compiler directed network-on-chip reliability enhancement for chip multiprocessors

Published: 13 April 2010

ABSTRACT

Chip multiprocessors (CMPs) are expected to be the building blocks for future computer systems. While architecting these emerging CMPs is a challenging problem on its own, programming them is even more challenging. As the number of cores accommodated in chip multiprocessors increases, network-on-chip (NoC) type communication fabrics are expected to replace traditional point-to-point buses. Most prior software-related work targeting CMPs focuses on performance and power. However, as technology scales, the components of a CMP are increasingly exposed to both transient and permanent hardware failures. This paper presents and evaluates a compiler-directed, power- and performance-aware reliability enhancement scheme for NoC-based CMPs. The proposed scheme improves on-chip communication reliability by duplicating messages traveling across CMP nodes such that, for each original message, its duplicate uses a different set of communication links as much as possible (to satisfy the performance constraint). In addition, our approach tries to reuse communication links across the different phases of the program to maximize link shutdown opportunities for the NoC (to satisfy the power constraint). Our results show that the proposed approach is very effective in improving on-chip network reliability without causing excessive power or performance degradation. In our experiments, we also evaluate the performance-oriented and energy-oriented versions of our compiler-directed reliability enhancement scheme and compare them to two purely hardware-based fault-tolerant routing schemes.
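
The idea of sending a duplicate over different links can be illustrated with a small sketch. This is not the paper's compiler algorithm, which the abstract does not describe. It assumes a 2D mesh NoC and uses plain dimension-order routing as a stand-in: the original message goes XY (row first) and the duplicate goes YX (column first). The names `route` and `shared_links` are hypothetical.

```python
from typing import List, Set, Tuple

Node = Tuple[int, int]    # (x, y) coordinate in an assumed 2D mesh
Link = Tuple[Node, Node]  # directed link between two adjacent nodes


def _step(a: int, b: int) -> int:
    """Return +1, -1 or 0: the direction to move from a toward b."""
    return (b > a) - (b < a)


def route(src: Node, dst: Node, x_first: bool) -> List[Link]:
    """Dimension-order route: XY if x_first is True, otherwise YX."""
    links: List[Link] = []
    x, y = src
    for dim in (("x", "y") if x_first else ("y", "x")):
        if dim == "x":
            while x != dst[0]:
                nx = x + _step(x, dst[0])
                links.append(((x, y), (nx, y)))
                x = nx
        else:
            while y != dst[1]:
                ny = y + _step(y, dst[1])
                links.append(((x, y), (x, ny)))
                y = ny
    return links


def shared_links(src: Node, dst: Node) -> Set[Link]:
    """Links used by both the original (XY) and the duplicate (YX) route."""
    original = route(src, dst, x_first=True)
    duplicate = route(src, dst, x_first=False)
    return set(original) & set(duplicate)


if __name__ == "__main__":
    # Endpoints differ in both dimensions: the two routes share no link.
    print(len(shared_links((0, 0), (2, 1))))
    # Endpoints in the same row: XY and YX coincide, so every link is shared.
    print(len(shared_links((0, 0), (3, 0))))
```

In this simplified setting both routes are minimal, so the duplicate adds no extra hops, and a single failed link cannot affect both copies when the endpoints differ in both dimensions. When the endpoints share a row or column, the two routes coincide, so a fully disjoint duplicate would need a longer, non-minimal path; this is one way to read the "as much as possible" qualifier in the abstract.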

Published in

      LCTES '10: Proceedings of the ACM SIGPLAN/SIGBED 2010 conference on Languages, compilers, and tools for embedded systems
      April 2010
      184 pages
      ISBN:9781605589534
      DOI:10.1145/1755888
      • ACM SIGPLAN Notices, Volume 45, Issue 4
        LCTES '10
        April 2010
        170 pages
        ISSN:0362-1340
        EISSN:1558-1160
        DOI:10.1145/1755951

      Copyright © 2010 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 116 of 438 submissions, 26%
