ABSTRACT
Chip multiprocessors (CMPs) are expected to be the building blocks of future computer systems. While architecting these emerging CMPs is a challenging problem in its own right, programming them is even more challenging. As the number of cores in a CMP increases, network-on-chip (NoC) communication fabrics are expected to replace traditional point-to-point buses. Most prior software-related work targeting CMPs focuses on performance and power. However, as technology scales, the components of a CMP are increasingly exposed to both transient and permanent hardware failures. This paper presents and evaluates a compiler-directed, power- and performance-aware reliability enhancement scheme for NoC-based CMPs. The proposed scheme improves on-chip communication reliability by duplicating messages traveling across CMP nodes such that, for each original message, its duplicate uses a different set of communication links as much as possible (to satisfy the performance constraint). In addition, our approach tries to reuse communication links across the different phases of the program to maximize link shutdown opportunities for the NoC (to satisfy the power constraint). Our results show that the proposed approach is effective in improving on-chip network reliability without causing excessive power or performance degradation. In our experiments, we also evaluate performance-oriented and energy-oriented versions of our compiler-directed reliability enhancement scheme and compare them to two purely hardware-based fault-tolerant routing schemes.
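The idea of sending a duplicate over different links can be illustrated with a minimal sketch. It assumes a 2D mesh NoC and uses plain dimension-order routing (X-then-Y for the original, Y-then-X for the duplicate); this is an illustrative stand-in, not the compiler algorithm the paper proposes, and the names `xy_route`, `yx_route`, and `shared_links` are hypothetical.

```python
# Assumption: nodes are (x, y) coordinates on a 2D mesh, and a link is a
# directed pair of adjacent nodes. This is not the paper's algorithm.

def _walk(src, dst, x_first):
    """Return the list of links on a dimension-order route from src to dst."""
    pos = list(src)
    links = []
    # Dimension 0 is X, dimension 1 is Y; visit them in the requested order.
    for dim in ((0, 1) if x_first else (1, 0)):
        while pos[dim] != dst[dim]:
            nxt = list(pos)
            nxt[dim] += 1 if dst[dim] > pos[dim] else -1
            links.append((tuple(pos), tuple(nxt)))
            pos = nxt
    return links

def xy_route(src, dst):
    """Route for the original message: X dimension first, then Y."""
    return _walk(src, dst, x_first=True)

def yx_route(src, dst):
    """Route for the duplicate message: Y dimension first, then X."""
    return _walk(src, dst, x_first=False)

def shared_links(route_a, route_b):
    """Links used by both routes (single points of failure for the pair)."""
    return set(route_a) & set(route_b)

if __name__ == "__main__":
    original = xy_route((0, 0), (2, 1))
    duplicate = yx_route((0, 0), (2, 1))
    print("original :", original)
    print("duplicate:", duplicate)
    print("shared   :", shared_links(original, duplicate))
```

When source and destination differ in both dimensions, the two routes share no link, so a single link failure cannot corrupt both copies. When they lie in the same row or column, the two dimension-order routes coincide, which is one reason link disjointness can only be achieved "as much as possible" without lengthening the duplicate's path.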
Compiler directed network-on-chip reliability enhancement for chip multiprocessors
LCTES '10