research-article

Compiler directed network-on-chip reliability enhancement for chip multiprocessors

Authors:
Ozcan Ozturk, Bilkent University, Ankara, Turkey
Mahmut Kandemir, Pennsylvania State University, University Park, PA, USA
Mary J. Irwin, Pennsylvania State University, University Park, PA, USA
Sri H.K. Narayanan, Pennsylvania State University, University Park, PA, USA

LCTES '10: Proceedings of the ACM SIGPLAN/SIGBED 2010 conference on Languages, compilers, and tools for embedded systems, April 2010, pages 85–94. https://doi.org/10.1145/1755888.1755902

Published: 13 April 2010

ABSTRACT

Chip multiprocessors (CMPs) are expected to be the building blocks of future computer systems. While architecting these emerging CMPs is a challenging problem in its own right, programming them is even more challenging. As the number of cores in a CMP increases, network-on-chip (NoC) communication fabrics are expected to replace traditional point-to-point buses. Most prior software work targeting CMPs has focused on performance and power. However, as technology scales, CMP components are increasingly exposed to both transient and permanent hardware failures. This paper presents and evaluates a compiler-directed, power- and performance-aware reliability enhancement scheme for NoC-based CMPs. The proposed scheme improves on-chip communication reliability by duplicating messages traveling across CMP nodes such that, for each original message, its duplicate uses a different set of communication links as much as possible (to satisfy the performance constraint). In addition, our approach tries to reuse communication links across the different phases of the program to maximize link shutdown opportunities for the NoC (to satisfy the power constraint). Our results show that the proposed approach is very effective in improving on-chip network reliability without causing excessive power or performance degradation. In our experiments, we also evaluate performance-oriented and energy-oriented versions of our compiler-directed reliability enhancement scheme and compare them to two purely hardware-based fault-tolerant routing schemes.
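To make the two quantities traded off above concrete, here is a minimal sketch; it is not the authors' compiler algorithm. It assumes a 2D mesh NoC with dimension-order routing, sends each original message on an X-then-Y (XY) path and its duplicate on a Y-then-X (YX) path, measures how many links the two copies share, and counts the links that no message uses in any program phase (the shutdown candidates). The mesh topology, the XY/YX pairing, and the names `dor_route`, `route_pair`, `mesh_links`, and `idle_links` are all hypothetical choices for illustration.

```python
from typing import Iterable, List, Set, Tuple

Node = Tuple[int, int]    # (x, y) position of a router in the mesh
Link = Tuple[Node, Node]  # directed link between two adjacent routers


def dor_route(src: Node, dst: Node, x_first: bool) -> List[Link]:
    """Dimension-order route: X then Y if x_first, otherwise Y then X."""
    links: List[Link] = []
    x, y = src
    for dim in ((0, 1) if x_first else (1, 0)):
        target = dst[dim]
        while (x, y)[dim] != target:
            step = 1 if target > (x, y)[dim] else -1
            nxt = (x + step, y) if dim == 0 else (x, y + step)
            links.append(((x, y), nxt))
            x, y = nxt
    return links


def route_pair(src: Node, dst: Node) -> Tuple[List[Link], List[Link], int]:
    """Original message on the XY path, duplicate on the YX path.

    Also returns how many links the two copies share (0 means fully disjoint).
    """
    original = dor_route(src, dst, x_first=True)
    duplicate = dor_route(src, dst, x_first=False)
    shared = len(set(original) & set(duplicate))
    return original, duplicate, shared


def mesh_links(width: int, height: int) -> Set[Link]:
    """All directed links of a width x height 2D mesh."""
    links: Set[Link] = set()
    for x in range(width):
        for y in range(height):
            if x + 1 < width:
                links |= {((x, y), (x + 1, y)), ((x + 1, y), (x, y))}
            if y + 1 < height:
                links |= {((x, y), (x, y + 1)), ((x, y + 1), (x, y))}
    return links


def idle_links(all_links: Set[Link],
               phases: Iterable[Iterable[Tuple[Node, Node]]]) -> Set[Link]:
    """Links used by no original or duplicate message in any phase.

    These are the shutdown candidates; routing different phases over
    the same links keeps this set large.
    """
    used: Set[Link] = set()
    for phase in phases:
        for src, dst in phase:
            original, duplicate, _ = route_pair(src, dst)
            used.update(original)
            used.update(duplicate)
    return all_links - used


if __name__ == "__main__":
    mesh = mesh_links(4, 4)
    print(route_pair((0, 0), (2, 1))[2])  # copies share no links
    print(route_pair((0, 0), (3, 0))[2])  # same row: XY and YX coincide
    phases = [[((0, 0), (2, 1))], [((0, 0), (3, 0))]]
    print(len(mesh), len(idle_links(mesh, phases)))
```

On a 4x4 mesh, a message from (0,0) to (2,1) gets fully link-disjoint copies, while one from (0,0) to (3,0) does not, since source and destination share a row; this is one reason a duplicate can only avoid the original's links "as much as possible". Across the two example phases, the second phase reuses links from the first, so only 7 of the 48 directed links are used and 41 remain candidates for shutdown.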

Index Terms

1. Software and its engineering
  1. Software notations and tools
    1. Compilers

Published in
LCTES '10: Proceedings of the ACM SIGPLAN/SIGBED 2010 conference on Languages, compilers, and tools for embedded systems, April 2010, 184 pages
ISBN: 9781605589534
DOI: 10.1145/1755888
General Chair: Jaejin Lee, Seoul National University, Korea
Program Chair: Bruce B. Childers, University of Pittsburgh, USA

Also published in ACM SIGPLAN Notices, Volume 45, Issue 4 (LCTES '10), April 2010, 170 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/1755951
Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
chip multiprocessors
compiler
noc
reliability

Acceptance Rates
Overall acceptance rate: 116 of 438 submissions, 26%
Article Metrics
- 4 total citations
- 276 total downloads
- Downloads (last 12 months): 5
- Downloads (last 6 weeks): 0