NoC-based fault-tolerant cache design in chip multiprocessors

Published: 28 March 2014

Abstract

Technology scaling makes emerging Chip MultiProcessor (CMP) platforms increasingly susceptible to failures, which raises a range of reliability challenges. In such platforms, error-prone on-chip memories (caches) continue to dominate the chip area, and Network-on-Chip (NoC) fabrics are increasingly used to manage the scalability of these architectures. We present a novel solution for efficiently implementing a fault-tolerant Last-Level Cache (LLC) in CMP architectures. The proposed approach leverages the interconnection network fabric to protect the LLC banks against permanent faults in an efficient and scalable way. During an LLC access to a faulty block, the network detects and corrects the faults and returns fault-free data to the requesting core. By leveraging the NoC fabric, designers can implement any cache fault-tolerance scheme in an efficient, modular, and scalable manner for emerging multicore and manycore platforms. We propose four policies for implementing a remapping-based fault-tolerant scheme over the NoC fabric in different settings. These policies expose design trade-offs between NoC traffic (packets sent through the network) and the intrinsic parallelism of the communication mechanisms, allowing designers to tune the system to their design constraints. We perform an extensive design space exploration on NoC benchmarks to demonstrate the usability and efficacy of our approach, and a sensitivity analysis to observe how the policies respond to improvements in the NoC architecture. The overheads of leveraging the NoC fabric are minimal: on an 8-core, 16-cache-bank CMP we demonstrate reliable access to LLCs with additional overheads of less than 3% in area and less than 7% in power.
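The remapping idea in the abstract can be illustrated with a small toy model. The sketch below is not the paper's design and does not reproduce any of its four policies. It only assumes that the network holds a remap table from a faulty (bank, block) pair to a spare location, and that a remapped access costs one extra packet for forwarding. The class name `RemappingNoC`, the method names, and the packet costs (2 for a fault-free access, 3 for a remapped one) are all hypothetical.

```python
class RemappingNoC:
    """Toy model of an NoC that redirects accesses to faulty LLC blocks."""

    def __init__(self, num_banks):
        # Each bank is modelled as a dict: block index -> stored data.
        self.banks = [dict() for _ in range(num_banks)]
        # Hypothetical remap table: (bank, block) -> (spare_bank, spare_block).
        self.remap = {}
        # Running count of packets injected into the network.
        self.packets = 0

    def mark_faulty(self, bank, block, spare_bank, spare_block):
        """Record a permanent fault and the spare location that replaces it."""
        self.remap[(bank, block)] = (spare_bank, spare_block)

    def _route(self, bank, block):
        """Resolve the physical location and charge the assumed packet cost."""
        target = self.remap.get((bank, block), (bank, block))
        # Assumed cost: request + reply, plus one forward if remapped.
        self.packets += 2 if target == (bank, block) else 3
        return target

    def write(self, bank, block, data):
        tgt_bank, tgt_block = self._route(bank, block)
        self.banks[tgt_bank][tgt_block] = data

    def read(self, bank, block):
        tgt_bank, tgt_block = self._route(bank, block)
        return self.banks[tgt_bank].get(tgt_block)


noc = RemappingNoC(num_banks=16)
noc.mark_faulty(bank=3, block=7, spare_bank=5, spare_block=0)
noc.write(3, 7, "A")   # lands in bank 5, block 0
noc.write(2, 1, "B")   # fault-free, stays in bank 2
print(noc.read(3, 7), noc.read(2, 1), noc.packets)
```

In this model the requesting core always uses the original address and the redirection stays inside the network, which is the property the abstract emphasizes. The extra packet per remapped access is a stand-in for the traffic side of the trade-off between NoC traffic and parallelism.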

Cited By

  • (2017) Leveraging on Deep Memory Hierarchies to Minimize Energy Consumption and Data Access Latency on Single-Chip Cloud Computers. IEEE Transactions on Sustainable Computing 2:2 (154-166). DOI: 10.1109/TSUSC.2017.2706620. Online publication date: 1-Apr-2017
  • (2016) A Fault-Tolerant L1 Cache with Predictable Performance by Virtual Filter Cache. 2016 13th International Conference on Embedded Software and Systems (ICESS) (60-66). DOI: 10.1109/ICESS.2016.31. Online publication date: Aug-2016

Published In

ACM Transactions on Embedded Computing Systems  Volume 13, Issue 3s
Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers
March 2014
403 pages
ISSN:1539-9087
EISSN:1558-3465
DOI:10.1145/2597868
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 March 2014
Accepted: 01 November 2013
Revised: 01 June 2013
Received: 01 December 2012
Published in TECS Volume 13, Issue 3s

Author Tags

  1. Fault-tolerant design
  2. chip multiprocessor
  3. network-on-chip
  4. remapping

Qualifiers

  • Research-article
  • Research
  • Refereed
