Abstract
Many-core SoCs are an emerging trend, and their process yield will face many unpredictable challenges. A nonuniform cache architecture (NUCA) can improve the performance of many-core SoCs for embedded systems. It embeds a network-on-chip (NoC) into the cache memory and speeds up data access by distributing traffic loads across several banks in parallel. A fault-tolerance mechanism in NUCA is important because it lets the chip keep working efficiently when some memory banks are unusable. In this paper, we design a dedicated router that works with static and dynamic cache remapping policies to support faulty banks in NUCA. When an L2 cache bank in NUCA is unusable, the static remapping policy (SRP) selects a suitable neighboring cache bank to take over its cache accesses, based on a collected remapping cost that accounts for the cache status and the traffic status of the router. We also propose a dynamic remapping policy (DRP) that selects the suitable cache bank at runtime to match the actual load of the nodes neighboring the faulty bank. Experimental results show an average improvement of approximately 26% for the SRP and approximately 28% for the DRP.
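The bank selection described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the remapping cost formula, so the weighted sum below, the weights, and all names (`Neighbor`, `remapping_cost`, `select_remap_target`) are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Neighbor:
    """Hypothetical view of one bank adjacent to the faulty bank."""
    bank_id: int
    cache_load: float    # assumed metric: fraction of the bank in use, 0..1
    traffic_load: float  # assumed metric: utilization of its router, 0..1


def remapping_cost(n: Neighbor, w_cache: float = 0.5, w_traffic: float = 0.5) -> float:
    # Assumed cost model: a weighted sum of cache status and traffic status.
    # The real cost function is defined in the paper body, not in the abstract.
    return w_cache * n.cache_load + w_traffic * n.traffic_load


def select_remap_target(neighbors: Iterable[Neighbor],
                        faulty: Set[int]) -> Optional[int]:
    """Return the id of the usable neighbor with the lowest cost, or None."""
    candidates = [n for n in neighbors if n.bank_id not in faulty]
    if not candidates:
        return None
    # Ties are broken by the lower bank id so the choice is deterministic.
    best = min(candidates, key=lambda n: (remapping_cost(n), n.bank_id))
    return best.bank_id


# Static use: evaluate once with costs collected in advance.
profile = [Neighbor(1, 0.9, 0.2), Neighbor(2, 0.3, 0.4), Neighbor(3, 0.5, 0.1)]
static_target = select_remap_target(profile, faulty=set())

# Dynamic use: re-evaluate at runtime with the loads observed at that moment.
runtime = [Neighbor(1, 0.2, 0.1), Neighbor(2, 0.3, 0.4), Neighbor(3, 0.8, 0.7)]
dynamic_target = select_remap_target(runtime, faulty=set())
```

Under these assumptions, the static and dynamic variants differ only in when the loads are sampled: the static choice is fixed from previously collected costs, while the dynamic choice is recomputed from the current state of the neighbors.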
Acknowledgements
This work was supported in part by the NSC under Grant No. NSC 101-2628-E-035-007-MY3.
Cite this article
Chang, KC., Chen, CY., Yu, CS. et al. Supporting faulty banks in NUCA by NoC assisted remapping mechanisms. J Supercomput 67, 305–323 (2014). https://doi.org/10.1007/s11227-013-1001-0
DOI: https://doi.org/10.1007/s11227-013-1001-0