Abstract
This paper proposes an adaptive cache coherence protocol that improves the reliability of caches against soft errors in shared-memory multi-core processors. The protocol is based on a comprehensive study and analysis of how cache coherence protocols affect the characteristics of cache memories. The outcomes of this analysis indicate that differences in how dirty data items are handled play an important role in deciding in favor of or against a particular cache coherence protocol. Based on these findings, the proposed protocol enhances cache reliability through sharing management, in which sharing is dynamically adjusted according to the operational mode of the CPU. Experimental results show that, compared to previous works, the proposed protocol improves MTTF by about 16 % with no performance degradation and with negligible bandwidth and cache energy consumption overheads.
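Why dirty-data handling can separate one coherence protocol from another can be illustrated with a toy model. The sketch below is not the authors' adaptive protocol. It assumes a single cache line, a simplified MESI (a remote read of a Modified line forces the owner to write it back, leaving every copy clean) and a simplified MOESI (the owner moves to Owned and keeps the dirty copy). It also assumes that a soft error in a clean line can be recovered by refetching from memory, whereas one in a dirty line cannot. The function `dirty_exposure` and the trace format are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Event = Tuple[int, str, int]  # hypothetical format: (time, op, core); op is "write", "read" or "evict"


@dataclass
class LineState:
    owner: Optional[int] = None  # core holding the dirty copy (M or O state), if any
    dirty: bool = False          # True while memory holds stale data
    dirty_since: int = 0         # time at which the line last became dirty


def dirty_exposure(trace: List[Event], protocol: str) -> int:
    """Total time a dirty (memory-stale) copy of one line sits in some cache.

    Toy model (assumption): a soft error in a clean line is recoverable by
    refetching from memory, whereas one in a dirty line is not.
    protocol is "MESI" or "MOESI"; trace must be sorted by time.
    """
    if protocol not in ("MESI", "MOESI"):
        raise ValueError("protocol must be 'MESI' or 'MOESI'")
    st = LineState()
    exposure = 0

    def write_back(t: int) -> None:
        # Close the current dirty interval: memory is now up to date.
        nonlocal exposure
        exposure += t - st.dirty_since
        st.dirty, st.owner = False, None

    for t, op, core in trace:
        if op == "write":
            if not st.dirty:
                st.dirty, st.dirty_since = True, t
            st.owner = core  # in this toy model a remote write just moves the dirty copy
        elif op == "read" and st.dirty and core != st.owner:
            if protocol == "MESI":
                write_back(t)  # M -> S: owner flushes the line on a remote read
            # MOESI: M -> O, the owner keeps the dirty copy and supplies the data
        elif op == "evict" and st.dirty and core == st.owner:
            write_back(t)  # the dirty copy is written back when evicted

    if st.dirty and trace:
        exposure += trace[-1][0] - st.dirty_since  # still dirty at end of trace
    return exposure


trace = [(0, "write", 0), (10, "read", 1), (50, "evict", 0)]
print(dirty_exposure(trace, "MESI"), dirty_exposure(trace, "MOESI"))  # prints "10 50"
```

For this trace the dirty copy is exposed for 10 time units under the simplified MESI but 50 under the simplified MOESI. The shorter exposure is paid for with an extra writeback to memory, which is why any reliability gain obtained by managing sharing has to be weighed against bandwidth overhead.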

Cite this article
Maghsoudloo, M., Zarandi, H.R. Cache vulnerability mitigation using an adaptive cache coherence protocol. J Supercomput 68, 1048–1067 (2014). https://doi.org/10.1007/s11227-014-1139-4