Abstract
Cache structures in a multicore system are more vulnerable to soft errors because of high transistor density. Protecting all caches indiscriminately incurs notable performance and energy overhead. In this study, we propose asymmetrically reliable caches, which meet the reliability needs of the system with just enough additional hardware to stay within performance and energy constraints. In our framework, a chip multiprocessor is composed of a high-reliability core with ECC protection and a set of low-reliability cores with no protection on their data caches. Between these two types of cores, there is also a middle-level reliability core with only a parity check. Application threads are mapped to cores of different reliability levels based on their critical data usage. The experimental results for selected applications show that the proposed techniques improve reliability, with considerable performance and energy overhead on average, compared to traditional unsafe caches.
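The mapping idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the `Core` type, the `PROTECTION_RANK` ordering, the per-thread criticality scores, and the one-thread-per-core greedy assignment are all assumptions made for illustration. The sketch only shows the general principle that threads using more critical data go to better-protected cores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    core_id: int
    protection: str  # "ecc", "parity", or "none" (hypothetical labels)


# Assumed ordering of protection strength: ECC > parity > unprotected.
PROTECTION_RANK = {"ecc": 2, "parity": 1, "none": 0}


def map_threads_to_cores(criticality, cores):
    """Greedy sketch: the most critical thread gets the most protected core.

    `criticality` maps a thread name to a hypothetical score in [0, 1],
    e.g. the fraction of its accesses that touch critical data.
    Assumes at most one thread per core.
    """
    if len(criticality) > len(cores):
        raise ValueError("more threads than cores")
    ranked_threads = sorted(criticality, key=lambda t: criticality[t], reverse=True)
    ranked_cores = sorted(
        cores, key=lambda c: PROTECTION_RANK[c.protection], reverse=True
    )
    return dict(zip(ranked_threads, ranked_cores))


# One ECC core, one parity core, and two unprotected cores, as in the abstract.
cores = [Core(0, "ecc"), Core(1, "parity"), Core(2, "none"), Core(3, "none")]
# Made-up criticality scores for four threads.
criticality = {"t0": 0.10, "t1": 0.75, "t2": 0.40, "t3": 0.05}

mapping = map_threads_to_cores(criticality, cores)
for thread, core in mapping.items():
    print(thread, "-> core", core.core_id, f"({core.protection})")
```

With these made-up scores, the thread with the highest critical data usage lands on the ECC core, the next on the parity core, and the remaining threads on the unprotected cores.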
Acknowledgments
This research was supported by The Scientific and Technological Research Council of Turkey (TUBITAK) with a research grant (Project Number: 113E530).
Cite this article
Arslan, S., Topcuoglu, H.R., Kandemir, M.T. et al. Asymmetrically reliable caches for multicore architectures under performance and energy constraints. Cluster Comput 19, 1819–1833 (2016). https://doi.org/10.1007/s10586-016-0641-2
DOI: https://doi.org/10.1007/s10586-016-0641-2