Abstract
Integrating a large number of transistors on a single chip has significantly improved processor performance. Further performance is gained by placing multiple CPU cores on a single chip, an organization known as the chip multiprocessor (CMP) architecture. On the other hand, miniaturization and the integration of large numbers of transistors in new silicon designs such as CMPs increase susceptibility to soft errors and degrade reliability. Previous research has exploited traditional redundancy techniques, such as dual- and triple-core redundancy, to tolerate faults in CMP architectures, but these methods impose significant performance and energy overheads. In this paper, we present a performance-efficient soft error protection scheme for CMP architectures that is based on simultaneous multithreading. Fortunately, some soft errors are masked at the architectural level and do not cause visible output errors. This masking effect can be used to reduce much of the overhead of reliability enhancement techniques against soft errors. The architectural vulnerability factor (AVF) is now widely used to estimate the portion of soft errors that are masked. In this article, we propose a reliability-aware CMP architecture that uses online AVF estimation to determine the level of protection. To meet system reliability demands, the estimated AVF is used to apply partial redundancy against soft errors, which leads to a significant performance improvement. We also introduce a dynamic scheduling method for mapping threads onto cores to enhance the total throughput of the CMP architecture. Our dynamic scheduling applies thread migration among cores by simultaneously considering the total vulnerability and the throughput of the cores. Thread migration balances the load between cores and improves performance. Our experimental results on SPEC CPU2006 show up to 38% improvement in core throughput in different phases of thread migration compared to a static mapping of threads onto the cores.
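As a rough illustration of the idea summarized above, the following Python sketch shows one way AVF-guided partial redundancy and load-balancing thread migration could fit together. It is not the scheme evaluated in the paper: the `Thread` class, the `avf_threshold` value of 0.3, the assumption that a protected thread occupies two SMT contexts, and the greedy `migrate_once` policy are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    name: str
    avf: float  # hypothetical online AVF estimate in [0, 1]


def needs_redundancy(thread, avf_threshold=0.3):
    """Partial redundancy: protect only threads whose estimated AVF is high.

    The threshold is an assumed tuning knob, not a value from the paper.
    """
    return thread.avf > avf_threshold


def thread_cost(thread, avf_threshold=0.3):
    """Assumed cost model: a protected thread runs a redundant copy,
    so it occupies two hardware contexts instead of one."""
    return 2 if needs_redundancy(thread, avf_threshold) else 1


def core_load(threads, avf_threshold=0.3):
    """Total number of hardware contexts in use on one core."""
    return sum(thread_cost(t, avf_threshold) for t in threads)


def migrate_once(cores, avf_threshold=0.3):
    """Move one thread from the most loaded core to the least loaded core,
    but only if the move strictly narrows the load gap between them.

    Returns the migrated thread, or None if no move helps.
    """
    loads = [core_load(c, avf_threshold) for c in cores]
    src = max(range(len(cores)), key=loads.__getitem__)
    dst = min(range(len(cores)), key=loads.__getitem__)
    gap = loads[src] - loads[dst]
    for t in cores[src]:
        # After the move the gap becomes |gap - 2 * cost|,
        # which is smaller than gap exactly when cost < gap.
        if thread_cost(t, avf_threshold) < gap:
            cores[src].remove(t)
            cores[dst].append(t)
            return t
    return None


# Hypothetical static mapping: three threads on core 0, one on core 1.
cores = [
    [Thread("a", 0.5), Thread("b", 0.1), Thread("c", 0.4)],
    [Thread("d", 0.05)],
]
moved = migrate_once(cores)
print(moved.name if moved else None, [core_load(c) for c in cores])
```

With this sample data, the first call moves thread `a` and leaves both cores with a load of 3, after which a second call finds no move that narrows the gap. A real scheduler would also have to account for migration cost and for per-thread throughput, which this sketch ignores.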
About this article
Cite this article
Pouyan, F., Azarpeyvand, A., Safari, S. et al. Reliability aware throughput management of chip multi-processor architecture via thread migration. J Supercomput 72, 1363–1380 (2016). https://doi.org/10.1007/s11227-016-1665-3
DOI: https://doi.org/10.1007/s11227-016-1665-3