
Reliability aware throughput management of chip multi-processor architecture via thread migration

The Journal of Supercomputing

Abstract

Integrating a large number of transistors on a single chip has significantly improved processor performance. Further performance is achieved by placing multiple CPU cores on a single chip, an organization known as the chip multiprocessor (CMP) architecture. On the other hand, the miniaturization and dense integration of transistors in new silicon such as CMPs increase susceptibility to soft errors and degrade reliability. Previous research has exploited traditional redundancy techniques such as dual- and triple-core redundancy to tolerate faults in CMP architectures, but these methods impose significant performance and energy overheads. In this paper, we present a performance-efficient soft error protection scheme for CMP architectures based on simultaneous multithreading (SMT). Fortunately, some soft errors are masked at the architectural level and do not cause a visible output error. This masking effect can be exploited to remove much of the overhead of reliability enhancement techniques against soft errors. The architectural vulnerability factor (AVF), which estimates the probability that a fault in a structure produces a visible error (and hence how much of the raw soft error rate is masked), has recently come into wide use. In this article, we propose a reliability-aware CMP architecture that uses online AVF estimation to determine the required level of protection. To meet system reliability demands, the estimated AVF is used to apply only partial redundancy against soft errors, which yields a significant performance improvement. We also introduce a dynamic scheduling method for mapping threads onto cores to enhance the total throughput of the CMP. Our dynamic scheduler migrates threads among cores while simultaneously considering the total vulnerability and the throughput of the cores. Thread migration balances the load across cores and improves performance. Experimental results on SPEC CPU2006 show up to 38% improvement in core throughput across different phases of thread migration compared with a static mapping of threads onto cores.
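The abstract describes the mechanism only at a high level, so the Python sketch below illustrates one way AVF-guided partial redundancy and vulnerability-aware thread mapping could interact. It is not the authors' algorithm. All names (`Thread`, `estimate_avf`, `evaluate_core`, `best_mapping`), the two-threads-per-core SMT layout, the rule that a core's AVF is the mean of its threads' AVFs, the 0.5 AVF threshold, the 50% throughput cost of redundant execution, and the sample IPC/AVF values are hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class Thread:
    name: str
    ipc: float  # instructions per cycle measured over the last interval
    avf: float  # online AVF estimate for the thread over the same interval


def estimate_avf(ace_bit_cycles: int, total_bits: int, cycles: int) -> float:
    """ACE-style AVF: fraction of bit-cycles that held ACE state
    (bits required for architecturally correct execution)."""
    return ace_bit_cycles / (total_bits * cycles)


def pairings(threads: List[Thread]) -> Iterator[List[Tuple[Thread, Thread]]]:
    """Yield every way to place an even number of threads two per SMT core."""
    if not threads:
        yield []
        return
    first, rest = threads[0], threads[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def evaluate_core(pair, avf_threshold, redundancy_cost):
    """Return (effective throughput, core AVF, protected?) for one core.

    Hypothetical model: core AVF is the mean of its two threads' AVFs; if it
    exceeds the threshold, the core enables redundant execution (partial
    redundancy), which costs a fixed fraction of its throughput.
    """
    a, b = pair
    core_avf = (a.avf + b.avf) / 2
    raw = a.ipc + b.ipc
    protected = core_avf > avf_threshold
    throughput = raw * (1 - redundancy_cost) if protected else raw
    return throughput, core_avf, protected


def best_mapping(threads, avf_threshold=0.5, redundancy_cost=0.5):
    """Pick the mapping with the highest total throughput, breaking ties
    by the lowest worst-case core AVF."""
    def score(mapping):
        results = [evaluate_core(p, avf_threshold, redundancy_cost) for p in mapping]
        total = sum(r[0] for r in results)
        worst_avf = max(r[1] for r in results)
        return (total, -worst_avf)

    return max(pairings(threads), key=score)


# Hypothetical per-interval samples for four threads on two SMT cores.
threads = [
    Thread("A", ipc=1.25, avf=estimate_avf(300, 100, 4)),  # AVF = 0.75
    Thread("B", ipc=0.75, avf=0.625),
    Thread("C", ipc=1.0, avf=0.125),
    Thread("D", ipc=0.5, avf=0.25),
]
mapping = best_mapping(threads)
print([(a.name, b.name) for a, b in mapping])  # [('A', 'C'), ('B', 'D')]
```

In this toy model, co-scheduling the two high-AVF threads (A, B) pushes their core over the threshold and halves its throughput, so the search prefers mixing high- and low-AVF threads on each core. Re-running `best_mapping` at the end of every interval and moving any thread whose core assignment changes is one way to model the migration step. Exhaustive search grows as (n−1)!! in the number of threads, so it is only practical for a few cores; a real scheduler would need a cheaper heuristic.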





Author information

Correspondence to Ali Azarpeyvand.

About this article

Cite this article

Pouyan, F., Azarpeyvand, A., Safari, S. et al. Reliability aware throughput management of chip multi-processor architecture via thread migration. J Supercomput 72, 1363–1380 (2016). https://doi.org/10.1007/s11227-016-1665-3

  • DOI: https://doi.org/10.1007/s11227-016-1665-3
