Partial-TMR: A New Method for Protecting Register Files Against Soft Error Based on Lifetime Analysis

  • Regular Paper
Journal of Computer Science and Technology

Abstract

High-energy particles in space can easily cause soft errors in the register file (RF). As a critical structure in a processor, the RF often stores data for long periods and is read frequently, which raises the probability that corrupted data spreads to other parts of the processor. Triple modular redundancy (TMR) is a common and effective fault tolerance method that enables multi-bit error correction. However, applying full TMR to all registers can cause excessive area and power overheads, and some registers in the RF have less impact on processor reliability, so there is no need to apply TMR to them. This paper designs an efficient strategy that rates the registers in the RF according to their vulnerability. Based on this strategy, the paper proposes a new RF fault tolerance method named Partial-TMR, which selectively protects the more vulnerable registers against multi-bit errors and improves fault tolerance efficiency. For the integer RF, Partial-TMR improves soft error correction capability by 24.5% relative to the baseline system and by 3% relative to ParShield; for the floating-point RF, the improvements are 5.17% and 0.58%, respectively. The soft error correction capability of Partial-TMR is 1% to 3% lower than that of full TMR, but Partial-TMR significantly cuts the area and power overheads: compared with full TMR, it decreases them by 71.6% and 64.9%, respectively. It also has little impact on performance. Partial-TMR is therefore a more cost-effective fault tolerance method than ParShield and full TMR.
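To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It shows the standard bitwise majority vote that TMR relies on, plus a purely hypothetical selection step that protects only the highest-rated registers. The 32-bit register width, the function names `tmr_vote` and `select_protected`, the numeric vulnerability scores, and the fixed protection budget are all assumptions made for illustration; the abstract does not state how vulnerability is computed or how many registers are protected.

```python
from typing import Dict, List

WORD_MASK = 0xFFFFFFFF  # assumption: 32-bit registers


def tmr_vote(a: int, b: int, c: int) -> int:
    """Return the bitwise majority of three redundant copies of a register.

    Each output bit takes the value held by at least two of the three copies,
    so any number of flipped bits in a single copy is corrected.
    """
    return ((a & b) | (a & c) | (b & c)) & WORD_MASK


def select_protected(scores: Dict[str, float], budget: int) -> List[str]:
    """Hypothetical selection step: pick the `budget` registers with the
    highest vulnerability score (ties broken by register name)."""
    ranked = sorted(scores, key=lambda reg: (-scores[reg], reg))
    return ranked[:budget]


# Usage: a multi-bit upset in one copy is masked by the other two copies.
golden = 0xDEADBEEF
corrupted = golden ^ 0x00000F00  # four adjacent bits flipped in one copy
recovered = tmr_vote(golden, corrupted, golden)

# Usage: with made-up scores and a budget of two, only r1 and r3 get TMR.
protected = select_protected({"r1": 0.9, "r2": 0.1, "r3": 0.5}, budget=2)
```

The vote corrects errors confined to one copy, and also errors in two copies as long as they hit different bit positions. If the same bit flips in two copies, the vote returns the wrong value, which is the usual limit of TMR.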

Author information

Corresponding author

Correspondence to Ying-Ke Gao.

About this article

Cite this article

Liang, XG., Gao, YK. & Hua, GX. Partial-TMR: A New Method for Protecting Register Files Against Soft Error Based on Lifetime Analysis. J. Comput. Sci. Technol. 36, 1089–1101 (2021). https://doi.org/10.1007/s11390-021-0852-8

  • DOI: https://doi.org/10.1007/s11390-021-0852-8
