Abstract
The growing vulnerability of transistors and interconnects caused by technology scaling continually challenges the reliability of future microprocessors. Lifetime reliability is gaining attention over performance as a design factor, even for lower-end commodity applications. In this work, we present a low-power hybrid fault-tolerant architecture that improves the reliability of pipelined microprocessors by protecting their combinational logic. By combining different types of redundancy, the architecture handles a broad spectrum of faults with little impact on performance. Moreover, it addresses the problems of error propagation in nonlinear pipelines and error detection in pipeline stages with memory interfaces. Our case-study implementation of a fault-tolerant MIPS microprocessor highlights four main advantages of the proposed solution over triple modular redundancy (TMR) architectures: (i) an 11.6 % power saving, (ii) improved transient-error detection capability, (iii) improved lifetime reliability, and (iv) more effective handling of fault accumulation effects. We also present a gate-level fault-injection framework that models physical defects and transient faults with high fidelity.
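For readers unfamiliar with the TMR baseline used in the comparison, the following is a minimal behavioral sketch of generic triple modular redundancy with a bitwise majority voter. It is not the hybrid architecture proposed in the paper. The names `majority_vote`, `tmr_execute`, and `increment_stage`, the 8-bit width, and the XOR-mask fault model are assumptions made purely for illustration.

```python
from typing import Callable, Sequence


def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority: each output bit follows at least two inputs."""
    return (a & b) | (a & c) | (b & c)


def tmr_execute(stage: Callable[[int], int], x: int,
                fault_masks: Sequence[int] = (0, 0, 0)) -> int:
    """Run three copies of a combinational stage and vote on their outputs.

    fault_masks is a hypothetical fault model: each mask is XORed onto the
    output of one copy, flipping the bits that are set in the mask.
    """
    outputs = [stage(x) ^ mask for mask in fault_masks]
    return majority_vote(*outputs)


def increment_stage(x: int) -> int:
    """Toy 8-bit combinational stage used only for this example."""
    return (x + 1) & 0xFF


# A fault in a single copy is masked by the voter.
single_fault = tmr_execute(increment_stage, 5, fault_masks=(0, 0b100, 0))

# The same bit corrupted in two copies outvotes the healthy copy.
double_fault = tmr_execute(increment_stage, 5, fault_masks=(0b100, 0b100, 0))
```

In this sketch `single_fault` equals the fault-free result, while `double_fault` does not. This is the fault accumulation effect in its simplest form: plain TMR tolerates one faulty copy but silently produces a wrong value once a second copy fails in the same bit position.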
Additional information
Responsible Editor: M. Abadir
Cite this article
Wali, I., Virazel, A., Bosio, A. et al. A Hybrid Fault-Tolerant Architecture for Highly Reliable Processing Cores. J Electron Test 32, 147–161 (2016). https://doi.org/10.1007/s10836-016-5578-0
DOI: https://doi.org/10.1007/s10836-016-5578-0