Abstract
We propose a low-cost concurrent error detection strategy to improve the Reliability, Availability, and Serviceability (RAS) of high-performance microprocessors by targeting the control logic, one of their most critical blocks from the RAS standpoint. The strategy exploits codes that are inherently present within the control logic because of the function it performs and its verification needs (referred to as Control Logic Function-Inherent Codes). This makes it possible to achieve concurrent error detection at very limited cost in terms of area, power consumption, and impact on performance and design. Taking the instruction decoder of a public-domain microprocessor as a case study, we show that our approach requires significantly less area and power than traditional parity encoding while providing higher concurrent error detection ability. Therefore, when adopted together with a system-level (generally software-implemented) recovery technique, our strategy is a viable approach to increasing microprocessor RAS at very limited cost.
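To make the idea of a function-inherent code concrete, the following minimal sketch uses a toy example that is not taken from the paper: a hypothetical 3-to-8 decoder whose fault-free outputs always form a 1-out-of-8 (one-hot) word. The one-hot property is assumed here purely for illustration; the abstract does not state which codes the authors identified in the instruction decoder. All function names are hypothetical. The sketch compares a checker that tests membership in the inherent code with a single predicted parity bit.

```python
def decode(opcode: int, n: int = 8) -> int:
    """Hypothetical decoder: asserts exactly one of n select lines."""
    return 1 << (opcode % n)


def is_one_hot(word: int) -> bool:
    """True when exactly one bit is set, i.e. a 1-out-of-n codeword."""
    return word != 0 and (word & (word - 1)) == 0


def parity(word: int) -> int:
    """Parity (0 or 1) of the number of set bits in word."""
    return bin(word).count("1") & 1


def detected_by_inherent_code(observed: int) -> bool:
    """Checker that only tests membership in the assumed one-hot code."""
    return not is_one_hot(observed)


def detected_by_parity(golden: int, observed: int) -> bool:
    """Checker comparing a parity bit predicted from the fault-free word."""
    return parity(golden) != parity(observed)


golden = decode(5)              # fault-free output, one line asserted
single = golden ^ 0b00000001    # one spurious line asserted (weight 2)
double = golden ^ 0b00000011    # two spurious lines asserted (weight 3)

for name, word in [("single", single), ("double", double)]:
    print(name,
          "inherent:", detected_by_inherent_code(word),
          "parity:", detected_by_parity(golden, word))
```

In this toy setting the inherent-code checker needs no extra check bits, and it flags the double error that single-bit parity misses. Neither checker catches an error that moves the asserted line to a different position, since the word stays one-hot and its parity is unchanged. Whether similar trade-offs hold for a real decoder depends on the codes actually present in its logic.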
Additional information
Responsible Editor: M. Violante
This work was partially supported by Intel Corporation.
About this article
Cite this article
Rossi, D., Omaña, M., Garrammone, G. et al. Low Cost Concurrent Error Detection Strategy for the Control Logic of High Performance Microprocessors and Its Application to the Instruction Decoder. J Electron Test 29, 401–413 (2013). https://doi.org/10.1007/s10836-013-5355-2
DOI: https://doi.org/10.1007/s10836-013-5355-2