# Detecting Intermittent Resistive Faults in Digital CMOS Circuits

Hassan Ebrahimi, Alireza Rohani and Hans G. Kerkhoff Testable Design and Test of Integrated Systems (TDT) Group University of Twente, Centre for Telematics and Information Technology (CTIT) Enschede, the Netherlands Email: {h.ebrahimi, a.rohani, h.g.kerkhoff}@utwente.nl

Abstract—Interconnection reliability issues threaten the dependability of highly critical electronic systems. One of the most challenging interconnection-induced reliability threats is the intermittent resistive fault (IRF). The interval between occurrences of this kind of defect can be, for example, one month, and the duration of a defect can be as short as a few nanoseconds. As a result, evoking and detecting these faults is a major challenge. IRFs can cause timing deviations in the data paths of a digital system during its operating time. This paper proposes an online digital slack monitor that is able to detect small timing deviations caused by IRFs in digital systems. The simulation results show that the proposed monitor is effective in detecting IRFs.

Keywords—Dependability; Reliability; No Faults Found; Intermittent Resistive Faults; Intermittent Fault Detection

# I. INTRODUCTION

Interconnection reliability issues become extremely important as semiconductor technology scales. Smaller interconnect dimensions, shrinking geometries, and reduced voltage and noise margins threaten the dependability of electronic integrated systems. In electronic systems such as systems-on-chip and printed-circuit boards, interconnection wiring dominates the infrastructure, and faults in these parts are therefore extremely important.

One of the most challenging interconnect reliability issues threatening highly dependable systems is the intermittent resistive fault (IRF). IRFs are a specific category of No Faults Found (NFF). Like other types of NFF, they result in many product returns in the automotive and avionics industries; moreover, evoking and detecting these faults is highly time-consuming and costly [1].

Marginal or unstable interconnections are the most likely cause of IRFs. Advanced integrated circuits, as well as printed-circuit boards, contain a large number of interconnection wires and vias. Electromigration, corrosion, temperature and mechanical stress further increase their instability.

IRFs manifest themselves as a sequence (burst) of low-level resistance changes in an interconnection. They might occur randomly during system operation in any interconnect. IRFs emerge repeatedly at the same location and gradually become more severe during the lifetime of the system. Finally, they can evolve into a permanent fault [2]. Therefore, it is very important to detect and repair these faults before they become permanent and result in a system failure, especially in safety-critical systems.

Intermittent fault (IF) detection is very challenging. IFs are not deterministic and may not appear during testing. Therefore, IF detection requires time-consuming and costly techniques such as periodic testing [3] or online monitoring. Periodic testing increases the probability of detecting IRFs, whereas in online monitoring the behavior of a circuit is monitored by embedded sensors during its operational time.

IRF online monitoring may be performed at board-level or chip-level. One approach for IRF detection at board-level has been published by us in [4], where we suggested an enhanced version of IEEE standard 1149.4 to allow online monitoring of wiring tracks in boards. Here, we investigate IRF detection at chip-level. The main contributions of this paper are the detection of small delays and the ability to distinguish whether a timing deviation is caused by an IRF or by an aging fault.

The rest of the paper is organized as follows. Section II reviews related work and the background of IRFs and delay-fault detection. The proposed sensor is introduced in Section III. Section IV first introduces a generic simulation model for IRFs and then presents fault simulation and aging detection results. The paper is concluded in Section V.

# II. RELATED WORKS

Several papers have studied the influence of intermittent faults on digital systems [5]-[7]. In [5] the authors studied the impact of intermittent faults on the behavior of a reduced instruction set computing (RISC) microprocessor by using VHDL-based fault injection. The authors of [6] proposed a metric, the intermittent vulnerability factor (IVF), to characterize the vulnerability of microprocessor structures to intermittent faults. Intermittent fault models at the logic and RTL abstraction levels have been generated in [7].

However, none of the previous work has considered the problem of IRFs in detail. We have proposed a model for IRFs and analyzed the influence of IRFs on analogue [8] as well as digital circuits [9] at the transistor level. In [4], we presented an extension of the mixed-signal boundary-scan standard, IEEE 1149.4, to detect IRFs in boards. In this paper, we continue the investigation of IRF detection and introduce an IRF detection technique at chip-level.

IRF occurrences in the interconnections of a chip result in timing deviations in circuit paths. There are several approaches to measuring path delay using dedicated measurement circuits such as ring oscillators [11], time-to-digital converters [12], and tunable replica circuits [13]. These approaches can be used to measure the effect of process, temperature and voltage variation and aging degradation [11], [13]. However, they cannot simply be used for IRF detection. As IRFs can occur randomly in time and location, an online monitor must continuously monitor path activity to be able to detect fault occurrences.

Fig. 1. An example of a data path with slack monitor and time window generator

Fig. 2. The proposed slack monitor

Another approach to measuring timing errors is to use slack monitors accompanied by a guard-band. Most aging faults increase the delay of transistors and connections. In a synchronous system, a delay increase can result in a timing failure. A timing failure occurs when delayed data does not meet a flip-flop's setup requirement because it has a late transition close to the clock edge. Therefore, online delay (or slack) monitoring in an integrated circuit is a suitable means of measuring the aging of synchronous circuits [13], [14].

The timing slack, or guard-band, is defined as the time between the data arrival and the active edge of the clock, minus the flip-flop setup time. The guard-band is assigned in the pre-silicon design phase based on the target clock frequency. Recent works [15], [16] show that slack monitoring methods are effective in detecting timing errors caused by aging and by process and voltage variations. In [15], a timing slack monitoring methodology that inserts monitors at both path-ending nets and path-intermediate nets is presented. In [16], the authors presented a digital slack monitor that is able to measure the slack of a selected path. However, their sensor is able to detect only rising transitions and needs re-initialization for each slack monitoring procedure.
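As a worked illustration of this definition, the slack can be written as below. The symbols $t_{\mathrm{clk}}$ (time of the active clock edge), $t_{\mathrm{arrival}}$ (data arrival time) and $t_{\mathrm{setup}}$ (flip-flop setup time) are notation introduced here for illustration only; they do not appear in the original text.

$$
t_{\mathrm{slack}} = \left(t_{\mathrm{clk}} - t_{\mathrm{arrival}}\right) - t_{\mathrm{setup}}
$$

For instance, at the 1.6 GHz clock frequency used in Section IV the clock period is $1 / 1.6\,\mathrm{GHz} = 0.625$ ns. Assuming, purely as hypothetical values, that data launched at one active edge arrives 0.5 ns later and that the setup time is 0.05 ns, the slack would be $0.625 - 0.5 - 0.05 = 0.075$ ns. Any additional path delay larger than this value would turn the slack negative and cause a timing failure.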

In this paper, we have investigated the usage of slack monitors for IRF detection. We have proposed a digital slack monitor which is able to detect aging and IRFs.



Fig. 3. Scheme of the intermittent resistive faults model generator for injection in simulations [8]

# III. IRF DETECTION AT CHIP-LEVEL

The proposed slack monitor is composed of a sensor and a time window generator (Figure 1). The sensor is inserted beside the D-flip-flop at the end of a data path of the design. It continuously monitors the input and the output of this D-flip-flop within a timing window provided by the time window generator, to make sure that there is no slack violation during operation.

A positive timing slack indicates that a circuit is operating safely, with a margin by which the delay can increase before causing a failure. Timing slack is therefore an excellent measure of the health of a circuit. Measuring it provides an early warning of deterioration and allows pre-emptive actions to be triggered before a failure occurs because of aging or IRFs. A small or negative slack indicates that the circuit is close to, or beyond, the point of failure.

An online slack measurement sensor to measure the timing slack (delay-related) at critical nodes in the circuit has been designed. It is capable of measuring the effects of delay variation resulting from aging and IRFs.

The proposed sensor provides a warning when a specified guard-band (safety margin) is violated, which is a sign of an impending timing failure. It consists of four D-flip-flops (see Figure 2). Each flip-flop receives its clock from the output of the preceding buffer, except the first flip-flop, whose clock is connected to the *CaptureEnable* line. As a result, the clock of each D-flip-flop is delayed by one buffer delay with respect to the previous one. As all D-inputs are connected to the Data line, the D-flip-flops in the sensor capture the signal on the Data line at different times.

If there is no slack violation on the Data line, all flip-flops capture the same value. If the delay of the target data path (see Figure 1) increases sufficiently, the first flip-flop of the sensor will latch the wrong value; the comparison of the output of this flip-flop with that of the path's flip-flop will then indicate a guard-band violation. The guard-band window (CaptureEnable) is provided by a time window generator.
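The capture mechanism described above can be sketched as a small behavioural model. This is only an illustration under stated assumptions, not the authors' circuit: the class name `SlackSensorModel`, the picosecond values, and the convention that a `1` in the code marks a sensor flip-flop that disagrees with the settled data value are all hypothetical choices made here. Setup and hold times of the sensor flip-flops are ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SlackSensorModel:
    """Behavioural sketch of the delay-line slack sensor (times in picoseconds)."""
    capture_enable_ps: int  # assumed instant at which CaptureEnable clocks the first flip-flop
    buffer_delay_ps: int    # assumed delay of one buffer in the clock chain
    n_flops: int = 4        # the sensor in the text uses four D-flip-flops

    def sample_times(self) -> List[int]:
        # Each flip-flop is clocked one buffer delay later than the previous one.
        return [self.capture_enable_ps + k * self.buffer_delay_ps
                for k in range(self.n_flops)]

    def capture(self, old: int, new: int, arrival_ps: int) -> List[int]:
        # A flip-flop latches the new value only if the data has already arrived.
        return [new if arrival_ps <= t else old for t in self.sample_times()]

    def violation_code(self, old: int, new: int, arrival_ps: int) -> str:
        # '1' marks a flip-flop that latched a value different from the settled one.
        return "".join("1" if q != new else "0"
                       for q in self.capture(old, new, arrival_ps))


def warning(code: str) -> bool:
    """The warning flag is raised if at least one sensor flip-flop disagrees."""
    return "1" in code


sensor = SlackSensorModel(capture_enable_ps=400, buffer_delay_ps=20)
on_time = sensor.violation_code(old=0, new=1, arrival_ps=350)
late_by_one = sensor.violation_code(old=0, new=1, arrival_ps=410)
late_by_two = sensor.violation_code(old=0, new=1, arrival_ps=430)
print(on_time, late_by_one, late_by_two)
```

In this model the number of `1` bits corresponds to the number of buffer delays by which the guard-band window is violated, which is why more than one flip-flop is useful when the size of the timing deviation is not known in advance.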

# IV. SIMULATION RESULTS

## A. Intermittent Resistive Faults Model

Several examples of measured IRFs have been presented in [17], [9]. Based on this experimental data, we developed a software module to generate these faults in a Cadence Virtuoso environment. Figure 3 shows the basic scheme of the IRF injector. There are six parameters, each of which is set by specifying its minimum and maximum possible values. In addition, depending on the specific requirements, any type of (random) distribution, such as uniform or Gaussian, can be chosen.

TABLE I. RANGE OF PARAMETERS USED IN THE GENERATOR DURING FAULT SIMULATION

| Parameter    | Minimum | Maximum | Distribution |
|--------------|---------|---------|--------------|
| Start time   | 1 ns    | 10 ns   | Uniform      |
| Resistance   | 100 Ω   | 100 kΩ  | Uniform      |
| T-Active     | 0.1 ns  | 2 ns    | Uniform      |
| T-Inactive   | 0.1 ns  | 2 ns    | Uniform      |
| Burst length | 1       | 20      | Uniform      |
| Safe time    | 1 ns    | (years) | Uniform      |

One example of the values and distributions applied in the simulations of this paper is listed in Table I. After a random start time of 1 ns or more has passed in simulation time, the burst of resistance pulses starts. Each pulse of the burst has a random resistance value R, which is active during a random activation time (T-active). After each pulse, an inactivation time (T-inactive) is randomly generated, during which a fault-free situation exists (R = 100 Ω). In the case of a burst (burst length > 1), there is a feedback loop and the same procedure is followed again. The IRF generation procedure is completed by generating a fault-free situation at the end.
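The generation procedure described above can be sketched in Python as follows. This is a minimal illustration and not the authors' Cadence module: the names `IrfParams` and `generate_irf` are hypothetical, only the uniform distribution is modelled, the final fault-free interval is assumed to correspond to the "Safe time" parameter, and the upper bound of that interval is a placeholder because Table I only lists "(years)".

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

# One piecewise-constant segment: (start_ns, end_ns, resistance_ohm)
Segment = Tuple[float, float, float]


@dataclass
class IrfParams:
    """(minimum, maximum) ranges following Table I; all are drawn uniformly."""
    start_ns: Tuple[float, float] = (1.0, 10.0)
    resistance_ohm: Tuple[float, float] = (100.0, 100e3)
    t_active_ns: Tuple[float, float] = (0.1, 2.0)
    t_inactive_ns: Tuple[float, float] = (0.1, 2.0)
    burst_length: Tuple[int, int] = (1, 20)
    # Placeholder upper bound: the table gives "(years)" for the maximum.
    safe_ns: Tuple[float, float] = (1.0, 10.0)
    fault_free_ohm: float = 100.0


def generate_irf(params: IrfParams, rng: random.Random) -> List[Segment]:
    """Return one IRF burst as a list of contiguous resistance segments."""
    segments: List[Segment] = []
    t = rng.uniform(*params.start_ns)
    segments.append((0.0, t, params.fault_free_ohm))  # fault-free until start time

    for _ in range(rng.randint(*params.burst_length)):
        # Active pulse with a random resistance value.
        end = t + rng.uniform(*params.t_active_ns)
        segments.append((t, end, rng.uniform(*params.resistance_ohm)))
        t = end
        # Inactive interval in which the wire is fault-free again.
        end = t + rng.uniform(*params.t_inactive_ns)
        segments.append((t, end, params.fault_free_ohm))
        t = end

    # Closing fault-free interval (assumed here to be the "Safe time").
    segments.append((t, t + rng.uniform(*params.safe_ns), params.fault_free_ohm))
    return segments


burst = generate_irf(IrfParams(), random.Random(1))
for start, end, resistance in burst[:4]:
    print(f"{start:7.3f} ns .. {end:7.3f} ns : {resistance:10.1f} ohm")
```

Passing a seeded `random.Random` instance keeps a generated burst reproducible, which is convenient when the same fault has to be replayed against different sensor configurations.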

The model, which has been implemented in Verilog-A, allows a normal wire in the netlist to be replaced by one that includes an IRF. By using this model, analogue mixed-signal [8] as well as digital circuits [9] can be evaluated at the transistor level. The next sub-section shows some results of IRF simulation and validates the functionality of the proposed IRF detection sensor.

## B. Intermittent Resistive Fault Simulation

In order to evaluate the ability of the proposed slack monitor sensor to detect IRFs, the well-known technique of simulation-based fault injection [7] is used. As an example, an AES-128 encoder circuit in the 45 nm Nangate CMOS technology [10] has been used. After synthesis by Synopsys Design Compiler, the circuit operates at a clock frequency of 1.6 GHz.

IRF simulation and sensor validation have been performed at the transistor level using Cadence Virtuoso. Several critical and near-critical paths were selected based on the area constraint and the design's timing information. The proposed sensor was inserted at the endpoint of the target paths, and the location of the IRF was selected randomly in the paths. One example of a simulation result is shown in Figure 5. Figure 4 shows the IRF injected in this simulation. It shows a burst of five changes in resistance, ranging from 1 kΩ to 100 kΩ, during 10 ns.

Fig. 4. IRF used in the simulation

In Figure 5, the system clock is shown at the top; the other signals, from top to bottom, are as follows. CaptureEnable indicates when the proposed sensor is active. Input is a sample of data transitions at the beginning of the critical path. Input\* shows the signal Input after injection of the IRF (Figure 4). Signals Output and Output\* are the outputs of the path in the fault-free and faulty cases, respectively. Output-FF is the signal captured by the flip-flop at the end of the path in the faulty case. As can be seen, there is no functional error in this signal, although two small delay degradations are detected by our sensor. Signal Warning is the output of the proposed sensor; it shows that the sensor detected two late transitions on the data (signal Output\*). The last four signals (Q1, Q2, Q3 and Q4) are the outputs of the sensor's flip-flops. The values captured by these flip-flops indicate the amount of degradation and path-slack reduction.

At 3 ns, the sensor captures a slack violation. The value "1100" is stored by the flip-flops of the sensor, meaning that two flip-flops captured a wrong value. The guard-band is therefore violated by as much as two buffer delays. Figure 4 shows that the IRF induced a very high resistance (80 kΩ) at this time. Similarly, at 8 ns a slack violation of as much as one buffer delay is detected, because the value "1000" is stored in the sensor's flip-flops. Figure 4 shows that the resistance of the IRF was close to 70 kΩ at 8 ns.

## C. Aging Simulation

The experimental results in [18] show that NBTI is the primary factor leading to timing degradation in current technologies. NBTI has been shown to cause shifts in threshold voltage of up to 50 mV over an operating lifespan of 10 years in 65 nm technologies. To model the delay induced by aging faults (NBTI), the Vth of all PMOS transistors in the target circuit and in the monitor has been increased by 50 mV in our Cadence simulator. Figure 6 shows the Cadence Virtuoso simulation results of a combinational circuit whose transistors are subjected to aging faults. The output (logic) behavior of the circuit as well as that of the proposed slack monitor sensor has been evaluated. The signals in Figure 6 are the same as those in Figure 5.

Fig. 5. Simulated waveform results of the proposed slack monitor after IRF injection

Fig. 6. Simulated waveform results of the proposed slack monitor after NBTI aging

In Figure 6, signal Output shows the simulation results of the target circuit before aging. In this case, no timing (guard-band) violation is detected by the monitor, as expected. Signal Output\* shows the simulation results of the target data path after aging. The Warning signal shows that the proposed sensor has detected some timing violations. The first flip-flop of the sensor has captured wrong values, but the other three flip-flops have not captured any violation.

From the above simulations, it can be concluded that one flip-flop per sensor is sufficient for aging detection, because the timing violations caused by aging are not random but increase gradually during the system lifetime. In contrast, the timing violations caused by IRFs change randomly, and several flip-flops are required to capture the violation.

In both aging and IRF detection, a warning flag is raised after a timing violation. This flag enables the amount of slack violation stored in the sensor's flip-flops to be transferred to the iJTAG [19] registers. Therefore, by extracting the slack information and the fault locations using the iJTAG standard, one can distinguish whether the fault is an aging fault or an intermittent fault.
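One possible way to use the extracted slack information is sketched below. It is a heuristic introduced here for illustration only, based on the observation above that aging violations grow gradually while IRF violations change randomly; the paper does not specify a classification procedure. The function name `classify_violation_history`, the labels and the `min_observations` threshold are hypothetical. The input is assumed to be the number of violated buffer delays read out for one monitored path in successive observation windows.

```python
from typing import Sequence


def classify_violation_history(counts: Sequence[int],
                               min_observations: int = 3) -> str:
    """Label a per-path history of slack-violation counts.

    counts[i] is the number of sensor flip-flops that reported a violation
    in observation window i (0 means no violation).
    """
    if not any(counts):
        return "none"

    # Consider the history from the first observed violation onwards.
    first = next(i for i, c in enumerate(counts) if c > 0)
    tail = counts[first:]
    if len(tail) < min_observations:
        return "undetermined"  # too few readings to judge the trend

    # Aging: the violation persists and never shrinks.
    non_decreasing = all(b >= a for a, b in zip(tail, tail[1:]))
    return "aging" if non_decreasing else "intermittent"


print(classify_violation_history([0, 0, 1, 1, 1, 2]))
print(classify_violation_history([0, 2, 0, 0, 1, 0]))
```

A real implementation would also need to consider the fault location and operating conditions such as temperature and supply voltage, which this sketch ignores.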

# V. CONCLUSION

In this paper, we presented a digital online slack monitor to detect one of the most challenging interconnection reliability issues, the intermittent resistive fault (IRF). Early-stage detection of this fault can prevent catastrophic failures in safety-critical systems. The simulation results show that the presented sensor can detect small timing deviations produced by IRFs or aging in data paths. In future work, the proposed sensor will be evaluated in actual hardware using hardware-based fault injection.

# ACKNOWLEDGMENT

This research was carried out within the FP7 BASTION project (#619871), financed by the European Commission (EC) and the Netherlands Enterprise Agency (RVO).

# REFERENCES

- [1] S. Davidson, "Towards an understanding of no trouble found devices," Proc. VLSI Test Symp. (VTS), Palm Springs, California, USA, 2005, pp. 147-152.
- [2] Ridgetop Group Inc., SJ BIST, White Paper Presentation, 2013.
- [3] N. Kranitis, A. Merentitis, et al., "Optimal periodic testing of intermittent faults in embedded pipelined processor applications," in the Proc. of Design, Automation and Test in Europe (DATE), 2006. pp. 65-70.
- [4] H.G. Kerkhoff and H. Ebrahimi, "Detection of Intermittent Faults in Electronic Systems based on the Mixed-Signal Boundary-Scan Standard", in Proc. of the Asian Quality Electronic Design Conference (ASQED), 2015, pp. 77-82.
- [5] J. Gracia-Moran, J. C. Baraza-Calvo, D. Gil-Tomas, L. J. Saiz-Adalid and P. J. Gil-Vicente, "Effects of intermittent faults on the reliability of a reduced instruction set computing (RISC) microprocessor," IEEE Trans. on Reliab. 63, 2014, pp. 144-153.
- [6] S. Pan, Y. Hu and X. Li, "IVF: Characterizing the vulnerability of microprocessor structures to intermittent faults," IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 20, 2012, pp. 777-790.
- [7] D. Gil-Tomas, J. Gracia-Moran and P.-J. Gil-Vicente, "Studying the effects of intermittent faults on a microcontroller," Microelectron. Reliab. 52, 2012, pp. 2837-2846.
- [8] J. Wan and H. G. Kerkhoff, "The influence of no fault found in analogue CMOS circuits," Proc. IEEE Int. Mixed-Signal Test Workshop (IMSTW), 2014, pp. 1-6.
- [9] H.G. Kerkhoff and H. Ebrahimi, "Investigation of Intermittent Resistive Faults in Digital CMOS Circuits", Journal of Circuits, Systems and Computers (JCSC), vol. 25, no. 3, 2015, pp. 1640023.
- [10] Nangate Inc., Sunnyvale, CA, "Nangate Open Cell Library," 2008. [Online] Available: http://www.nangate.com/
- [11] T. T. H. Kim, L. Pong-Fei, K. A. Jenkins, and C. H. Kim, "A Ring-Oscillator-Based Reliability Monitor for Isolated Measurement of NBTI and PBTI in High-k/Metal Gate Technology," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 23, 2015, pp. 1360-1364.
- [12] K. Katoh and K. Namba, "A low area calibration technique of TDC using variable clock generator for accurate on-line delay measurement," in International Symposium on Quality Electronic Design (ISQED), 2015, pp. 430-434.
- [13] K.A. Bowman et al., "Energy-efficient and metastability-immune resilient circuits for dynamic variation tolerance," in IEEE Journal of Solid-State Circuits, vol. 44, 2009, pp. 49-63.
- [14] A. Rahimi, L. Benini and R. Gupta, "Application-adaptive guardbanding to mitigate static and dynamic variability," IEEE Transactions on Computers, vol. 63, 2014, pp. 2160-2173.
- [15] L. Lai, V. Chandra, R.C. Aitken and P. Gupta, "SlackProbe: a flexible and efficient in-situ timing slack monitoring methodology," in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 33, 2014, pp. 1168-1179.
- [16] M. Sadi, L. Winemberg, and M. Tehranipoor, "A robust digital sensor IP and sensor insertion flow for in-situ path timing slack monitoring in SoCs," in VLSI Test Symposium (VTS), 2015, pp. 1-6.
- [17] Accenture Report, "Big trouble with 'no trouble found' returns" (2008), http://www.accenture.com/SiteCollectionDocuments/PDF/Accenture Returns Repairs.pdf.
- [18] E. A. Stott, J. S. Wong, P. Sedcole, and P. Y. Cheung. "Degradation in FPGAs: measurement and modelling," in ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA), 2010, pp. 229-238.
- [19] K. Shibin, S. Devadze, and A. Jutman, "Asynchronous Fault Detection in IEEE P1687 Instrument Network," in North Atlantic Test Workshop, 2014, pp. 73-78.