Elsevier

Integration

Volume 59, September 2017, Pages 168-178

Refresh re-use based transparent test for detection of in-field permanent faults in DRAMs

https://doi.org/10.1016/j.vlsi.2017.06.011

Highlights

  • Both refresh operations and March test operations on DRAM involve periodic read and write operations.

  • Reusing refresh for test purposes ensures periodic testing of DRAM and prevents the accumulation of permanent faults.

  • Reusing the refresh circuit for testing avoids the overhead of additional Design-for-Testability hardware.

Abstract

In this paper, a transparent test technique is proposed for detecting permanent faults that develop during field operation of DRAMs. A three-pronged approach is taken. First, a word-oriented transparent March test generation algorithm is proposed that avoids the signature-based prediction phase. Next, the proposed transparent March test is structured so that it can be executed during the refresh cycles of the DRAM. Finally, the on-chip refresh circuit is modified so that it can be re-used to apply the proposed transparent March test to the DRAM. Re-using refresh cycles for test purposes ensures periodic testing of the DRAM without interruption, so faults are not allowed to accumulate. Moreover, there is no need to wait for idle cycles of the processor to perform the test, and the test finishes within a definite time. Re-using the refresh circuit for test purposes removes the need for additional Design-For-Testability hardware and reduces the area overhead.

Both analytical predictions and simulation results for the proposed method indicate area benefits and test time savings compared with other reported techniques. The proposed refresh re-use based transparent test technique is a cost effective solution, providing periodic tests of DRAM without requiring additional test hardware.

Introduction

The run-time faults that occur in Dynamic Random Access Memories (DRAMs) are either transient or intermittent [1]. Much research has been devoted to devising efficient run-time fault detection techniques for DRAMs, but the techniques reported in the literature have mainly focused on detecting transient faults (soft errors). However, in deeply scaled CMOS-based DRAMs, the occasional run-time faults that result from physical effects such as environmental susceptibility, aging and low supply voltage are intermittent faults [2]. These intermittent faults usually exhibit a relatively high occurrence rate and eventually tend to become permanent [2]. Moreover, wearout of the DRAM can also make intermittent faults frequent enough to be classified as permanent [3].

Studies of DRAM failures in the field ([4], [5], [6]) provide evidence that DRAMs experience both transient (soft) faults and permanent (hard) faults in the field. Thus, for a DRAM-based system that develops both soft errors and hard faults during in-field operation, relying only on software-based detection mechanisms (memory diagnostic programs used to check for memory failures on a computer) such as ECC [7], Chipkill [8] or memory scrubbers [9] may not be sufficient, as suggested by the results in [10]. Sridharan et al. [10] reported that a commonly used ECC technique such as SEC-DED ECC results in undetected errors (causing silent data corruption) at a rate of up to 20 FIT (failures per 10^9 device-hours) per DRAM device, an unacceptably high rate for many enterprise data centers and high-performance computing systems, which makes it poorly suited to modern DRAM subsystems. Memory scrubbing with ECC may be an alternative. However, implementing scrubbers (in either hardware or software) for DRAM testing has a few drawbacks, as mentioned in [11]. Implementing the scrubber as a state machine in the memory controller increases hardware complexity and causes a performance penalty, because the memory is inaccessible during the scrubbing period. A software implementation requires generating an interrupt to activate firmware that executes on the processor to perform scrubbing. However, because some systems have a limited number of interrupt request signals or vectors, an interrupt for scrubbing is often unavailable. As a result, the software-based solution is often infeasible.
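As a rough sense of scale for the 20 FIT figure reported in [10]: one FIT is one failure per 10^9 device-hours, so the expected number of events grows linearly with the number of devices and the operating time. The sketch below assumes a hypothetical fleet of 100,000 DRAM devices running continuously for a year; the fleet size and the function name are illustrative and are not taken from [10].

```python
HOURS_PER_YEAR = 24 * 365  # 8760 h, ignoring leap years


def expected_events_per_year(fit_per_device: float, num_devices: int) -> float:
    """Expected events per year for a fleet, given a per-device FIT rate.

    1 FIT = 1 event per 10**9 device-hours, so the fleet rate scales linearly
    with the number of devices.
    """
    events_per_hour = fit_per_device * num_devices / 1e9
    return events_per_hour * HOURS_PER_YEAR


if __name__ == "__main__":
    # 20 FIT per device is the figure from [10]; 100,000 devices is hypothetical.
    print(f"{expected_events_per_year(20, 100_000):.1f} undetected errors per year")  # 17.5
```

For that hypothetical fleet, the estimate is about 17.5 undetected errors per year.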

Thus, for a system that develops both soft errors and hard faults during in-field operation, the preferred solution is to apply a permanent fault detection technique in conjunction with soft error detection schemes such as ECC, so that both hard and soft faults arising during in-field operation are covered. The permanent fault detection technique should have the following characteristics.

  • The test must be an active test so that functional defects are uncovered.

  • The test must be performed periodically (similar to memory scrubbing) to ensure that faults do not accumulate.

  • However, unlike scrubbing, the test hardware must be cost effective.

In this paper, we propose a refresh re-use based test technique for detection of permanent faults that provides a cost effective solution by supporting periodic tests of DRAM without requiring additional test hardware.

Refresh operations require reading the contents of a memory location and writing them back to the same location. March tests [12] for detecting functional faults in memories also require writing patterns into the memory and reading them back. The operations performed on the memory during refresh and during a word-oriented transparent March test are therefore similar, and so is the manner in which they are performed: both require scanning the entire memory row by row and performing a read followed by a write on each row. These similarities further motivated us to re-use the refresh circuit for test purposes. Moreover, DRAM is refreshed periodically, so tests performed along with refresh will also be periodic and will prevent fault accumulation. Further, utilizing the refresh circuit for test purposes avoids the DFT overhead of additional test circuitry.
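The parallel between refresh and a transparent test can be illustrated with a toy software model. The following is a minimal sketch under stated assumptions, not the hardware proposed in this paper: a row counter visits every row as the refresh circuit does, and in a hypothetical test mode each refresh cycle writes the complement of the row, reads it back and compares, and then restores the original data. The names ToyDRAM, refresh_cycle and refresh_sweep and the stuck-at injection are illustrative. Because each row is checked in isolation, this simplified step would catch stuck-at faults but not coupling faults between rows, which is why a full March sequence is still needed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

WORD_MASK = 0xFFFF  # 16-bit words, as in the 4 M x 16 DRAM of the experiments


@dataclass
class ToyDRAM:
    """Software stand-in for a DRAM array organised as rows of 16-bit words."""
    rows: List[List[int]]
    # Hypothetical fault injection: (row, word, bit) -> stuck value 0 or 1.
    stuck_at: Dict[Tuple[int, int, int], int] = field(default_factory=dict)

    def _faulty(self, r: int, c: int, value: int) -> int:
        for (fr, fc, bit), v in self.stuck_at.items():
            if (fr, fc) == (r, c):
                value = (value | (1 << bit)) if v else (value & ~(1 << bit))
        return value & WORD_MASK

    def read_row(self, r: int) -> List[int]:
        return [self._faulty(r, c, w) for c, w in enumerate(self.rows[r])]

    def write_row(self, r: int, data: List[int]) -> None:
        self.rows[r] = [self._faulty(r, c, w) for c, w in enumerate(data)]


def refresh_cycle(dram: ToyDRAM, row: int, test_mode: bool) -> List[Tuple[int, int]]:
    """One refresh cycle on `row`; returns (row, word) pairs that failed the check."""
    data = dram.read_row(row)               # refresh always starts with a row read
    if not test_mode:
        dram.write_row(row, data)           # plain refresh: write the same data back
        return []
    # Hypothetical test step: write the complement, read it back and compare,
    # then restore the original data so the test stays transparent.
    comp = [~w & WORD_MASK for w in data]
    dram.write_row(row, comp)
    readback = dram.read_row(row)
    errors = [(row, c) for c, (got, exp) in enumerate(zip(readback, comp)) if got != exp]
    dram.write_row(row, data)
    return errors


def refresh_sweep(dram: ToyDRAM, test_mode: bool) -> List[Tuple[int, int]]:
    """Visit every row once, in the order a refresh row counter would."""
    errors: List[Tuple[int, int]] = []
    for row in range(len(dram.rows)):
        errors += refresh_cycle(dram, row, test_mode)
    return errors


if __name__ == "__main__":
    dram = ToyDRAM(rows=[[0x1234, 0xABCD], [0x0F0F, 0xFFFF]])
    before = [list(r) for r in dram.rows]
    print(refresh_sweep(dram, test_mode=True), dram.rows == before)  # [] True
    dram.stuck_at[(1, 1, 3)] = 1  # bit 3 of word 1 in row 1 stuck at 1
    print(refresh_sweep(dram, test_mode=True))                       # [(1, 1)]
```

A fault-free sweep leaves the contents unchanged, while the injected stuck-at-1 bit is exposed because the complement write cannot clear it.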

The refresh circuit has earlier been used for error detection, as reported in [13], [14]. However, both [13] and [14] utilize the refresh circuit to detect soft errors, which requires computing a signature of the memory. Such soft error detection techniques do not perform an active test and hence are not suitable for latent hard failures. The authors of [13], [14] suggest that their architecture can be used for production test with any test algorithm. However, the signature-based scheme has three problems. First, it does not support active error detection. Second, storing the signatures incurs an area overhead. Third, like any signature-based scheme, it suffers from aliasing. In [15], the authors propose a transparent on-line memory test (TOMT) for word-oriented memories that detects soft errors as well as functional faults. The TOMT technique uses transparent March tests with check bits instead of computing signatures. However, the work in [15] has two major drawbacks. First, the TOMT algorithm is not time efficient, as it executes bit-wise manipulation to obtain the word-oriented transparent test. Second, the hardware implementation of the TOMT algorithm incurs an area overhead proportional to the size of the memory, so the area overhead of the test circuit grows as the memory size increases.
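The aliasing problem of signature-based schemes can be seen with any compactor that maps many memory words onto a fixed-width signature. The sketch below uses a toy rotate-and-XOR compactor, a hypothetical stand-in for MISR-style signatures rather than the actual design of [13], [14]; because it is linear, two bit errors in different words can cancel and leave the signature unchanged.

```python
from typing import Iterable

WIDTH = 16
MASK = (1 << WIDTH) - 1


def rotl(x: int, k: int = 1) -> int:
    """Rotate a WIDTH-bit word left by k bits."""
    k %= WIDTH
    return ((x << k) | (x >> (WIDTH - k))) & MASK


def signature(words: Iterable[int]) -> int:
    """Toy linear compactor (rotate, then XOR in the next word)."""
    sig = 0
    for w in words:
        sig = rotl(sig) ^ (w & MASK)
    return sig


if __name__ == "__main__":
    golden = [0x1234, 0xABCD, 0x0F0F]
    faulty = list(golden)
    faulty[1] ^= 0x0001  # bit 0 of word 1 flips ...
    faulty[2] ^= 0x0002  # ... and bit 1 of word 2 flips
    print(faulty != golden, signature(faulty) == signature(golden))  # True True
```

The flipped bit of word 1 is rotated into bit 1 of the signature, where it is cancelled by the flipped bit 1 of word 2, so the corrupted memory produces the fault-free signature.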

The run-time permanent faults considered in this work are assumed to be intermittent faults that have become permanent over time. Consequently, the fault models considered in this paper are those of intermittent faults. The factors that lead to intermittent faults are variations in temperature and voltage, and aging effects such as Time Dependent Dielectric Breakdown (TDDB), electromigration, Negative Bias Temperature Instability (NBTI) and Hot Carrier Injection (HCI) [16]. TDDB degrades the MOSFET gate oxide, leading to gate shorts, while electromigration reduces interconnect conductivity over time and eventually leads to open circuits [16]. Short circuits and open circuits are modelled as stuck-at faults and stuck-open faults, respectively. However, stuck-open faults behave as stuck-at faults in DRAMs, because DRAM cells are not implemented as bistable elements [15]. NBTI and HCI [16] cause read and write failures, which we model as read disturb faults and write disturb faults, respectively. Since DRAMs refresh data after every read operation, read disturb faults are less likely to occur in DRAMs, so we consider only write disturb faults in this work. The other DRAM faults considered in this work are coupling faults [12]. To summarize, the target fault models for this work are stuck-at faults (SAF), coupling faults (CF) and write disturb faults (WDF). A detailed description of these faults can be found in [12].
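The three target fault models can be made concrete with a small behavioural model of the kind a testbench might use for fault injection. This is a minimal sketch assuming a bit-oriented memory with a single injected fault; the tuple encoding and the class name FaultyMemory are hypothetical, and of the coupling faults only the idempotent coupling fault (CFid) is modelled.

```python
from typing import List, Optional, Tuple

# Hypothetical fault encodings (one fault per memory):
#   ('SAF', cell, v)                        cell always reads v
#   ('CFid', aggressor, rising, victim, v)  an aggressor 0->1 (rising=True) or
#                                           1->0 (rising=False) write forces victim to v
#   ('WDF', cell, v)                        a non-transition write of v onto v flips the cell
Fault = Tuple


class FaultyMemory:
    """Bit-oriented memory model with at most one injected fault."""

    def __init__(self, size: int, fault: Optional[Fault] = None):
        self.cells: List[int] = [0] * size
        self.fault = fault

    def write(self, addr: int, value: int) -> None:
        old = self.cells[addr]
        kind = self.fault[0] if self.fault else None
        if kind == 'WDF' and addr == self.fault[1] and old == value == self.fault[2]:
            value ^= 1                          # write disturb: the cell flips
        self.cells[addr] = value
        if kind == 'CFid':
            _, aggressor, rising, victim, v = self.fault
            if addr == aggressor and old != value and value == int(rising):
                self.cells[victim] = v          # coupling: victim is forced to v

    def read(self, addr: int) -> int:
        if self.fault and self.fault[0] == 'SAF' and addr == self.fault[1]:
            return self.fault[2]                # stuck-at: stored value is ignored
        return self.cells[addr]


if __name__ == "__main__":
    m = FaultyMemory(4, ('SAF', 0, 1))
    m.write(0, 0)
    print(m.read(0))   # 1
    m = FaultyMemory(4, ('CFid', 0, True, 2, 1))
    m.write(0, 1)      # rising transition on the aggressor
    print(m.read(2))   # 1
    m = FaultyMemory(4, ('WDF', 1, 0))
    m.write(1, 0)      # non-transition write of 0 onto a cell holding 0
    print(m.read(1))   # 1
```

Each fault only manifests under a specific operation sequence, which is why a test must apply both transitions and non-transition writes to every cell in a defined address order.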

The rest of the paper is organized as follows. Section 2 discusses the proposed transparent test technique for DRAMs, detailing the steps to convert a March C- test into a Transparent March C- test and then modifying it for application during refresh. Section 3 presents the implementation of the proposed test technique during refresh of DRAMs, and Section 4 illustrates its hardware implementation. The experimental results and analysis are presented in Section 5. Finally, Section 6 concludes the paper.

Section snippets

Proposed transparent test generation technique for DRAMs without ECC

March tests are the most widely accepted tests for detecting permanent faults, owing to their high fault coverage and a test time that grows linearly with the memory size [17]. A March element consists of a sequence of operations applied to each cell before proceeding to the next cell. An operation can be a read or a write of 0 or 1. Applying a March test involves writing patterns into the memory and reading them back; as a result, the original memory contents are destroyed.
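The March C- test and its transparent counterpart can be sketched as a small simulator. March C- is {⇕(w0); ⇑(r0,w1); ⇑(r1,w0); ⇓(r0,w1); ⇓(r1,w0); ⇕(r0)}. The transparent version shown here uses the textbook bit-oriented transformation: the initialising write element is dropped, and the data values 0 and 1 are reinterpreted as the original cell content a and its complement. This is not the word-oriented MTMC generation algorithm of this paper, and the simulator is simply handed the original contents as its reference for expected read values, which is exactly the information a real transparent BIST must obtain some other way (for example through signature prediction, which the proposed method avoids). The names Memory, run_march and transparent are illustrative.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# March C-: ⇕(w0); ⇑(r0,w1); ⇑(r1,w0); ⇓(r0,w1); ⇓(r1,w0); ⇕(r0)
# Each element is (address order, operations): 'u' ascending, 'd' descending,
# 'x' either order (ascending is used here).
MarchTest = List[Tuple[str, List[str]]]
MARCH_C_MINUS: MarchTest = [
    ('x', ['w0']),
    ('u', ['r0', 'w1']),
    ('u', ['r1', 'w0']),
    ('d', ['r0', 'w1']),
    ('d', ['r1', 'w0']),
    ('x', ['r0']),
]


class Memory:
    """Bit-oriented memory; `stuck` maps address -> stuck-at value (fault injection)."""

    def __init__(self, cells: Sequence[int], stuck: Optional[Dict[int, int]] = None):
        self.cells = list(cells)
        self.stuck = stuck or {}

    def read(self, addr: int) -> int:
        return self.stuck.get(addr, self.cells[addr])

    def write(self, addr: int, value: int) -> None:
        self.cells[addr] = value


def transparent(test: MarchTest) -> MarchTest:
    """Drop write-only (initialising) elements: the existing contents take their place."""
    return [el for el in test if not all(op[0] == 'w' for op in el[1])]


def run_march(mem: Memory, test: MarchTest,
              initial: Optional[Sequence[int]] = None) -> List[int]:
    """Apply a March test and return the addresses whose reads mismatched.

    With `initial` given, the test runs transparently: data value d stands for
    initial[addr] XOR d, i.e. '0' means the original bit a and '1' means ~a.
    """
    n = len(mem.cells)
    failures = set()
    for order, ops in test:
        addrs = range(n - 1, -1, -1) if order == 'd' else range(n)
        for addr in addrs:
            base = initial[addr] if initial is not None else 0
            for op in ops:
                value = base ^ int(op[1])
                if op[0] == 'w':
                    mem.write(addr, value)
                elif mem.read(addr) != value:
                    failures.add(addr)
    return sorted(failures)


if __name__ == "__main__":
    data = [1, 0, 1, 1, 0]
    m = Memory(data)
    print(run_march(m, MARCH_C_MINUS), m.cells)                             # [] [0, 0, 0, 0, 0]
    m = Memory(data)
    print(run_march(m, transparent(MARCH_C_MINUS), initial=data), m.cells)  # [] [1, 0, 1, 1, 0]
    m = Memory(data, stuck={2: 0})
    print(run_march(m, transparent(MARCH_C_MINUS), initial=data))           # [2]
```

Running March C- leaves every cell at 0, while the transparent version restores the original contents and still flags the stuck-at-0 cell at address 2.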

Review of DRAM refresh

In Dynamic RAMs, the values stored in each bit cell are not stable. Over time, leakage currents cause the charge stored on the capacitor to drain away and be lost. To prevent the contents of a DRAM from being lost, the DRAM must be refreshed. Refresh is the process of recharging, or re-energizing the cells in a DRAM. Cells are refreshed one row at a time (usually one row per refresh cycle). Refresh cycle refers to the time required to refresh one row. For a DRAM array to operate correctly, all

Hardware overhead

The proposed BIST architecture was described in Verilog and synthesized using a commercial 90 nm Faraday standard cell library, and the area of the synthesized BIST architecture was then estimated. The DRAM considered was of size 4 M × 16 with a refresh time of 16 ms. It requires 4096 refresh cycles to refresh all rows within the refresh time, at a refresh cycle time of 130 ns [20].
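The refresh figures quoted above determine how often a refresh cycle, and hence an opportunity to piggyback a test operation, occurs. The sketch below assumes distributed refresh, with the 4096 refresh cycles spread evenly over the 16 ms refresh time; the function names are illustrative, and the final line is a hypothetical estimate assuming one full sweep of all rows per March element, not a result reported in this paper.

```python
def refresh_interval_s(period_s: float, rows: int) -> float:
    """Spacing between refresh cycles when they are spread evenly over the period."""
    return period_s / rows


def refresh_busy_fraction(period_s: float, rows: int, t_rc_s: float) -> float:
    """Fraction of the refresh period during which the array is busy refreshing."""
    return rows * t_rc_s / period_s


if __name__ == "__main__":
    PERIOD_S = 16e-3   # refresh time from the text
    ROWS = 4096        # refresh cycles per refresh time
    T_RC_S = 130e-9    # refresh cycle time
    print(f"{refresh_interval_s(PERIOD_S, ROWS) * 1e6:.3f} us between refresh cycles")  # 3.906
    print(f"{refresh_busy_fraction(PERIOD_S, ROWS, T_RC_S):.3%} of the time refreshing")  # 3.328%
    # Hypothetical: if each March element needed one full sweep of all rows,
    # a 5-element transparent March test would finish in 5 refresh periods.
    print(f"{5 * PERIOD_S * 1e3:.0f} ms for 5 sweeps")  # 80 ms
```

With these numbers a refresh cycle is issued roughly every 3.9 µs and refresh occupies about 3.3% of the array's time, so any test schedule tied to refresh completes in a bounded, predictable number of refresh periods.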

As shown in Fig. 6 and as mentioned in the previous section, during normal operation of

Conclusion

The MTMC Test proposed in this paper has been shown to be a cost effective technique that can detect errors developed in DRAMs during field operation. We have re-used the refresh cycles of DRAM to act as test cycles while performing the MTMC test on a DRAM. By re-using the refresh circuit for testing, we have shown that our proposed BIST architecture for DRAM exhibits some advantages over other transparent techniques and also conventional memory BIST architectures reported in literature without

References (26)

  • V. Sridharan, D. Liberty, A study of DRAM failures in the field, in: Proceedings of the International Conference for...
  • A. Bondavalli et al., Threshold-based mechanisms to discriminate transient from intermittent faults, IEEE Trans. Comput. (1998)
  • C. Constantinescu, Impact of Intermittent Faults on Nanocomputing Devices, in DSN 2007 Workshop on Dependable and...
  • A.A. Hwang, I. Stefanovici, B. Schroeder, Cosmic rays don't strike twice: understanding the nature of DRAM errors and...
  • V. Sridharan, J. Stearley, N. DeBardeleben, S. Blanchard, S. Gurumurthi, Feng Shui of supercomputer memory: positional...
  • A.B.S.G.T. Siddiqua, A. Papathanasiou, Analysis and Modeling of Memory Errors from Large-Scale Field Data Collection in...
  • M. Spica, T.M. Mak, Do We Need Anything More Than Single Bit Error Correction (ECC)? In Records of the International...
  • T.J. Dell, A White Paper on the Benefits of Chipkill-Correct ECC for PC Server Main Memory. IBM Microelectronics...
  • S. Mukherjee, J. Emer, T. Fossum, S.K. Reinhardt, Cache scrubbing in microprocessors: myth or necessity? In:...
  • V. Sridharan, N. DeBardeleben, S. Blanchard, K.B. Ferreira, J. Stearley, J. Shalf, S. Gurumurthi, Memory errors in...
  • G. Hayek, R. Venkataraman, J. Ajanovic, Time-distributed ecc scrubbing to correct memory errors, Nov. 2 1999, US Patent...
  • M. Bushnell et al., Essentials of Electronic Testing for Digital, Memory, and Mixed-Signal VLSI Circuits, ser. Frontiers in Electronic Testing (2000)
  • S. Hellebrand, H.-J. Wunderlich, A. Ivaniuk, Y. Klimets, V. Yarmolik, Error detecting refreshment for embedded DRAMs,...