# On the Development of Prognostics and System Health Management (PHM) Techniques for ReRAM Applications

Jose Cayo Dept. of Electronic Engineering Universidad Tecnica Federico Santa Maria Valparaiso, Chile jose.cayo.14@sansano.usm.cl Matias Melivilu Dept. of Electronic Engineering Universidad Tecnica Federico Santa Maria Valparaiso, Chile Antonio Rubio Dept. of Electronic Engineering Universitat Politècnica de Catalunya Barcelona, Spain Ioannis Vourkas Dept. of Electronic Engineering Universidad Tecnica Federico Santa Maria Valparaiso, Chile ioannis.vourkas@usm.cl

Abstract—The resistive switching (RS) technology has many promising applications, but the inherent variability of RS devices has been an important obstacle to progress towards mass production. Nonidealities of device switching performance have been widely modeled so far, and device degradation has been addressed through testing for fault diagnosis. However, online soft-error "prognosis", concerning both progressive degradation and transition faults, has been given little consideration. In this direction, we present preliminary results towards the development of prognostics and system health management (PHM) techniques for resistive memory (ReRAM) applications. We propose addressing soft errors through a context-rich scheme used to encode binary information in the form of resistance. In our simulations we assumed a ReRAM driver with multi-level READ capability and developed an enhanced progressive feedback-WRITE scheme to ensure not only successful WRITE and reliable READ operations, but also to permit the early online prognosis of potential device failure. Preliminary system-level simulation results validate the expected functionality and represent a reasonable approach towards the design of robust ReRAM controllers.

Keywords—ARCOne; memristor; resistive switching; resistive RAM; ReRAM; memory controller; Knowm; SDC; variability

# I. INTRODUCTION

The resistive switching (RS) device technology [1] is promising for a variety of emerging applications, including nonvolatile storage and unconventional computing [2]. In memory modules composed of RS devices, usually called resistive random-access memories (ReRAM) [3], each cell of the memory array has an RS device for binary information storage using low resistive state (LRS) as logic "1" and high resistive state (HRS) as logic "0" (or vice versa.) The transition from HRS to LRS is called SET, whereas the opposite is called RESET, and they are caused by the application of voltage or current WRITE pulses that exceed a certain threshold [4]. Great progress has been noted in the research and development of RS devices but the inherent variability [5] of such technology has hindered mass production of ReRAM modules [6].

As an important feature to be considered in realistic circuit design efforts, the nonidealities of RS device performance have been included in device models [7], and device degradation has been addressed through fault detection and diagnosis [8], [9]. A typical fix for non-recoverable device faults in a ReRAM module would be to use redundant rows and columns in the crossbar array (the core of a ReRAM module [10]), such that columns with defective cross-point devices can be skipped and replaced with the redundant ones. Nevertheless, such a solution causes underutilization of memory capacity and has a significant area overhead. On the other hand, methods for the early online "prognosis" of soft errors, concerning both the progressive performance degradation and the incomplete SET/RESET transitions ("transition faults"), could be an interesting approach to consider, but have been given little consideration so far.

In practical ReRAM applications the memory driver must WRITE information correctly and reliably distinguish the HRS from the LRS. High WRITE pulse amplitudes have a higher probability of causing a successful switching event but could cause hard device errors (such as "stuck-at-LRS/HRS faults" [11]) and thus should be avoided. On the other hand, low WRITE pulse amplitudes could lead to incomplete SET/RESET transitions. Moreover, due to the "fading memory" property [12], a progressive degradation in the switching performance can be observed. Given the above, it is reasonable to assume that a reliable WRITE operation cannot be achieved with a single applied pulse. On the contrary, the WRITE phase should include a subsequent READ pulse to ensure that the data was written correctly, as well as a preceding READ phase, so that a RESET pulse is not applied to a device already in HRS, nor a SET pulse to a device already in LRS. Thus, read-monitored progressive WRITE schemes have reasonably emerged for practical ReRAM applications [13].

In this work we elaborate on a read-monitored progressive WRITE scheme, introducing prognostics and system health management (PHM) techniques applicable to ReRAM. More specifically, we propose addressing soft errors through a context-rich scheme used to encode binary information in the form of resistance. In our simulations we assumed a ReRAM driver with multi-level READ capability and propose a feedback-WRITE scheme to ensure not only successful WRITE and reliable READ operations, but also to permit the early online prognosis of potential device failure. Preliminary system-level simulation results validate the expected functionality and represent a reasonable approach towards the design of robust and more energy-efficient ReRAM controllers.

# II. VARIABILITY IN RS DEVICES AND WRITE FAILURE

In order to showcase practical examples of variability as well as transient dynamics during periodic SET-RESET events that can lead to soft errors, we present experimental results from the characterization of a $2 \times 2$ passive (selector-less) crossbar array of Self-Directed Channel (SDC) RS devices [14] commercialized by *Knowm Inc.* [15]. All devices were submitted to an identical forming stage [16] and then were RESET to HRS levels via manual pulsing. Afterwards, a sequence of pulses was applied which consisted of 10 consecutive SET pulses of 0.4 V amplitude and 1 ms width, followed by 10 RESET pulses of 0.8 V amplitude and 1 ms width. This sequence was repeated 300 times, thus a total of 6000 pulses were applied to every device of the array. Intermediate read pulses had 0.15 V amplitude (below the SET threshold voltage, so as not to disturb the stored device state) and were 1 ms wide. All measurements were performed using the ARCOne instrumentation tool, a microcontroller-based automated tool for ReRAM crossbar characterization, commercialized by *ARC Instruments* [17]. A series of pulses was applied instead of just one in order to increase the probability of a successful SET and RESET in every switching cycle. Fig. 1 shows snapshots of the experiment, to highlight the performance of each cross-point device. The exported data from *ARCOne* were interpreted as follows: during every series of positive (or negative) pulses, a SET (or RESET) was evaluated as successful when a ratio of HRS/LRS $\geq 10$ was achieved. Otherwise, it was registered as a WRITE failure. By observing the results taken from the devices of the same crossbar array, we notice a considerable variability regarding the range of resistances, especially for the HRS. Most importantly, we note several failed WRITE operations in some devices; i.e., cases where a particular pulse was unable to trigger a SET (or RESET). For clarity, we show the histograms of the collected data for the entire series of the experiment in Fig. 2. We observe that the cross-point device in position W8B8 not only had several incomplete switching events, but also demonstrated serious degradation effects, as it temporarily remained stuck-at-LRS. Such results underline the importance of intelligent monitoring and early prognosis of potential failures of devices, as well as of potentially erroneous READ operations during normal memory operation.

Fig. 1 Plots of the resistance evolution of four SDC devices in a $2 \times 2$ crossbar topology, at different moments during the application of positive and negative pulse trains for SET and RESET, respectively. Notation W#B# indicates the electrodes of the *Knowm* chip used to access the array. In all plots, black dots correspond to WRITE failure, and the average value corresponds to the average of HRS and LRS achieved in every SET-RESET pulse sequence of a cycle.

Fig. 2 Plots with the histograms representing the distribution of the resistance of four SDC devices in a $2 \times 2$ crossbar topology, during the application of positive and negative pulse trains for SET and RESET, respectively. The same notation W#B# is used as in Fig. 1. For every device we show the distribution of the HRS, the LRS, and the failed WRITE operations, which mostly concern failed RESET attempts. There is no overlap between the HRS and LRS distributions.
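The success criterion used to interpret the exported data (HRS/LRS ratio of at least 10) can be sketched as follows. This is only an illustrative sketch: the function names, the pairing of each HRS reading with one LRS reading per cycle, and the example values are assumptions, not the actual post-processing scripts or measured data.

```python
from typing import List, Tuple

MIN_RATIO = 10.0  # success criterion stated in the text: HRS/LRS >= 10


def is_successful_cycle(hrs_ohm: float, lrs_ohm: float,
                        min_ratio: float = MIN_RATIO) -> bool:
    """Return True when the HRS/LRS ratio meets the success criterion."""
    if lrs_ohm <= 0:
        raise ValueError("LRS resistance must be positive")
    return hrs_ohm / lrs_ohm >= min_ratio


def count_write_failures(cycles: List[Tuple[float, float]]) -> int:
    """Count (hrs_ohm, lrs_ohm) pairs that miss the ratio criterion."""
    return sum(not is_successful_cycle(hrs, lrs) for hrs, lrs in cycles)


# Hypothetical readings in ohms (illustrative only, not measured data).
example_cycles = [(250e3, 12e3), (90e3, 15e3), (400e3, 40e3)]
print(count_write_failures(example_cycles))
```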

# III. PROGNOSTICS & SYSTEM HEALTH MANAGEMENT (PHM) FOR RERAM

We assume that read-monitored progressive WRITE schemes represent a viable approach towards practical ReRAM modules. In this context, given that high-voltage pulses are generally avoided to minimize the possibility of hard device errors, we focus on soft-error modeling, concerning both transition faults (incomplete SET/RESET) as well as the "fading memory" effect, which are more probable to occur and should be properly addressed by the memory controller.

To this end, we adopted a variability injection scheme for the RS device model parameters [18], [19], such as the switching thresholds ($V_{\text{SET}}$ and $V_{\text{RESET}}$) or the switching rate, so that transition errors can be simulated. Using the behavioral RS device model of [20], under the same applied input voltage, owing to variability, the final resistance of the device can differ between WRITE operations in simulation. Moreover, we incorporated the "fading memory" phenomenon at circuit level using a resistive voltage divider. Fading memory is caused by the asymmetry of the SET and RESET dynamics, which leads to a progressive net increase (or decrease) in resistance after every SET-RESET cycle. Such behavior can be simulated using a series parasitic resistor, which we assume to be an integral part of the device model, as shown in Fig. 3.
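As a rough illustration of how threshold variability alone can produce transition faults, the sketch below redraws the switching threshold in every cycle and counts the pulses that fail to exceed it. It is a deliberately simplified stand-in, not the injection scheme of [18], [19]: the Gaussian spread, the parameter `rel_sigma`, and all function names are assumptions, and switching-rate variability is ignored.

```python
import random


def sample_threshold(nominal_v: float, rel_sigma: float,
                     rng: random.Random) -> float:
    """Draw one threshold magnitude from a Gaussian around the nominal value.

    rel_sigma is a hypothetical relative standard deviation.
    """
    return abs(rng.gauss(nominal_v, rel_sigma * abs(nominal_v)))


def count_missed_transitions(pulse_v: float, nominal_v: float,
                             rel_sigma: float, n_cycles: int,
                             seed: int = 0) -> int:
    """Count cycles in which the pulse amplitude does not exceed the
    threshold sampled for that cycle (treated here as a transition fault)."""
    rng = random.Random(seed)
    missed = 0
    for _ in range(n_cycles):
        if abs(pulse_v) <= sample_threshold(nominal_v, rel_sigma, rng):
            missed += 1
    return missed


# Hypothetical numbers: 0.35 V pulses against a nominal 0.3 V threshold.
print(count_missed_transitions(0.35, 0.3, rel_sigma=0.1, n_cycles=1000))
```

With `rel_sigma=0` the sketch reduces to a deterministic comparison against the nominal threshold, which is a convenient sanity check.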

In simulations we confirmed the occurrence of incomplete transitions and the progressive degradation of switching performance due to the fading memory property. We used the PyLTSpice library to access netlist files and run simulations in LTSpice controlled by Python scripts. RS device model parameters were set as follows: $R_{\text{on}} = 100\,\Omega$, $R_{\text{off}} = 100\,\text{k}\Omega$, $V_{\text{SET}} = 0.3\,\text{V}$, $V_{\text{RESET}} = -0.3\,\text{V}$, whereas the parameters for the switching rate were set accordingly to achieve SET/RESET transitions of 20 ns. The parasitic resistance $R_P$ was set to $2\,\text{k}\Omega$. READ operations assumed a voltage divider as well (see Fig. 3) with a series resistor $R_{\text{read}}$ of $50\,\text{k}\Omega$, so the voltage measured at the intermediate node was compared with a series of threshold values (see Fig. 4) to interpret the stored information as HRS or LRS resistance.
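The divider-based READ can be illustrated with a short calculation. The topology assumed here (READ source, then the series READ resistor, then the intermediate node, then the parasitic resistor in series with the device to ground) is a guess at the arrangement of Fig. 3, and the READ amplitude of 0.1 V is a hypothetical value; only the two resistor values come from the text.

```python
import math

R_READ = 50e3  # series READ resistor in ohms (value from the text)
R_P = 2e3      # parasitic series resistance in ohms (value from the text)


def node_voltage(r_dev: float, v_read: float,
                 r_read: float = R_READ, r_p: float = R_P) -> float:
    """Voltage at the intermediate node of the assumed READ divider."""
    r_branch = r_dev + r_p  # device in series with its parasitic resistor
    return v_read * r_branch / (r_read + r_branch)


def resistance_from_node(v_node: float, v_read: float,
                         r_read: float = R_READ, r_p: float = R_P) -> float:
    """Invert the divider to recover the device resistance."""
    if not 0.0 < v_node < v_read:
        raise ValueError("node voltage must lie strictly between 0 and v_read")
    r_branch = r_read * v_node / (v_read - v_node)
    return r_branch - r_p


V_READ = 0.1  # hypothetical READ amplitude in volts
for r in (100.0, 100e3):  # Ron and Roff from the text
    v = node_voltage(r, V_READ)
    print(f"R = {r:g} ohm -> node voltage {v * 1e3:.2f} mV")
    assert math.isclose(resistance_from_node(v, V_READ), r, rel_tol=1e-6)
```

Under these assumptions the two extreme states map to clearly separated node voltages, which is what allows a set of voltage thresholds to stand in for resistance thresholds.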

In the context of online soft-error prognosis of progressive degradation and transition faults, here we present a first approach towards the development of prognostics and system health management (PHM) techniques developed ad hoc for ReRAM applications. In this direction, we propose using the information encoding scheme shown in Fig. 4. While typical methods to distinguish HRS from LRS have considered using either a single threshold or an undefined state in between the HRS and LRS resistance regions, in this approach we assume both strong and weak HRS and LRS states in a richer encoding scheme. Given the resistance distribution of Fig. 4, it is assumed that in every READ process the system checks whether the resistance level of the device falls outside the originally expected distribution. For this reason, a multi-state READ capability is required. Fig. 4 shows an example of how the full range of device resistance can be divided into the required zones and intermediate guard-bands. Consequently, although the system works with only 1 bit of information during its operation, and therefore with only 2 value ranges of resistance, it is capable of multi-level READ operations, which allow it to detect states outside the desired regions (e.g., a weak HRS instead of a strong HRS) and act accordingly.
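A minimal sketch of such a multi-level READ decision is given below. The five zone names, the `ReadThresholds` container, and the numeric threshold values in the example are hypothetical; the actual zone boundaries and guard-bands are those of Fig. 4.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional

# Zones ordered from lowest to highest resistance (names are hypothetical).
ZONES = ("strong_LRS", "weak_LRS", "undefined", "weak_HRS", "strong_HRS")


@dataclass(frozen=True)
class ReadThresholds:
    """Four resistance bounds (ohms) separating the five zones."""
    lrs_strong_max: float
    lrs_max: float
    hrs_min: float
    hrs_strong_min: float

    def bounds(self):
        return [self.lrs_strong_max, self.lrs_max,
                self.hrs_min, self.hrs_strong_min]

    def __post_init__(self):
        if self.bounds() != sorted(self.bounds()):
            raise ValueError("thresholds must be in ascending order")


def classify(r_ohm: float, t: ReadThresholds) -> str:
    """Map a READ resistance to a zone; a value equal to a bound is
    assigned to the zone above that bound."""
    return ZONES[bisect_right(t.bounds(), r_ohm)]


def decode_bit(zone: str) -> Optional[int]:
    """LRS zones decode to 1, HRS zones to 0, the undefined zone to None."""
    if zone.endswith("LRS"):
        return 1
    if zone.endswith("HRS"):
        return 0
    return None


# Hypothetical thresholds in ohms, for illustration only.
thresholds = ReadThresholds(1e3, 5e3, 20e3, 60e3)
zone = classify(30e3, thresholds)
print(zone, decode_bit(zone))
```

A weak reading still decodes to a valid bit, but the zone label gives the controller the extra context it needs to schedule a refresh before the state drifts into the undefined region.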

Considering the abovementioned information encoding scheme, we elaborated on a feedback-WRITE & READ scheme to permit the early online prognosis of potential device failure. Next, we present the preliminary simulation results.

# IV. SIMULATION RESULTS

The purpose of our enhanced feedback WRITE & READ scheme is two-fold. Regarding WRITE operations, a subsequent READ pulse verifies whether the WRITE was successful and, in case of an incomplete transition, the WRITE pulse is applied again to take the device to the expected strong resistance range. Likewise, READ operations detect any possible state drift and trigger WRITE operations accordingly to restore the device state. In addition, the dynamic modification/adaptation of the thresholds that designate the undefined region is explored here to further improve the throughput of ReRAM READ operations.
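The WRITE half of this scheme can be sketched as a read-verify loop. The sketch is an assumption-laden illustration rather than the controller used in the simulations: `feedback_write`, the pulse budget `max_pulses`, and the `ToyDevice` class (whose resistance simply grows tenfold per RESET pulse) are all hypothetical.

```python
from typing import Callable


def feedback_write(read: Callable[[], float],
                   pulse: Callable[[], None],
                   is_strong: Callable[[float], bool],
                   max_pulses: int = 5) -> int:
    """Apply WRITE pulses until a READ reports the strong target state.

    Returns the number of pulses used (0 if the preceding READ shows the
    device is already in the target state). Raises RuntimeError when the
    hypothetical pulse budget is exhausted, which a controller could log
    as an early warning of device failure.
    """
    if is_strong(read()):  # preceding READ: avoid an unnecessary WRITE
        return 0
    for n in range(1, max_pulses + 1):
        pulse()
        if is_strong(read()):  # verifying READ after each WRITE pulse
            return n
    raise RuntimeError("target state not reached within pulse budget")


class ToyDevice:
    """Hypothetical device: each RESET pulse raises resistance tenfold."""

    def __init__(self, r_ohm: float = 100.0, r_off: float = 100e3):
        self.r_ohm = r_ohm
        self.r_off = r_off

    def read(self) -> float:
        return self.r_ohm

    def reset_pulse(self) -> None:
        self.r_ohm = min(self.r_ohm * 10.0, self.r_off)


dev = ToyDevice()
strong_hrs = lambda r: r >= 60e3  # hypothetical strong-HRS bound in ohms
print(feedback_write(dev.read, dev.reset_pulse, strong_hrs))
```

Counting the pulses each WRITE needs is one simple health indicator: a device whose count creeps upward over time is a candidate for early prognosis.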

More specifically, we designed an algorithm that randomly performs SET and RESET operations on devices of the array to automatically detect the minimum HRS and the maximum LRS measured, which are used to designate the default undefined region. The HRS and LRS regions are each divided into weak and strong sub-regions in a manner proportional to the concentration of the collected READ data. In simulation, we conducted this step while activating solely the variability of the model parameters. Next, the fading memory feature was also activated to test the efficacy of the algorithm.

Fig. 5 LTSpice simulation results showing the evolution of the resistance of an RS device under the input voltage pulses shown at the bottom. Data correspond to calculated resistance values during READ pulses. Annotations highlight the dynamic modification of the undefined region and other details of interest.
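One possible reading of this calibration step is sketched below. The text does not specify how the weak/strong split follows the data concentration, so the quantile rule used here (the `weak_fraction` of readings closest to the undefined region is labeled weak) is purely an assumption, as are the function name and the sample values.

```python
from typing import Dict, Sequence


def calibrate_thresholds(lrs_samples: Sequence[float],
                         hrs_samples: Sequence[float],
                         weak_fraction: float = 0.1) -> Dict[str, float]:
    """Derive default zone bounds (ohms) from collected READ data.

    The undefined region spans from the maximum LRS to the minimum HRS
    measured. The weak/strong split is placed so that roughly
    `weak_fraction` of each population falls in its weak zone
    (hypothetical rule).
    """
    lrs, hrs = sorted(lrs_samples), sorted(hrs_samples)
    if not lrs or not hrs:
        raise ValueError("both sample sets must be non-empty")
    lrs_max, hrs_min = lrs[-1], hrs[0]
    if lrs_max >= hrs_min:
        raise ValueError("LRS and HRS populations overlap")
    k_lrs = int(weak_fraction * len(lrs))
    k_hrs = int(weak_fraction * len(hrs))
    return {
        "lrs_strong_max": lrs[len(lrs) - 1 - k_lrs],
        "lrs_max": lrs_max,
        "hrs_min": hrs_min,
        "hrs_strong_min": hrs[k_hrs],
    }


# Hypothetical READ data in ohms, for illustration only.
lrs_data = [100.0 * i for i in range(1, 11)]  # 100 ... 1000
hrs_data = [10e3 * i for i in range(1, 11)]   # 10k ... 100k
print(calibrate_thresholds(lrs_data, hrs_data))
```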

A characteristic example of simulation results is shown in Fig. 5. A series of SET-RESET cycles was performed on a target RS device using the model of [20]. The plots show the evolution of the resistance values, calculated during the READ pulses. Horizontal lines define the strong/weak and the undefined regions in the resistance plot. We can observe that the designed algorithm correctly activates additional WRITE pulses whenever an incomplete transition occurs. However, we also established an additional control that checks whether the READ value is very close to the limits of the undefined region. In such a case, the controller dynamically modifies the default values of the thresholds corresponding to the undefined region, such that subsequent similar cases will be READ correctly at the first attempt and will not be considered incomplete transitions. For instance, we observe that the upper limit of the undefined region was decreased twice during this simulation. Consequently, in the region after 500 ns some RESET pulses were successful and did not trigger additional WRITE events, as would have happened without the previous threshold modifications. Such results demonstrate that adaptive READ processes could benefit memory operations in ReRAM arrays, leading to shorter READ and WRITE cycles, which indirectly benefit not only the energy consumption but also the RS device endurance.
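The adaptive control on the upper limit of the undefined region can be sketched as follows. The closeness criterion (`margin`, a relative distance below the current limit) and the safety condition that keeps the limit well above the LRS side (`min_gap_ratio`) are hypothetical choices introduced for illustration; the text does not state the rule used in the simulations.

```python
def adapt_upper_limit(r_read: float, hrs_min: float, lrs_max: float,
                      margin: float = 0.1,
                      min_gap_ratio: float = 2.0) -> float:
    """Return the (possibly lowered) upper limit of the undefined region.

    If a READ after RESET lands inside the undefined region but within
    `margin` (relative) of its upper limit `hrs_min`, the limit is lowered
    to that reading, provided it stays at least `min_gap_ratio` times
    above `lrs_max`. Both parameters are hypothetical.
    """
    close_to_limit = hrs_min * (1.0 - margin) <= r_read < hrs_min
    keeps_window = r_read >= lrs_max * min_gap_ratio
    if close_to_limit and keeps_window:
        return r_read
    return hrs_min


# Hypothetical readings in ohms: two near-misses lower the limit twice.
limit = 20e3
for reading in (19e3, 18e3, 12e3):
    limit = adapt_upper_limit(reading, limit, lrs_max=5e3)
    print(reading, "->", limit)
```

Bounding the adaptation with a minimum window is a design choice of this sketch: without it, repeated near-misses could shrink the undefined region until the HRS and LRS ranges are no longer safely separated.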

# V. CONCLUSIONS

This work introduced prognostics and system health management techniques applicable to ReRAM technology. An enhanced feedback WRITE & READ algorithm, based on a context-rich binary information encoding scheme, was validated in simulation. The adaptive READ phase is suggested for more energy-efficient and robust ReRAM controller units in future applications.

# REFERENCES

- [1] I. Valov, "Interfacial interactions and their impact on redox-based resistive switching memories (ReRAMs)," *Semicond. Sci. Technol.*, vol. 32, no. 093006, 2017
- [2] D. Ielmini, H.-S. Philip Wong, "In-memory computing with resistive switching devices," *Nat. Electron.*, vol. 1, pp. 333–343, 13 June 2018

- [3] T.-C. Chang, *et al.*, "Resistance random access memory," *Mater. Today*, vol. 19, no. 5, pp. 254–264, Jun. 2016
- [4] H. Lv, et al., "Voltage driving or current driving: Which is preferred for RRAM programming?," 2011 Int. Symposium on VLSI Technology, Systems and Applications, Hsinchu, Taiwan, Apr. 25-27
- [5] J. Roldán, et al., "Variability in Resistive Memories," Adv. Intell. Syst., no. 2200338, 14 March 2023
- [6] WeebitNano, "A Quantum Leap in Emerging Memory Technology," [Online]. Available: https://www.weebit-nano.com/technology
- [7] C. Bengel, et al., "Variability-Aware Modeling of Filamentary Oxide-Based Bipolar Resistive Switching Cells Using SPICE Level Compact Models," *IEEE Transactions on Circuits and Systems—I: Regular Papers*, vol. 67, no. 12, pp. 4618-4630, 2020
- [8] T. N. Kumar, et al., "Operational Fault Detection and Monitoring of a Memristor-Based LUT," 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE), Grenoble, France, Mar. 09-13
- [9] P. Liu, et al., "Fault Modeling and Efficient Testing of Memristor-Based Memory," *IEEE Trans. on Circ. & Syst.—I: Reg. Papers*, vol. 68, no. 11, pp. 4444-4455, 2021
- [10] A. Flocke and T. G. Noll, "Fundamental analysis of resistive nanocrossbars for the use in hybrid Nano/CMOS-memory," 33rd Eur. Solid-State Circuits Conf., Munich, Germany, Sep. 2007, pp. 328–331
- [11] M. Escudero-Lopez, I. Vourkas, and A. Rubio, "Stuck-at-OFF Fault Analysis in Memristor-Based Architecture for Synchronization," 2019 *IEEE Int. Symp. on On-Line Testing and Robust System Design* (*IOLTS*), Rhodes Island, Greece, July 1-3
- [12] A. Ascoli, et al., "History Erase Effect in a Non-Volatile Memristor," IEEE Transactions on Circuits and Systems—I: Regular Papers, vol. 63, no. 3, March 2016
- [13] L. Gao, P.-Y. Chen, S. Yu, "Programming Protocol Optimization for Analog Weight Tuning in Resistive Memories," *IEEE Electron Device Letters*, vol. 36, no. 11, pp. 1157-1159, 2015
- [14] K. A. Campbell, "Self-directed channel memristor for high temperature operation," *Microelectron. J.*, vol. 59, pp. 10-14, Jan. 2017
- [15] Knowm Inc., [Online]. Available: https://knowm.org
- [16] T. Wang, et al., "Electroforming in Metal-Oxide Memristive Synapses," ACS Appl. Mater. Interfaces, vol. 12, pp. 11806-11814, Feb. 2020
- [17] ArC Instruments. High Performance Array Instruments. Accessed: July 16, 2020. [Online]. Available: http://www.arc-instruments.co.uk
- [18] J. Cayo, I. Vourkas, and A. Rubio, "A Circuit-Level SPICE Modeling Strategy for the Simulation of Behavioral Variability in ReRAM," 2022 IFIP/IEEE Int. Conference on Very Large Scale Integration (VLSI-SoC), Patras, Greece, Oct. 03-05
- [19] J. Cayo, et al., "A Comprehensive Simulation Framework to Validate Progressive Read-Monitored Write Schemes for ReRAM," 2023 Spanish Conf. on Electron Devices (CDE), Valencia, Spain, June 6-8
- [20] Y. Pershin and M. Di Ventra, "SPICE model of memristive devices with threshold," *Radioengineering*, vol. 22, no. 2, pp. 485–489, 2013