# A Framework to Explore Workload-Specific Performance and Lifetime Trade-offs in Neuromorphic Computing

Adarsha Balaji, Shihao Song, Anup Das, Nikil Dutt, Jeff Krichmar, Nagarajan Kandasamy, Francky Catthoor

Abstract—Neuromorphic hardware with non-volatile memory (NVM) can implement machine learning workloads in an energy-efficient manner. Unfortunately, certain NVMs such as phase change memory (PCM) require high voltages for correct operation. These voltages are supplied from an on-chip charge pump. If the charge pump is activated too frequently, its internal CMOS devices do not recover from stress, which accelerates their aging and leads to defects generated by negative bias temperature instability (NBTI). Forcefully discharging the stressed charge pump can lower the aging rate of its CMOS devices, but it makes the neuromorphic hardware unavailable to perform computations while the charge pump is being discharged. This degrades the performance, e.g., latency and accuracy, of the machine learning workload being executed. In this paper, we propose a novel framework to explore workload-specific performance and lifetime trade-offs in neuromorphic computing. Our framework first extracts the precise times at which a charge pump in the hardware is activated to support neural computations within a workload. This timing information is then used with a characterized NBTI reliability model to estimate the charge pump's aging during workload execution. We use our framework to evaluate the workload-specific performance and reliability impacts of using 1) different spiking neural network (SNN) mapping strategies and 2) different charge pump discharge strategies. We show that system designers can use our framework to explore performance and reliability trade-offs early in the design of neuromorphic hardware, so that appropriate reliability-oriented design margins can be set.
Index Terms—Neuromorphic computing, Non-volatile Memory (NVM), Phase-Change Memory (PCM), wear-out, Negative Bias Temperature Instability (NBTI), Spiking Neural Networks (SNNs), Inter-Spike Interval (ISI).

## 1 Introduction

Neuromorphic hardware uses neurons and synapses to implement spiking neural networks (SNNs) [1]. In such hardware, emerging non-volatile memory (NVM) cells organized into crossbars store synaptic strengths. Certain NVMs such as phase-change memory (PCM) require high voltages ($\sim 3V-5V$) to read and program synaptic strengths. These high voltages create reliability issues not only for the NVM cells in a crossbar but also for the internal CMOS devices of the on-chip charge pump that generates them [2]. In this paper, we study one specific high-voltage-related reliability issue of a charge pump in the context of neuromorphic computing: threshold voltage ($V_{\rm th}$) stress. If the charge pump is activated too frequently, its CMOS devices do not recover from stress, which accelerates their *aging* and eventually leads to failures. A charge pump is typically several orders of magnitude larger than a crossbar [2]. To amortize this area, system designers connect many crossbars to each charge pump. Charge pump failures are therefore a critical bottleneck to the prolonged operation of neuromorphic hardware, because a single failed charge pump disables every crossbar it powers. Redundant charge pumps can improve reliability but increase hardware area. Alternatively, stressed charge pumps can be forcefully discharged, where a discharge operation applies a low voltage to all CMOS devices in the charge pump. Once discharged, the charge pump requires several cycles to boost its voltage back before it can safely be used to access NVM cells in a crossbar. During this interval, the crossbars cannot process spikes, which introduces a spike propagation delay. This delay degrades the performance (e.g., latency and accuracy) of the SNN workload being executed [3].
The aging of a charge pump depends on how frequently the NVM cells it powers are activated, which in turn depends on the spikes generated by the SNN workload being executed. We propose a novel framework that allows system designers to explore workload-specific trade-offs involving reliability, performance, and design cost early in the design process, so that appropriate reliability-oriented design margins can be set. Our framework incorporates the CARLsim simulator [4] to first extract the precise times of spikes in an SNN workload. We then use a characterized reliability model to estimate the aging of charge pumps based on their activation times, which are influenced by the mapping of synapses to crossbars and by the connectivity of crossbars to charge pumps in the hardware. This framework can be integrated into 1) design-time techniques, where neurons and synapses are allocated to crossbars so as to balance the aging of all charge pumps; 2) run-time techniques, where stressed charge pumps are forcefully discharged at appropriate intervals, minimizing their aging without significantly hurting performance; and 3) architectural techniques, where the number of charge pumps is budgeted to achieve a target lifetime.

A. Balaji, S. Song, A. Das, and N. Kandasamy are with Drexel University, Philadelphia, PA, USA. E-mail: anup.das@drexel.edu. N. Dutt and J. Krichmar are with the Department of Computer Science, University of California, Irvine, CA, USA. F. Catthoor is with Imec, Belgium, and KU Leuven, Belgium.

Fig. 1: An illustration of a typical neuromorphic architecture and how SNNs are mapped to a crossbar in this architecture.

## 2 Background and Motivation

SNNs are networks of spiking neurons interconnected via synapses. A neuron fires a spike when its membrane voltage exceeds a threshold, after which the membrane voltage is reset. The moment of threshold crossing defines the *firing time*. SNNs can be used to implement many machine learning techniques.
One example is the supervised approach, where an SNN is first *trained* with examples from the field and then used for *inference* with in-field data. The performance of supervised machine learning is measured in terms of *accuracy*, which is assessed from inter-spike intervals (ISIs) [5]. To define ISI, let $\{t_1, t_2, \dots, t_K\}$ be a neuron's firing times in the time interval $[0, T]$. The average ISI of this spike train is given by [5]:

$$\mathcal{I} = \frac{1}{K-1}\sum_{i=2}^{K} (t_i - t_{i-1}). \tag{1}$$

The neuromorphic hardware shown in Figure 1(a) consists of six crossbars: three are connected to charge pump 1 and the remaining three to charge pump 2. All crossbars are interconnected using a time-shared interconnect. Figure 1(b) illustrates the mapping of an SNN to a crossbar. Synaptic weight $w_{13}$ is programmed on NVM cell P1 and $w_{23}$ on P2. Output spike voltages $x_1$ from N1 and $x_2$ from N2 inject currents into the crossbar; each current is the product of a pre-synaptic neuron's output spike voltage and the conductance of the NVM cell at the cross-point of the pre- and post-synaptic neurons (Ohm's law). Currents are summed along columns in parallel (Kirchhoff's current law), implementing the sums $\sum_i w_{ij} x_i$ needed for forward propagation of neuron excitation $x_i$.

Figure 2(a) shows the spike train generated by N1 of Figure 1(b). Each spike injects current to read the conductance of NVM cell P1. Figure 2(b) illustrates the charge pump's operating voltage while processing this spike train: the charge pump is operated at 1.8V for the entire 60ms interval and boosts its voltage to 3V only to process spikes. The aging of the charge pump is 8.3 units (see Section 3 for the aging computation), and the average ISI is 5.9ms (Equation 1). Figure 2(c) illustrates the charge pump's operating voltage when it is discharged to 1.2V after processing every spike and boosted again to 1.8V before processing the next.
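Equation (1) can be evaluated directly from a list of firing times. The minimal Python sketch below does so and, to mirror the discharge scenario of Figure 2(c), adds a hypothetical delay model that is an assumption, not the paper's model: after each spike, the charge pump needs `recharge_ms` before the crossbar can process the next spike, so a spike arriving earlier waits. The function names `average_isi` and `delay_for_recharge` are illustrative.

```python
from typing import List, Sequence


def average_isi(firing_times: Sequence[float]) -> float:
    """Average inter-spike interval, Equation (1).

    firing_times: sorted firing times t_1 < ... < t_K of one neuron (ms).
    """
    k = len(firing_times)
    if k < 2:
        raise ValueError("Equation (1) needs at least two spikes (K >= 2)")
    total = sum(firing_times[i] - firing_times[i - 1] for i in range(1, k))
    return total / (k - 1)


def delay_for_recharge(firing_times: Sequence[float], recharge_ms: float) -> List[float]:
    """Hypothetical discharge model (an assumption, not from the paper).

    After processing each spike, the charge pump is discharged and needs
    recharge_ms to boost back; a spike that arrives earlier waits until the
    pump is ready. Returns the times at which spikes are actually processed.
    """
    processed: List[float] = []
    ready = float("-inf")  # the pump is ready before the first spike
    for t in firing_times:
        start = max(t, ready)        # wait for the pump if necessary
        processed.append(start)
        ready = start + recharge_ms  # pump unavailable while re-boosting
    return processed


spikes = [0.0, 2.0, 4.0, 6.0]
print(average_isi(spikes))                           # 2.0
print(average_isi(delay_for_recharge(spikes, 3.0)))  # 3.0
```

Because the sum in (1) telescopes to $t_K - t_1$, the average ISI depends only on the first and last firing times and the spike count. Under this toy delay model, discharging raises the ISI only when the accumulated waiting pushes later spikes back, as in the example above.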
Once discharged, the crossbar becomes unavailable to process spikes, introducing latency in processing the spike train. The average ISI increases to 7.4ms, compared to 5.9ms in Figure 2(b); *ISI deviation leads to accuracy loss* [3]. Frequently discharging the charge pump, however, reduces its aging to 7.1 units, compared to 8.3 units in Figure 2(b). This reduction in aging improves the mean time to failure (MTTF) of the charge pump by 8.7% on average. Thus, *aging reduction improves a charge pump's lifetime*.

Fig. 2: Illustrating the trade-off between charge pump aging and SNN performance, considering PCM crossbars. (a) Example spike train from N1 of Figure 1(b). (b) Charge pump voltage to process the spike train. (c) Charge pump reset to 1.2V after processing every spike.

## 3 Proposed Workload-Aware Framework

We first review NBTI, a dominant reliability issue in scaled technology nodes, and then present our proposed framework for PCM-based crossbars. We use the characterized NBTI model of [6]. Our framework can be extended with minimal effort to consider 1) any NBTI model, 2) other NVMs such as FeRAM and Flash, and 3) other reliability issues such as time-dependent dielectric breakdown (TDDB), which remains the dominant issue in older technology nodes.

NBTI aging manifests as 1) a decrease in drain current and transconductance and 2) an increase in off-state current and threshold voltage. NBTI aging is accelerated at high temperature and high oxide electric field. Recent works such as [6] suggest that NBTI is the collective response of two independent defects: *as-grown hole traps* (AHTs) and *generated defects* (GDs). AHTs and a small proportion of GDs can be recovered by annealing at high temperatures once the NBTI stress voltage is removed. We focus on GDs, which cause permanent degradation of charge pumps; once introduced, these GDs cannot be eliminated.
Their effect can, however, be delayed by applying lower voltages, i.e., by forcefully discharging stressed charge pumps.

To formulate NBTI aging, we divide the SNN execution time $[0,T]$ into $q$ equal intervals $0=t_0<t_1<\cdots<t_q=T$, where $[t_i,t_{i+1})$ is the $(i+1)^{\text{th}}$ interval and $V_i$ is the charge pump's voltage in this interval. Reliability at the end of SNN execution can be expressed as $R(T)=e^{-\left(\sum_{i=0}^{q-1}G(V_i)\right)^{\beta}}$, where $G(V_i)$ is the generated defect at voltage $V_i$, expressed as a power law, $G(V_i) = g_0 \cdot (V_i - V_{\text{th}})^m \cdot (t_{i+1} - t_i)^n$, and $\beta, g_0, m, n$ are material-dependent constants [6]. We define NBTI aging in a stressed charge pump as

$$\mathcal{A} = \sum_{i=0}^{q-1} g_0 \cdot (V_i - V_{\text{th}})^m \cdot (t_{i+1} - t_i)^n, \text{ such that } R(T) = e^{-\mathcal{A}^{\beta}}. \tag{2}$$

Fig. 3: Framework to evaluate aging of charge pumps.

Equation (2) assumes that all synapses are mapped to the same crossbar, which is connected to a single charge pump. In practice, however, 1) synapses are distributed across different crossbars because a crossbar can accommodate only a limited number of synapses, and 2) neuromorphic hardware typically has more than one charge pump to limit the power supply load. We now extend (2) to incorporate these practical constraints.

We consider the SNN $\mathcal{G}$ with $N$ neurons and $S$ synapses, excited with an input over the time interval $[0,T]$. We group the spikes in this interval by the synapse they excite:

$$\mathcal{S} = \left\{\{\tau_1^1, \tau_2^1, \cdots, \tau_{k_1}^1\}, \{\tau_1^2, \tau_2^2, \cdots, \tau_{k_2}^2\}, \cdots, \{\tau_1^S, \tau_2^S, \cdots, \tau_{k_S}^S\}\right\}, \tag{3}$$

where $\tau_j^s$ is the $j^{\text{th}}$ spike on the $s^{\text{th}}$ synapse of the SNN. We introduce the following notation.
- $\mathcal{A}_s$: aging to process spike train $\{\tau_1^s,\cdots,\tau_{k_s}^s\}$ on the $s^{\text{th}}$ synapse
- $C$: number of crossbars
- $L$: number of charge pumps
- $\mathcal{M} \in \{0,1\}^{S \times C}$: synapse-to-crossbar mapping, such that

$$m_{ij} = \begin{cases} 1 & \text{if synapse } i \text{ is mapped to crossbar } j \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

- $\mathcal{P} \in \{0,1\}^{C \times L}$: crossbar-to-charge pump mapping, such that

$$p_{jk} = \begin{cases} 1 & \text{if crossbar } j \text{ is powered by charge pump } k \\ 0 & \text{otherwise} \end{cases} \tag{5}$$

Combining (4) and (5), we obtain the synapse-to-charge pump mapping

$$m_{ij} \cdot p_{jk} = \begin{cases} 1 & \text{if synapse } i \text{ is mapped to crossbar } j \text{ and crossbar } j \text{ is powered by charge pump } k \\ 0 & \text{otherwise} \end{cases} \tag{6}$$

The total aging of charge pump $k$ is therefore

$$\operatorname{aging}_{k} = \sum_{i=1}^{S} \sum_{j=1}^{C} m_{ij} \cdot p_{jk} \cdot \mathcal{A}_{i}. \tag{7}$$

**Proposed Framework** – Figure 3 illustrates our framework to evaluate the aging of charge pumps in neuromorphic hardware. We use CARLsim [4] to train SNN models. CARLsim outputs the trained weights and the precise times of spikes on all synapses of the SNN, $\mathcal{S}$. An SNN mapping approach such as [3] uses the CARLsim output to generate a synapse-to-crossbar mapping $\mathcal{M}$ that optimizes some objective function. In [3], the objective is to minimize the number of spikes communicated between crossbars, which lowers energy and latency on the shared interconnect. Once the SNN is mapped to the crossbars of the hardware, its performance is obtained in terms of the inter-spike interval $\mathcal{I}$ using (1). Using the synapse-to-crossbar and crossbar-to-charge pump mappings, our formulation in (7) evaluates the aging of all charge pumps in the hardware when executing an SNN workload. This design flow is shown using solid arrows in Figure 3.
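The sketch below shows, under stated assumptions, how Equations (2) and (7) could be evaluated together. It assumes a piecewise-constant charge pump voltage in which the pump idles at one voltage and is boosted for a fixed `boost_ms` after each spike (loosely mirroring Figure 2(b)), and that intervals with $V_i \le V_{\text{th}}$ generate no defects. The helper names, the voltage profile, and all numeric constants are hypothetical placeholders, not the constants characterized in [6].

```python
import math
from typing import List, Sequence, Tuple

Segment = Tuple[float, float]  # (voltage V_i in volts, duration t_{i+1} - t_i)


def segments_from_spikes(spikes: Sequence[float], t_end: float, v_idle: float,
                         v_boost: float, boost_ms: float) -> List[Segment]:
    """Hypothetical voltage profile: the pump idles at v_idle and is boosted
    to v_boost for boost_ms after each spike. Spikes must be sorted and lie
    within [0, t_end]."""
    segments: List[Segment] = []
    t = 0.0
    for s in spikes:
        if s > t:
            segments.append((v_idle, s - t))            # idle until the spike
        end = min(s + boost_ms, t_end)
        segments.append((v_boost, end - max(s, t)))     # boosted while processing
        t = end
    if t_end > t:
        segments.append((v_idle, t_end - t))            # idle until end of run
    return segments


def nbti_aging(segments: Sequence[Segment], v_th: float, g0: float,
               m_exp: float, n_exp: float) -> float:
    """Equation (2): sum of g0 * (V_i - V_th)^m * (t_{i+1} - t_i)^n.

    Intervals with V_i <= V_th are assumed to generate no defects.
    """
    return sum(g0 * (v - v_th) ** m_exp * dt ** n_exp
               for v, dt in segments if v > v_th)


def reliability(aging: float, beta: float) -> float:
    """R(T) = exp(-A^beta), from Equation (2)."""
    return math.exp(-aging ** beta)


def pump_aging(mapping: Sequence[Sequence[int]], power: Sequence[Sequence[int]],
               synapse_aging: Sequence[float]) -> List[float]:
    """Equation (7): aging_k = sum_i sum_j m_ij * p_jk * A_i."""
    num_pumps = len(power[0])
    aging = [0.0] * num_pumps
    for i, row in enumerate(mapping):        # synapse i
        for j, m_ij in enumerate(row):       # crossbar j
            for k in range(num_pumps):       # charge pump k
                aging[k] += m_ij * power[j][k] * synapse_aging[i]
    return aging


# Toy values (hypothetical): two spikes, idle 1.8V, boost 3V for 1ms.
segs = segments_from_spikes([2.0, 10.0], 20.0, 1.8, 3.0, 1.0)
print(nbti_aging(segs, v_th=2.0, g0=1.0, m_exp=2.0, n_exp=0.5))  # 2.0

M = [[1, 0], [0, 1], [1, 0]]  # 3 synapses -> 2 crossbars
P = [[1, 0], [0, 1]]          # crossbar j -> charge pump j
print(pump_aging(M, P, [1.0, 2.0, 4.0]))  # [5.0, 2.0]
```

Because (7) is linear in $\mathcal{A}_i$, it is equivalently the vector-matrix product $\mathcal{A}^{\top}\mathcal{M}\mathcal{P}$; the explicit loops keep the sketch dependency-free. Note that for $n \neq 1$, the sum in (2) depends on how $[0,T]$ is partitioned, so a faithful evaluation would resample the voltage profile onto the equal-interval grid of (2) rather than use the variable-length segments shown here.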
Figure 3 also illustrates three future directions based on this framework, shown with dashed arrows. First, *Aging Evaluation*, as developed in (7), can be combined with the *SNN Mapping* step to generate a mapping of the SNN to the hardware that balances the aging of all charge pumps. This is shown by the dashed arrow labeled *aging-aware mapping*. Second, the crossbar-to-charge pump mapping can be optimized to achieve a desired charge pump lifetime when executing the SNN. This is shown by the dashed arrow labeled *application-specific charge pump placement*. Third, strategies can be developed to discharge charge pumps at run-time, improving their lifetime. This is shown in the *Discharge Management* step.

## 4 Evaluation Results

This section presents evaluation results obtained with our framework. We use the neuromorphic hardware of Figure 1(a) to evaluate the following SNNs [3], [7], [8], [9].

| SNN         | Synapses  | Topology                             | Spikes    |
|-------------|-----------|--------------------------------------|-----------|
| ImgSmooth   | 136,314   | FeedForward (4096, 1024)             | 17,600    |
| EdgeDet     | 272,628   | FeedForward (4096, 1024, 1024, 1024) | 22,780    |
| MLP-MNIST   | 79,400    | FeedForward (784, 100, 10)           | 2,395,300 |
| HeartEstm   | 636,578   | Recurrent                            | 3,002,223 |
| HeartClass  | 2,396,521 | CNN <sup>1</sup>                     | 1,036,485 |
| CNN-MNIST   | 159,553   | CNN <sup>2</sup>                     | 97,585    |
| LeNet-MNIST | 1,029,286 | CNN <sup>3</sup>                     | 165,997   |
| LeNet-CIFAR | 2,136,560 | CNN <sup>4</sup>                     | 589,953   |

- <sup>1</sup> (82x82) [Conv, Pool]\*16 [Conv, Pool]\*16 FC\*256 FC\*6
- <sup>2</sup> (24x24) [Conv, Pool]\*16 FC\*150 FC\*10
- <sup>3</sup> (32x32) [Conv, Pool]\*6 [Conv, Pool]\*16 Conv\*120 FC\*84
- <sup>4</sup> (32x32x3) [Conv, Pool]\*6 [Conv, Pool]\*6 FC\*84 FC\*10

### 4.1 Evaluating Reliability of SNN Mapping Strategies

We use our framework to evaluate two state-of-the-art SNN mapping strategies, SCO [10] and SpiNeMap [3], in terms of performance (measured as the change in ISI) and reliability (measured as aging). Figure 4 shows the results of SCO, normalized to SpiNeMap. SCO, which balances crossbar utilization, has on average 16.4% lower aging (better lifetime) than SpiNeMap for these workloads. This is because SpiNeMap explicitly minimizes spike latency on the shared interconnect, and to do so it utilizes some crossbars more heavily than others. Heavily utilized crossbars activate their charge pumps more frequently, which increases their aging. Conversely, SpiNeMap has a lower change in ISI (higher performance): SCO has on average 21% higher change in ISI than SpiNeMap. Thus, SpiNeMap is better than SCO from a performance perspective, while SCO is better than SpiNeMap from a reliability perspective.

Fig. 4: Aging and ISI of SCO [10] vs. SpiNeMap [3].

### 4.2 Discharging Stressed Charge Pumps

Figure 5 shows aging and ISI with discharge intervals of 10ms, 50ms, and 100ms for the evaluated SNN workloads, normalized to the case where charge pumps are stressed for the entire execution duration.

Fig. 5: Aging and ISI with different discharge intervals. (a) Aging for different discharge intervals, normalized to the aging when charge pumps are not discharged. (b) ISI for different discharge intervals, normalized to the ISI when charge pumps are not discharged.

We make the following three key observations. First, aging is the lowest for a discharge interval of 10ms, while ISI variation is the highest.
This is because, with smaller discharge intervals, a charge pump's internal CMOS devices partially recover from stress, so the rate of aging drops, improving lifetime. Performance is lower because every discharge delays spike propagation, and these delays occur more often at shorter discharge intervals. Second, when the discharge interval increases from 10ms to 100ms, aging increases, reducing the charge pump's lifetime, while ISI variation decreases, improving application performance. Third, the aging of charge pumps varies across SNN workloads. For MLP-MNIST, aging increases by 10% when the discharge interval increases from 10ms to 100ms, whereas for LeNet-CIFAR, aging increases by a factor of 2 over the same range. This is because spikes are generated less frequently in MLP-MNIST due to the sparsity of its synaptic weights; there is, therefore, no significant variation in aging when its charge pumps are discharged differently. ISI variations, on the other hand, are due to spike propagation delays while charge pumps are being discharged, and they do not vary significantly across workloads. Our framework thus enables exploration of SNN workload-specific lifetime and performance trade-offs.

## 5 Discussion and Future Outlook

Aging-related defects in charge pumps constitute a critical bottleneck to the prolonged operating lifetime of neuromorphic hardware. These defects differ from an NVM cell's endurance failures, which are caused by repeated programming of the cell. In recent prototypes, e.g., [11], PCM endurance is on the order of $10^7$ cycles ($\approx$ 4–5 years of lifetime), whereas a charge pump's lifetime is $\approx$ 2–3 years when operating at a 3V supply. In terms of impact, aging issues in neuromorphic hardware arise during both inference (reading synaptic weights) and training (updating synaptic weights) in supervised machine learning, while endurance issues arise only during training.
To this end, we proposed a novel framework to evaluate SNN workload-specific lifetime and performance trade-offs in neuromorphic architectures. The framework incorporates the CARLsim simulator to extract the precise times of spike generation on all synapses of an SNN workload. Using this timing information together with 1) the synapse-to-crossbar mapping and 2) the crossbar-to-charge pump mapping, the framework evaluates the aging of each charge pump when executing an SNN workload. We used this framework to evaluate two state-of-the-art SNN mapping strategies in terms of performance and reliability. We also demonstrated lifetime and performance trade-offs by varying the charge pump's discharge interval. Our framework can also incorporate 1) other SNN simulators such as Brian [12] and 2) other reliability issues such as electromigration [13].

## Acknowledgment

This work is supported by the National Science Foundation Award CCF-1937419 (RTML: Small: Design of System Software to Facilitate Real-Time Neuromorphic Computing).

## References

- [1] W. Maass, "Networks of spiking neurons: the third generation of neural network models," *Neural networks*, vol. 10, no. 9, pp. 1659–1671, 1997.
- [2] B. Shen and M. L. Johnston, "Zero reversion loss, high-efficiency charge pump for wide output current load range," in *Symposium on circuits and systems*, 2018, pp. 1–5.
- [3] A. Balaji, A. Das, Y. Wu, K. Huynh, F. Dellanna, G. Indiveri, J. L. Krichmar, N. Dutt, S. Schaafsma, and F. Catthoor, "Mapping spiking neural networks to neuromorphic hardware," in *Transactions on very large scale integration (VLSI) systems*, 2019.
- [4] T. Chou, H. J. Kashyap, J. Xing, S. Listopad, E. L. Rounds, M. Beyeler, N. Dutt, and J. L. Krichmar, "CARLsim 4: An open source library for large scale, biologically detailed spiking neural network simulation using heterogeneous clusters," in *International joint conference on neural networks (IJCNN)*, 2018, pp. 1–8.
- [5] S. Grün and S. Rotter, *Analysis of parallel spike trains*, 2010, vol. 7.
- [6] R. Gao, Z. Ji, A. B. Manut, J. F. Zhang, J. Franco, S. W. M. Hatta, W. D. Zhang, B. Kaczer, D. Linten, and G. Groeseneken, "NBTI-generated defects in nanoscaled devices: fast characterization methodology and modeling," *Transactions on electron devices*, vol. 64, no. 10, pp. 4011–4017, 2017.
- [7] MLPerf: Fair and useful benchmarks for measuring training and inference performance of ML hardware, software, and services. https://mlperf.org/training-overview/overview.
- [8] A. K. Das, F. Catthoor, and S. Schaafsma, "Heartbeat classification in wearables using multi-layer perceptron and time-frequency joint distribution of ECG," in *Conference on connected health: Applications, systems and engineering technologies*, 2018, pp. 69–74.
- [9] A. Das, P. Pradhapan, W. Groenendaal, P. Adiraju, R. Rajan, F. Catthoor, S. Schaafsma, J. Krichmar, N. Dutt, and C. Van Hoof, "Unsupervised heart-rate estimation in wearables with Liquid states and a probabilistic readout," *Neural networks*, vol. 99, 2018.
- [10] M. K. F. Lee, Y. Cui, T. Somu, T. Luo, J. Zhou, W. T. Tang, W.-F. Wong, and R. S. M. Goh, "A system-level simulator for RRAM-based neuromorphic computing chips," *Transactions on architecture and code optimization*, vol. 15, no. 4, p. 64, 2019.
- [11] Z. Song, D. Cai, X. Li, L. Wang, Y. Chen, H. Chen, Q. Wang, Y. Zhan, and M. Ji, "High endurance phase change memory chip implemented based on carbon-doped Ge2Sb2Te5 in 40 nm node for embedded application," in *International electron devices meeting*, 2018, pp. 27–5.
- [12] D. F. Goodman and R. Brette, "The Brian Simulator," *Frontiers in Neuroscience*, vol. 3, p. 26, 2009.
- [13] A. Das, A. Kumar, and B. Veeravalli, "Aging-aware hardware-software task partitioning for reliable reconfigurable multiprocessor systems," in *Conference on Compilers, Architectures and Synthesis for Embedded Systems*, 2013, pp. 1:1–1:10.