# Fault-Tolerant Nanoscale Architecture based on Linear Threshold Gates with Redundancy

Nivard Aymerich and Antonio Rubio
Department of Electronic Engineering, Universitat Politècnica de Catalunya (UPC Barcelona Tech), Barcelona, Spain
nivard@eel.upc.edu, antonio.rubio@upc.edu

Abstract: One of the main objectives of the data computing and memory industry is to sustain, and even accelerate, the growth in component density achieved in today's integrated circuits as it moves to future technologies based on ultimate CMOS and new emerging research devices. Widely accepted predictions for these technologies indicate a remarkable reduction in component quality, caused by the complexity of the manufacturing process and the erratic behavior of devices, which will reduce system reliability if today's design rules are maintained. Together with the introduction of new devices, new architectural design paradigms have to be adopted. Fault-tolerant techniques are considered necessary and relevant in this scenario. In this paper we present a fault-tolerant nanoscale architecture based on the implementation of logic systems with averaging-cell linear threshold gates (AC-LTG). The sensitivity of the gates to manufacturing and environmental deviations is investigated and compared with the well-known NAND multiplexing concept, showing that the AC-LTG is a valuable alternative in specific nanoscale conditions.

Index Terms: NAND multiplexing, Averaging Cell, Linear Threshold Gate, Fault Tolerance, Reliability, Nanotechnology

## I. INTRODUCTION

The successful development of the semiconductor industry during the last decades has led to the production of increasingly smaller devices and, as a consequence, to very high densities of device integration. Current microprocessors have more than $10^9$ transistors with feature sizes around 32 nm.
This rapid evolution is expected to continue in the coming years thanks to the emergence of new devices that promise major advantages over the latest generations of CMOS technology: smaller devices that consume less power and provide higher performance. Some of the most representative candidates are carbon nanotubes (CNT), nanowires, single-electron tunneling (SET) devices and resistive switching devices [1]. Along with these advantages, emerging devices as well as ultimate CMOS generations are expected to exhibit higher levels of variability and higher ratios of defective devices. As a consequence, system reliability is becoming a major concern in nanoscale digital implementations.

One of the first proposals to build reliable systems from unreliable components was von Neumann's NAND multiplexing technique [2]. Since its publication in the 1950s, many alternative techniques based on redundancy have appeared. Some of them provide better performance at lower levels of device reliability [3]. Others, despite being much more difficult to implement and only useful for permanent failures, achieve better performance regardless of the failure rate [4]. Notwithstanding all these contributions, NAND multiplexing remains an accepted benchmark for reliability comparison between fault-tolerant techniques, and it shows some advantages when the possible trade-offs are considered.

In this paper we suggest a new fault-tolerant architecture based on redundancy. Our proposal combines the concepts of Averaging Cell (AC) [5] and Linear Threshold Gate (LTG) [6]. Both configurations share the same structure: a weighted average followed by a threshold operation. As a result, extending LTGs with the AC takes full advantage of their natural capabilities and does not require additional design effort. Section II analyzes the performance of this new approach in terms of mitigation of variability.
In Section III we compare the proposed fault-tolerant architecture with the well-known NAND multiplexing technique. Finally, some conclusions are drawn from the results obtained.

## II. REDUNDANT LINEAR THRESHOLD GATES

The AC-LTG combination has already been proposed as a way to implement fault-tolerant computing architectures [7]. This strategy is revisited in this paper with a detailed study of its tolerance to several sources of variability.

#### A. AC-LTG Mathematical Model

Every logical operation implemented by an AC-LTG can be expressed by the following equations:

$$\hat{y} = sign\left(\sum_{i=1}^{M} w_i x_i - T\right) \tag{1}$$

$$x_i = \sum_{j=1}^{N} c_j x_i^j \qquad \forall i \in (1 \dots M) \tag{2}$$

where M is the number of binary variables $x_i$ composing the Boolean function, N is the number of available replicas of each input and $T(=t\cdot V)$ is the threshold decision level. The weights $W=(w_1,w_2,\ldots,w_M)$ define the specific synthesized Boolean function, whereas the coefficients $c_j$ implement the average of each input bundle (usually $c_j=1/N,\ \forall j\in (1\ldots N)$). Figure 1 shows the generic scheme of an M-input AC-LTG with redundancy level N.

Fig. 1. Schematic of a general M-input AC-LTG with redundancy level N

Each input $x_i$, $\forall i \in (1...M)$, comes from the average of N error-prone physical replicas $x_i^j$, $\forall j \in (1...N)$, which represent the ideal binary variable $x_i^* \in \{0, V\}$. Logic values 0 and 1 are physically represented by voltage levels 0 and V respectively. Variability in the input voltage levels $x_i^j$ comes from drift sources such as internal noise, device parameter deviations and physical defects.
These fluctuations are modeled in this paper with a normally distributed variable $\eta_i$, $f_{\eta_i}(\eta_i) \sim N(0, \sigma_{\eta_i})$:

$$x_i^j = x_i^* + \eta_i \tag{3}$$

Therefore, by the properties of the normal distribution, considering homogeneous variability and assuming that the fluctuations of the N replicas are independent, each input $x_i$ is normally distributed with parameters $\mu_{x_i} = x_i^*$ and $\sigma_{x_i} = \sigma_{\eta_i}/\sqrt{N}$. Each averaged input $x_i$ is associated with a weight $w_i$ controlling its impact on the final average $y'$. Many different Boolean functions can be synthesized by adjusting the configuration of weights. Finally, a threshold operation is performed with an equivalent decision level T, and the restored binary value $\hat{y}$ is obtained at the output.

#### B. N-redundant 2-input NAND AC-LTG

In order to study the effects of variability in the AC-LTG architecture we consider in this subsection a simple case: the N-redundant 2-input NAND AC-LTG. The reasoning developed for this particular case carries over directly to the AC-LTG versions of AND, OR and NOR gates by redefining the threshold level or inverting the configuration of weights. From the literature [8] we take an optimized configuration of weights $W^*=(w_1^*,w_2^*)$ and threshold $t^*$ for a 2-input NAND AC-LTG:

$$w_1^* = w_2^* = -18, \qquad t^* = -25 \quad (T^* = -25V) \tag{4}$$

This configuration has been optimized, although with limited precision, to tolerate the maximum possible level of manufacturing inaccuracy. This tolerance is measured as the maximum deviation $d_{max}$ that can affect the weights and threshold $(\Delta w_1 = |w_1 - w_1^*|, \Delta w_2 = |w_2 - w_2^*|, \Delta t = |t - t^*|)$ before the synthesized Boolean function changes. In our particular case the robustness parameter is $d_{max} = 3.5$. For this reason, the following condition must always hold:

$$\max(\Delta w_1, \Delta w_2, \Delta t) \le d_{max} = 3.5 \tag{5}$$

Fig. 2.
Schematic of N-redundant 2-input NAND AC-LTG with optimized precision-limited configuration of weights and threshold robust to manufacturing inaccuracies ( $W=(-18,-18),\, T=-25V$ )

The generic configuration of a 2-input AC-LTG, like the previous one, is $W=(w_1,w_2)$ and $T=t\cdot V$. For this generic implementation we compute the probability of producing an erroneous output, given the NAND Boolean function to synthesize and the four possible input combinations $\{00,01,10,11\}$. Averaging these expressions, assuming that all input combinations are equally probable, gives the following expression for the output gate error probability $P_e$, where $\sigma_t$ denotes the drift in the threshold level:

$$\begin{aligned}
P_{e} ={}& \frac{1}{8} \left( 1 + erf \left( \frac{(w_{1} + w_{2} - t)V}{\sqrt{2 \left( w_{1}^{2} \frac{\sigma_{\eta_{1}}^{2}}{N} + w_{2}^{2} \frac{\sigma_{\eta_{2}}^{2}}{N} + t^{2} \sigma_{t}^{2} \right)}} \right) \right) \\
&+ \frac{1}{8} \left( 1 - erf \left( \frac{(w_{1} - t)V}{\sqrt{2 \left( w_{1}^{2} \frac{\sigma_{\eta_{1}}^{2}}{N} + w_{2}^{2} \frac{\sigma_{\eta_{2}}^{2}}{N} + t^{2} \sigma_{t}^{2} \right)}} \right) \right) \\
&+ \frac{1}{8} \left( 1 - erf \left( \frac{(w_{2} - t)V}{\sqrt{2 \left( w_{1}^{2} \frac{\sigma_{\eta_{1}}^{2}}{N} + w_{2}^{2} \frac{\sigma_{\eta_{2}}^{2}}{N} + t^{2} \sigma_{t}^{2} \right)}} \right) \right) \\
&+ \frac{1}{8} \left( 1 - erf \left( \frac{-tV}{\sqrt{2 \left( w_{1}^{2} \frac{\sigma_{\eta_{1}}^{2}}{N} + w_{2}^{2} \frac{\sigma_{\eta_{2}}^{2}}{N} + t^{2} \sigma_{t}^{2} \right)}} \right) \right)
\end{aligned} \tag{6}$$

We observe that $P_e$ is invariant under a common positive rescaling of the weights $W=(w_1,w_2)$ and the threshold parameter t: the numerators and denominators in (6) scale by the same factor. Weights and threshold can therefore be normalized without changing the resulting error probability $P_e$.
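As a rough numerical illustration of equation (6), the following Python sketch evaluates $P_e$ for a given configuration. The function name `nand_ac_ltg_error`, its argument names and the sample values ($V=1$, $\sigma=0.1V$, $N=10$) are illustrative assumptions, not part of the original formulation; only the formula itself follows (6).

```python
import math


def nand_ac_ltg_error(w1, w2, t, V, sigma_eta1, sigma_eta2, sigma_t, N):
    """Average output error probability P_e of a 2-input NAND AC-LTG, eq. (6).

    All names are hypothetical; the arithmetic mirrors equation (6) term by term.
    """
    # Common denominator of the four erf arguments in (6).
    denom = math.sqrt(2.0 * (w1**2 * sigma_eta1**2 / N
                             + w2**2 * sigma_eta2**2 / N
                             + t**2 * sigma_t**2))
    if denom == 0.0:
        raise ValueError("null variability: the erf arguments of (6) are undefined")
    total = (
        (1.0 + math.erf((w1 + w2 - t) * V / denom))  # input 11, ideal output 0
        + (1.0 - math.erf((w1 - t) * V / denom))     # input 10, ideal output 1
        + (1.0 - math.erf((w2 - t) * V / denom))     # input 01, ideal output 1
        + (1.0 - math.erf(-t * V / denom))           # input 00, ideal output 1
    )
    return total / 8.0


if __name__ == "__main__":
    # Configuration (4) with assumed variability 0.1 V and redundancy N = 10.
    p = nand_ac_ltg_error(-18, -18, -25, V=1.0,
                          sigma_eta1=0.1, sigma_eta2=0.1, sigma_t=0.1, N=10)
    print(f"P_e = {p:.3e}")
```

Multiplying `w1`, `w2` and `t` by the same positive factor leaves the returned value unchanged, which is the rescaling property noted above.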
Applying this observation, we define the normalized weights $W^n=(w_1^n,w_2^n)$ in order to reduce the number of variables involved in the $P_e$ expression. Because $t<0$ for the NAND gate, the sign is chosen so that the negative NAND weights map to positive normalized values:

$$w_1^n \equiv -w_1/|t|, \qquad w_2^n \equiv -w_2/|t|, \qquad t^n \equiv t/|t| = -1$$

Before introducing this change into equation (6) we present another definition in order to reorganize and normalize the remaining terms, converting them into unitless parameters. It involves the input standard deviations $\sigma_{x_1}$, $\sigma_{x_2}$ (recall that $\sigma_{x_i} = \sigma_{\eta_i}/\sqrt{N}$), the drift in the threshold level $\sigma_t$, the voltage V and the redundancy N:

$$\sigma_1^n \equiv \frac{\sigma_{\eta_1}}{\sqrt{N}\,\sigma_t}, \qquad \sigma_2^n \equiv \frac{\sigma_{\eta_2}}{\sqrt{N}\,\sigma_t}, \qquad V^n \equiv \frac{V}{\sigma_t}$$

Applying all the above definitions, equation (6) yields:

$$\begin{aligned}
P_{e} ={}& \frac{1}{8} \left( 1 + erf \left( \frac{(1 - w_{1}^{n} - w_{2}^{n})V^{n}}{\sqrt{2((w_{1}^{n}\sigma_{1}^{n})^{2} + (w_{2}^{n}\sigma_{2}^{n})^{2} + 1)}} \right) \right) \\
&+ \frac{1}{8} \left( 1 - erf \left( \frac{(1 - w_{1}^{n})V^{n}}{\sqrt{2((w_{1}^{n}\sigma_{1}^{n})^{2} + (w_{2}^{n}\sigma_{2}^{n})^{2} + 1)}} \right) \right) \\
&+ \frac{1}{8} \left( 1 - erf \left( \frac{(1 - w_{2}^{n})V^{n}}{\sqrt{2((w_{1}^{n}\sigma_{1}^{n})^{2} + (w_{2}^{n}\sigma_{2}^{n})^{2} + 1)}} \right) \right) \\
&+ \frac{1}{8} \left( 1 - erf \left( \frac{V^{n}}{\sqrt{2((w_{1}^{n}\sigma_{1}^{n})^{2} + (w_{2}^{n}\sigma_{2}^{n})^{2} + 1)}} \right) \right)
\end{aligned} \tag{7}$$

Expression (7) for $P_e$ allows us to study in detail the effects of variability due to input drift sources $(\sigma_{\eta_1}, \sigma_{\eta_2})$, drift in the threshold decision level $(\sigma_t)$ and deviation in weights and threshold assignment $(\Delta w_1, \Delta w_2, \Delta t)$. It also provides a way to study how much the impact of these effects can be diminished by increasing the level of redundancy N or the voltage V.

• Ideal Case (null variability): Figure 3 depicts the $P_e$ color map in the plane of normalized weights for the ideal case.
Cool colors represent low probabilities and hot colors high probabilities. In this ideal case, with null variability, there is a region in the $W^n$-plane where the error probability $P_e$ is null (the blue triangle). The small red marker shows the position of the configuration given in (4) ( $W=(-18,-18),\,t=-25\Rightarrow W^n=(0.72,0.72)$ ). This configuration is robust to small displacements in the $W^n$-plane since it has been designed to withstand deviations in weights and threshold assignment.

Fig. 3. $P_e$ color map of N-redundant 2-input NAND AC-LTG with null variability ( $\sigma_{\eta_1}=\sigma_{\eta_2}=\sigma_t=0$ ). The small red marker shows the optimized precision-limited configuration $W=(-18,-18),\,t=-25$ ( $W^n=(0.72,0.72)$ )

• Non-Ideal Symmetric Case: When variability is introduced in the input variables ( $\sigma_{\eta_1}$, $\sigma_{\eta_2}$ ) as well as in the threshold value ( $\sigma_t$ ), the $P_e$ color map deforms as shown in Figure 4. In this case the optimal configuration previously presented (small red marker) is not necessarily the configuration with the lowest probability of error $P_e$ (small white cross). Optimizing the weights considering only deviation in the weight assignment produces solutions slightly different from the ones obtained when considering only variability sources.

Fig. 4. $P_e$ color map of 10-redundant 2-input NAND AC-LTG with variability parameters ( $\sigma_{\eta_1}=\sigma_{\eta_2}=\sigma_t=0.1V$ ). The small red marker shows the optimized precision-limited configuration $W=(-18,-18),\,t=-25$ ( $W^n=(0.72,0.72)$ ) and the small white cross the configuration with the lowest probability of error $P_e$

• Non-Ideal and Non-Symmetric Case: We consider here different levels of variability for each input in order to model the effect of degradation, which randomly affects different parts of the system.
Figure 5 shows an example; only high levels of asymmetry have a significant impact on the optimal configuration.

Fig. 5. $P_e$ color map of 10-redundant 2-input NAND AC-LTG with variability parameters $(\sigma_{\eta_1}=\sigma_t=0.1V,\,\sigma_{\eta_2}=2V)$. The small red marker shows the optimized precision-limited configuration $W=(-18,-18),\,t=-25$ $(W^n=(0.72,0.72))$ and the small white cross the configuration with the lowest probability of error $P_e$

#### C. Numeric results

Figure 6 depicts the impact of the redundancy level N in a typical case with variability parameters $\sigma_{\eta_1}=\sigma_{\eta_2}=\sigma_t=0.1V$ and different levels of deviation in the assignment of weights and threshold $d \geq \max(\Delta w_1, \Delta w_2, \Delta t)$. The results shown correspond to the worst case obtained within the respective range of deviation. In this simulation, the deviation d sweeps levels from 0% to 40% of the maximum admissible $d_{max}$ (recall from (5) that $d_{max} = 3.5$).

Fig. 6. Probability of error $P_e$ versus redundancy level N at different levels of deviation in the assignment of weights and threshold $d \geq \max(\Delta w_1, \Delta w_2, \Delta t)$. Case with variability parameters $\sigma_{\eta_1} = \sigma_{\eta_2} = \sigma_t = 0.1 V$

Figure 7 shows the probability of error $P_e$ versus the redundancy N at different levels of homogeneous variability $\sigma_{\eta_1} = \sigma_{\eta_2} = \sigma_t$. A great improvement from N=1 to N=10 is observed. Input levels of variability are taken from the projections of ITRS 2009 [1].

Fig. 7. Probability of error $P_e$ versus redundancy level N at different levels of homogeneous variability $\sigma_{\eta_1}=\sigma_{\eta_2}=\sigma_t$ and null deviation in weights and threshold assignment d=0

## III. RELIABILITY COMPARISON OF NAND AC-LTG VERSUS NAND MULTIPLEXING

In this section we compare the presented architecture with the well-known NAND multiplexing technique.
Both designs tolerate variability and faulty behavior of their constituent devices by means of redundancy.

#### A. NAND multiplexing topology

NAND multiplexing has been studied extensively, and a set of useful formulations of its performance is available. They allow us to analyze it and get a clear view of its capabilities. We take advantage of these contributions to guide our discussion [9]. Figure 8 presents the general topology of a NAND multiplexing unit. It consists of a first stage performing the NAND operation and a second stage that restores the output value. Restoration is implemented with two NAND operations in series, interleaved with randomizing blocks (U). This restoring unit can be replicated as many times as necessary to improve the reliability level, although each replica adds overhead.

Fig. 8. Schematic of a NAND multiplexing architecture

The parameters usually used to characterize NAND multiplexing are:

- N, the number of redundant inputs and outputs,
- $\delta$, the ratio between the faulty input lines and the total number of lines N,
- $\epsilon$, the probability of a device producing a faulty output,
- and n, the number of restoring stages added at the output (the NAND multiplexing scheme of Figure 8 has n = 1).

#### B. Parameter Equivalences

In order to compare both techniques, NAND AC-LTG and NAND multiplexing, some equivalences between characteristic parameters must be established. Some of them are direct, like the level of redundancy N, but others, like $\epsilon$ and $\delta$, require a more detailed analysis.

• Redundancy N and number of restoring stages n: We add N threshold operations in parallel at the output of the AC-LTG architecture so that both fault-tolerant techniques have the same topology: N redundant inputs and N redundant outputs. Figure 9 presents the general scheme of the NAND AC-LTG considered in this section.
It is worth noting that the number of devices of the two architectures differs by a factor that grows linearly with the number of restoring stages n (see Table I). NAND multiplexing requires more devices than NAND AC-LTG for the same redundancy N.

Fig. 9. Schematic of a NAND AC-LTG with N redundant inputs and outputs

TABLE I. Number of devices versus level of redundancy for NAND multiplexing and AC-LTG

| Architecture      | Redundancy level | Number of devices |
|-------------------|------------------|-------------------|
| NAND multiplexing | (N, n)           | $N \cdot (1+2n)$  |
| AC-LTG            | N                | N                 |

• Ratio of faulty input lines $\delta$: The parameter $\delta$ of NAND multiplexing is directly related to the input level of variability in AC-LTG, which is expressed by $\sigma_{\eta_1}$ and $\sigma_{\eta_2}$. Parameter $\delta$ expresses the probability of an input line being faulty. It corresponds to the probability that an input with variability level $\sigma_{\eta}$ deviates from the correct value by more than V/2. This relation is expressed by equation (8).

$$\delta = \frac{1}{2} \left( 1 - erf\left(\frac{V/2}{\sqrt{2\sigma_{\eta}^2}}\right) \right) \tag{8}$$

• Ratio of faulty operations per device $\epsilon$: The NAND multiplexing parameter $\epsilon$ is related to the remaining AC-LTG variability parameters: the drift in the threshold level $\sigma_t$ and the deviation level in weights and threshold assignment d. Both concern faulty device behavior. Given a certain deviation d, the drift in the threshold level $\sigma_t$ must account for the rest of the variability expressed by $\epsilon$. Equation (9) expresses this relation.

$$\epsilon = \frac{1}{2} \left( 1 - erf\left( \frac{(d_{max} - d)V}{\sqrt{2t^2 \sigma_t^2}} \right) \right) \tag{9}$$

#### C. Comparison Results

The above relations, along with the presented formulation of the NAND multiplexing architecture, allow us to make a reliability comparison.
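The parameter equivalences (8) and (9) can be sketched numerically as follows. This is a minimal illustration under stated assumptions: the function names, the default values $t=-25$ and $d_{max}=3.5$ (taken from (4) and (5)) and the inverse helper `threshold_drift_for` are hypothetical additions for convenience, and the inverse uses the standard normal quantile from Python's `statistics.NormalDist` rather than any procedure described in the paper.

```python
import math
from statistics import NormalDist


def faulty_line_ratio(sigma_eta, V):
    """delta of eq. (8): probability that input noise exceeds V/2."""
    return 0.5 * (1.0 - math.erf((V / 2.0) / math.sqrt(2.0 * sigma_eta**2)))


def faulty_device_ratio(d, sigma_t, V, t=-25.0, d_max=3.5):
    """epsilon of eq. (9) for deviation d and threshold drift sigma_t."""
    return 0.5 * (1.0 - math.erf((d_max - d) * V
                                 / math.sqrt(2.0 * t**2 * sigma_t**2)))


def threshold_drift_for(eps, d, V, t=-25.0, d_max=3.5):
    """Hypothetical inverse of eq. (9): sigma_t that yields a given epsilon.

    Uses eps = 1 - Phi(z) with z = (d_max - d) V / (|t| sigma_t),
    which is eq. (9) rewritten with the standard normal CDF Phi.
    """
    if not 0.0 < eps < 0.5:
        raise ValueError("eps must lie in (0, 0.5) for a finite positive drift")
    z = NormalDist().inv_cdf(1.0 - eps)
    return (d_max - d) * V / (abs(t) * z)


if __name__ == "__main__":
    V = 1.0  # assumed supply voltage, arbitrary units
    print("delta   =", faulty_line_ratio(0.1 * V, V))
    print("epsilon =", faulty_device_ratio(d=0.4 * 3.5, sigma_t=0.1 * V, V=V))
```

With these helpers, a target $\epsilon$ on the NAND multiplexing side can be mapped to the threshold drift $\sigma_t$ of an AC-LTG with a given deviation d, which is the direction in which (9) is used when both techniques are plotted against the same device failure rate.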
Given the logic input $X_1=1$, $X_2=1$ (the worst case) with 10% of errors in the input bundle ( $\delta=0.1$ ), the probability of having fewer than 10% of errors in the output bundle is computed for both strategies. Results for redundancy level N=10 and n=7 are depicted in Figure 10. We have picked these parameters because they give good performance in both strategies without requiring very high redundancy.

Fig. 10. Reliability comparison between NAND multiplexing with redundancy N=10 and number of restoring stages n=7 and NAND AC-LTG with redundancy N=10 and different levels of inaccuracy in weights and threshold assignment d

Figure 10 shows that AC-LTG tolerates a higher device failure rate than NAND multiplexing, provided that the deviation level in weights and threshold assignment d is lower than 60% of the maximum admissible $d_{max}$. Given a restriction on the probability of having fewer than 10% of errors in the output bundle, it is easy to see how NAND AC-LTG improves on NAND multiplexing. For example, imposing $Pr(P_e < 10\%) > 90\%$ requires $\epsilon < 10^{-2.75}$ for the NAND multiplexing technique but only $\epsilon < 10^{-0.60}$ for the NAND AC-LTG with a deviation level in weights and threshold assignment of d = 40% of $d_{max}$.

## IV. CONCLUSIONS

The combination of Averaging Cell with Linear Threshold Gates (AC-LTG) provides a way to perform reliable computing in spite of the inherent unreliability of the constituent devices. This architecture exhibits good performance at moderate levels of redundancy (N=10) against the different sources of variability considered in this paper, such as drift in the input signals and deviation in the LTG parameters. AC-LTG has been compared with the NAND multiplexing technique through a set of established relations between characteristic parameters.
It is concluded that, under moderate levels of manufacturing inaccuracy, AC-LTG improves the tolerance to faulty device behavior by two orders of magnitude with respect to NAND multiplexing.

## ACKNOWLEDGMENTS

This research work was supported by the Spanish Ministry of Science and Innovation (MICINN) through the project TEC2008-01856, with additional funding from the ERDF and the European TRAMS project, FP7 248789.

## REFERENCES

- [1] "International technology roadmap for semiconductors, 2009," http://www.itrs.net/Links/2009ITRS/Home2009.htm, 2009.
- [2] J. von Neumann, "Probabilistic logics and the synthesis of reliable organisms from unreliable components," *Brain theory: reprint volume*, pp. 43–98, 1956.
- [3] M. Stanisavljevic, A. Schmid, and Y. Leblebici, "Optimization of Nanoelectronic Systems Reliability Under Massive Defect Density Using Distributed R-fold Modular Redundancy (DRMR)," in *24th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, 2009. DFT'09*, 2009, pp. 340–348.
- [4] K. Nikolic, A. Sadek, and M. Forshaw, "Fault-tolerant techniques for nanocomputers," *Nanotechnology*, vol. 13, p. 357, 2002.
- [5] F. Martorell, S. Cotofana, and A. Rubio, "An analysis of internal parameter variations effects on nanoscaled gates," *IEEE Transactions on Nanotechnology*, vol. 7, no. 1, p. 24, 2008.
- [6] S. Muroga, *Threshold logic and its applications*. John Wiley & Sons, 1971.
- [7] F. Martorell and A. Rubio, "Defect and fault tolerant cell architecture for feasible nanoelectronic designs," in *International Conference on Design and Test of Integrated Systems in Nanoscale Technology, 2006. DTIS 2006*, 2006, pp. 244–249.
- [8] M. Goparaju, A. Palaniswamy, and S. Tragoudas, "A Fault Tolerance Aware Synthesis Methodology for Threshold Logic Gate Networks," in *IEEE International Symposium on Defect and Fault Tolerance of VLSI Systems*. IEEE, 2008, pp. 176–183.
- [9] J. Han and P. Jonker, "A system architecture solution for unreliable nanoelectronic devices," *IEEE Transactions on Nanotechnology*, vol. 1, no. 4, pp. 201–208, 2002.