# Weak Inversion Performance of CMOS and DCVSPG Logic Families in Sub-300mV Range

Omer Can Akgun, Yusuf Leblebici

Swiss Federal Institute of Technology (EPFL), Microelectronic Systems Laboratory (LSM), Lausanne, CH-1015, Switzerland

e-mail: {omercan.akgun, yusuf.leblebici}@epfl.ch

Abstract: In this paper the advantages of using Differential Cascode Voltage Switch Pass Gate (DCVSPG) logic over standard CMOS for subthreshold operation are presented. The two families are compared in terms of their performance and Energy-Delay-Product (EDP) figures. Multiple gates were simulated using a standard $0.18\mu m$ CMOS technology. Simulation results show that the DCVSPG NAND2 gate has 71%, the DCVSPG NOR2 gate has 82% and the DCVSPG full adder has 66% EDP savings over their CMOS counterparts.

## I. INTRODUCTION

Power density and power consumption of microprocessors have become a significant concern in recent years. Modern microprocessors consume around 160W today, and this value is expected to rise and saturate according to the ITRS 2004 report [1]. Power density is a more important concern for high-performance microprocessor design because of the large number of transistors on a single die and the increasing clock frequencies. The power density limit of a processor is set by the thermal design of the system and the reliability of operation under high-temperature conditions. Power consumption is equally important for mobile systems, where long battery life is desirable.

Any significant reduction in power dissipation can only be achieved by lowering the operating voltage of the circuits. This would be possible by relaxing the constraints of classical strong-inversion operation of MOSFETs, and by accepting the notion that transistors can (and will) be operated well below threshold, in the subthreshold regime, e.g. with power supply voltages of 200-300mV. Hence, to solve the power consumption and power density problems, subthreshold logic emerges as a very strong candidate.
With transistors working in the subthreshold regime, the supply voltage can be scaled aggressively and power dissipation can be decreased significantly. There are successful implementations of digital circuits working in the subthreshold region [2]–[4], and techniques to improve the performance of subthreshold CMOS circuits have also been proposed [5]. Recently, subthreshold operation of static CMOS logic has been analyzed analytically using the EKV model. According to the analysis in [6], to benefit the most from subthreshold operation, the logic circuits should be run at their maximum operating frequency, with an activity factor $\alpha$ as close to 1 as possible. In light of this analysis, we define the maximum operating frequency of a logic gate as

$$f_{max} = \frac{1}{T_{P_{min}}} \tag{1}$$

where $T_{P_{min}}$ is the minimum operating period determined by the propagation delay and the rise and fall times. We also define the energy delay product (EDP) as

$$EDP = E_{SW_{avg}} T_{P_{min}} \tag{2}$$

where $E_{SW_{avg}}$ is the average energy per switching.

In this paper we present the utilization of the DCVSPG logic family for subthreshold operation and demonstrate its advantages over static CMOS. The organization of the paper is as follows: Section II briefly describes the operation of the MOS transistor in the subthreshold region. The DCVSPG logic family is reviewed in Section III. The two logic families are compared using basic gates in Section IV and using a full adder cell in Section V. The work is concluded in Section VI.

## II. SUBTHRESHOLD MOS OPERATION

MOS digital circuits operate in the subthreshold regime when the supply voltage is lower than the threshold voltage $(V_T)$ of the transistors.
The drain current of an n-channel MOS transistor operating in this regime is given by [6]

$$I_{DS} = I_{S}e^{\frac{V_{GS} - V_{T}}{nU_{T}}} \left(1 - e^{\frac{-V_{DS}}{U_{T}}}\right) \tag{3}$$

where $n$ is a process-dependent term called the slope factor, typically in the range of 1.3-1.5 for modern CMOS processes. The value of $n$ depends on the depletion region characteristics of the transistor. $V_{GS}$ and $V_{DS}$ are the gate-to-source and drain-to-source voltages, respectively. The parameter $I_S$ is the specific current, which is given by

$$I_S = 2n\mu C_{ox} U_T^2 \frac{W}{L} \tag{4}$$

where $\mu$ is the mobility of carriers, $C_{ox}$ is the gate oxide capacitance per unit area, $U_T$ is the thermal voltage, whose value is 26mV at 300K, and $\frac{W}{L}$ is the aspect ratio of the transistor.

Due to the second term in (3), the drain current is 0 when $V_{DS}=0$ but reaches its maximum value and saturates for $V_{DS}$ values higher than a few $U_T$. As is apparent from (3), the drain current of a MOS transistor in the subthreshold region shows exponential dependence on the gate-to-source and drain-to-source voltages. This exponential dependence on the terminal voltages increases the influence of temperature and supply voltage variations and of substrate noise on circuit operation. To minimize these effects and to achieve better performance in the subthreshold regime, we propose using a differential logic family for subthreshold operation.

Fig. 1. Basic DCVSPG logic gates

## III. DCVSPG LOGIC FAMILY

In the Differential Cascode Voltage Switch (DCVS) logic family the load consists of a pair of cross-coupled PMOS transistors. The smaller PMOS network reduces the internal capacitances, and hence improves the performance and reduces the power consumption. Moreover, the availability of differential signals increases the noise margin and provides more reliable operation.
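As a numerical illustration of the drain-current expressions (3) and (4) from Section II, the following Python sketch evaluates the current for a few values of $V_{DS}$. The function names and the device values (`n = 1.4`, `v_t = 0.45`, `mu_cox = 300e-6`, `w_over_l = 2.0`) are assumptions chosen only for illustration; they are not taken from the simulated $0.18\mu m$ process.

```python
import math

U_T = 0.026  # thermal voltage at 300 K in volts, as stated in the text


def specific_current(n, mu_cox, w_over_l, u_t=U_T):
    """Specific current I_S of (4): 2 * n * (mu * C_ox) * U_T^2 * (W/L)."""
    return 2.0 * n * mu_cox * u_t ** 2 * w_over_l


def drain_current(v_gs, v_ds, v_t, n, i_s, u_t=U_T):
    """Subthreshold drain current I_DS of (3)."""
    gate_term = math.exp((v_gs - v_t) / (n * u_t))
    drain_term = 1.0 - math.exp(-v_ds / u_t)
    return i_s * gate_term * drain_term


if __name__ == "__main__":
    # Assumed, illustrative device values (not taken from the paper).
    n, v_t, v_gs = 1.4, 0.45, 0.3
    i_s = specific_current(n, mu_cox=300e-6, w_over_l=2.0)
    # Limit of (3) for V_DS much larger than U_T.
    i_max = i_s * math.exp((v_gs - v_t) / (n * U_T))
    for multiple in (0, 1, 2, 4, 8):
        i_ds = drain_current(v_gs, multiple * U_T, v_t, n, i_s)
        print(f"V_DS = {multiple} U_T: I_DS / I_max = {i_ds / i_max:.4f}")
```

At $V_{DS} = 4U_T$ the second factor of (3) equals $1 - e^{-4} \approx 0.98$, so the current is already within about 2% of its saturated value, which is the "few $U_T$" behavior described in Section II.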
The DCVSPG logic family replaces the NMOS logic tree of DCVS with an NMOS pass-gate logic tree [7]. Using a pass-gate logic tree solves the floating node problem that exists in other circuits of the DCVS family and makes DCVSPG a non-ratioed logic. By eliminating the floating node problem, the power consumption of DCVSPG cells is reduced with respect to DCVS [8]. Two basic DCVSPG logic gates are shown in Fig. 1. Another beneficial feature of DCVSPG is that when synthesizing some logic functions, the sources of some NMOS transistors are connected to rails, reducing the load on the previous stages.

## IV. SUBTHRESHOLD LOGIC COMPARISON

The CMOS implementation of a basic logic function requires an equal number of PMOS and NMOS transistors, and for $k$ inputs the number of transistors used is $2k$. In DCVSPG the maximum total number of NMOS transistors is $2^{k+1}-4$ and the number of PMOS transistors, which act as load devices, is 2. For DCVSPG the NMOS stack depth is $k-1$ and the PMOS stack depth is always 1, while in the CMOS implementation both stack depths can be as high as $k$ in the worst case.

Fig. 2. CMOS inverter noise margins in subthreshold operation regime

When operating CMOS logic in subthreshold mode, the PMOS pull-up network is the main bottleneck for achieving better performance. In the smaller feature-sized technologies, the mobility of the electrons is usually 4 to 5 times higher than the mobility of the holes. During strong inversion operation, to have equal pull-up and pull-down performance with the same stack depth, the PMOS transistors need to be sized 2 to 2.3 times larger than the NMOS transistors, and to have the greatest noise margin possible, the PMOS transistors should be sized 4 to 5 times larger than the NMOS transistors. The situation is even worse for subthreshold operation. To achieve the greatest noise margin possible, the PMOS transistors should be sized 13 times larger than the NMOS transistors (Fig. 2).
This increase in the transistor size adds extra capacitance, and because the current available during subthreshold operation is limited, the extra capacitance due to the larger PMOS transistors severely limits the performance of the gates. The influence of the PMOS transistor sizes in CMOS NAND2 and NOR2 gates on the maximum operating frequency and EDP is shown in Fig. 3. For both gates the minimum EDP occurs at the ratio $\frac{W_p}{W_n}=4$. The improvement in the EDP metric comes from the fact that the maximum operating frequency increases. On the other hand, with increasing PMOS transistor sizes the energy per switching also increases. It should also be noted that the noise margin for small EDP levels is quite low because of the small size of the PMOS transistors.

Fig. 3. Maximum operating frequency and EDP for basic CMOS gates in subthreshold regime at VDD=300mV, as a function of $W_p/W_n$.

Fig. 4. Maximum operating frequency and EDP for basic DCVSPG gates in subthreshold regime at VDD=300mV, as a function of $W_p/W_n$.

For DCVSPG, the noise margin is basically doubled when compared to static CMOS due to the availability of differential signals. Because of the increased noise margin, the PMOS load transistors can be made smaller. Another advantage of the DCVSPG family is that it uses only 2 parallel PMOS transistors for any logic function with an arbitrary number of inputs, thus decreasing the influence of the lower hole mobility on the circuit operation explained previously. For DCVSPG gates the minimum EDP is obtained for the minimum PMOS size (Fig. 4), and the improvement in the maximum operating frequency from increasing the PMOS size is not as large as in the CMOS case. Hence, if DCVSPG gates are used, minimum-sized transistors can be used to achieve the best energy consumption and best EDP figure without significant degradation of the performance.
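The transistor counts and stack depths quoted at the start of this section can be tabulated with a short Python sketch. The function names and the returned dictionary layout are hypothetical; the DCVSPG NMOS count is the upper bound $2^{k+1}-4$ given in the text, and the CMOS stack depths are the worst-case values, so an actual gate may use fewer devices or shallower stacks.

```python
def cmos_counts(k):
    """Static CMOS gate with k inputs: k NMOS + k PMOS, worst-case stack depth k."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return {"nmos": k, "pmos": k, "total": 2 * k,
            "nmos_stack": k, "pmos_stack": k}


def dcvspg_counts(k):
    """DCVSPG gate with k inputs, using the upper bounds quoted in the text."""
    if k < 2:
        raise ValueError("the quoted bound is only meaningful for k >= 2")
    nmos = 2 ** (k + 1) - 4  # maximum number of NMOS pass transistors
    return {"nmos": nmos, "pmos": 2, "total": nmos + 2,
            "nmos_stack": k - 1, "pmos_stack": 1}


if __name__ == "__main__":
    for k in (2, 3, 4):
        print(k, cmos_counts(k), dcvspg_counts(k))
```

For a two-input gate this gives 4 transistors for CMOS and at most 6 for DCVSPG, but the DCVSPG PMOS stack depth stays at 1 and its NMOS stack depth is 1 instead of 2, which is the property the comparison in this section relies on.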
Furthermore, due to the fact that the current available for charging and discharging the capacitances is limited in subthreshold regime, the positive feedback provided by the cross-coupled load also increases the switching performance of the DCVSPG logic gates. All these features make DCVSPG a better alternative compared to CMOS for subthreshold operation. Table I shows the maximum operating frequency and EDP comparisons of NAND and NOR gates for different supply voltages. For all the supply values, the EDP of DCVSPG gates is lower than that of CMOS gates.

TABLE I. COMPARISON OF MAXIMUM OPERATING FREQUENCY AND EDP OF CMOS AND DCVSPG GATES FOR DIFFERENT SUPPLY VOLTAGES

(a) NAND2

| VDD (V) | EDP, CMOS ($10^{-21}$ Js) | EDP, DCVSPG ($10^{-21}$ Js) | Max. Frequency, CMOS (MHz) | Max. Frequency, DCVSPG (MHz) |
|---------|------|------|------|------|
| 0.20 | 1.46 | 0.43 | 0.14 | 0.13 |
| 0.25 | 0.77 | 0.25 | 0.45 | 0.38 |
| 0.30 | 0.50 | 0.20 | 1.10 | 0.83 |
| 0.35 | 0.14 | 0.08 | 4.11 | 2.91 |

(b) NOR2

| VDD (V) | EDP, CMOS ($10^{-21}$ Js) | EDP, DCVSPG ($10^{-21}$ Js) | Max. Frequency, CMOS (MHz) | Max. Frequency, DCVSPG (MHz) |
|---------|------|------|------|------|
| 0.20 | 1.76 | 0.50 | 0.12 | 0.22 |
| 0.25 | 1.44 | 0.27 | 0.41 | 0.65 |
| 0.30 | 0.57 | 0.20 | 1.04 | 1.43 |
| 0.35 | 0.33 | 0.08 | 3.47 | 5.04 |

## V. FULL ADDER COMPARISON

To make a realistic comparison of DCVSPG and CMOS logic families for practical applications, a full adder (FA) block was simulated. For the CMOS FA the well-known mirror configuration was used. The transistor sizes were chosen in order to improve the performance of the adder in the subthreshold regime while keeping the EDP minimum. The DCVSPG implementation of the FA is also straightforward. The *sum* and *carry* blocks were implemented separately.
The schematics of the implemented DCVSPG blocks can be seen in Fig. 5. Because the DCVSPG logic is differential, the inverters at the output of the logic blocks are not needed, and this property gives DCVSPG a performance advantage over CMOS. Moreover, because the extra switching at the output is avoided, the energy consumption of DCVSPG is lower than that of CMOS.

Fig. 5. DCVSPG full adder gates

Another advantage of the DCVSPG gates is the availability of the complementary signals. This property simplifies the circuits and reduces the number of transistors wherever the complementary signals can be used. For example, the number of transistors used in the DCVSPG sum gate is 10, two of which are PMOS. On the other hand, the same function implemented in a static CMOS mirror adder (which is optimal for performance and transistor count) uses 7 NMOS and 7 PMOS transistors.

The simulation results for different supply voltage values and for different corners of the FA circuits are presented in Fig. 6 and Table II, respectively. Although the maximum achievable operating frequency is slightly higher in the CMOS implementation, the EDP figure of the DCVSPG implementation is much smaller than that of CMOS both for different supply voltage values and different process corners. Thus, we can conclude that DCVSPG is a better choice for subthreshold operation for improved performance and less energy consumption.

Fig. 6. Comparison of maximum operating frequency and EDP of CMOS and DCVSPG full adder gates for different supply voltages

TABLE II. CORNER SIMULATIONS FOR FULL ADDER CELLS AT VDD=0.3V

| CORNER | EDP, CMOS ($10^{-21}$ Js) | EDP, DCVSPG ($10^{-21}$ Js) | Max. Frequency, CMOS (MHz) | Max. Frequency, DCVSPG (MHz) |
|--------|------|------|------|------|
| TT | 2.09 | 0.93 | 0.64 | 0.50 |
| SS | 6.40 | 2.55 | 0.20 | 0.15 |
| FF | 1.35 | 0.27 | 1.29 | 1.79 |
| SF | 1.74 | 0.96 | 0.84 | 0.60 |
| FS | 4.14 | 1.26 | 0.33 | 0.29 |

## VI. CONCLUSIONS

In this paper the advantages of employing the DCVSPG logic family for subthreshold operation have been presented. Due to the single PMOS stack depth, positive feedback and reduced NMOS stack depth, DCVSPG performs better than CMOS in terms of energy per switching and EDP, and achieves comparable operation speed. It has been shown through simulations that the EDP savings using DCVSPG gates can be as much as 82%, and that by utilizing DCVSPG gates more energy-efficient systems can be implemented.

## REFERENCES

- [1] ITRS roadmap. [Online]. Available: http://public.itrs.net
- [2] A. Wang and A. Chandrakasan, "A 180-mV subthreshold FFT processor using a minimum energy design methodology," *IEEE Journal of Solid-State Circuits*, vol. 40, no. 1, pp. 310–319, 2005.
- [3] J. Kao, M. Miyazaki, and A. Chandrakasan, "A 175-mV multiply-accumulate unit using an adaptive supply voltage and body bias architecture," *IEEE Journal of Solid-State Circuits*, vol. 37, no. 11, pp. 1545–1554, 2002.
- [4] C.-I. Kim, H. Soeleman, and K. Roy, "Ultra-low-power DLMS adaptive filter for hearing aid applications," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 11, no. 6, pp. 1058–1067, 2003.
- [5] H. Soeleman, K. Roy, and B. Paul, "Robust subthreshold logic for ultralow power operation," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 9, no. 1, pp. 90–99, 2001.
- [6] E. Vittoz, *Low-Power Electronics Design*. CRC Press LLC, 2004, ch. 16.
- [7] F. Lai and W. Hwang, "Differential cascode voltage switch with the pass-gate (DCVSPG) logic tree for high performance CMOS digital systems," in *VLSI Technology, Systems, and Applications, 1993. Proceedings of Technical Papers. 1993 International Symposium on*, 1993, pp. 358–362.
- [8] F.-S. Lai and W. Hwang, "Design and implementation of differential cascode voltage switch with pass-gate (DCVSPG) logic for high-performance digital systems," *IEEE Journal of Solid-State Circuits*, vol. 32, no. 4, pp. 563–573, 1997.