# Improving the Power-Delay Performance in Subthreshold Source-Coupled Logic Circuits

Armin Tajalli<sup>1</sup>, Massimo Alioto<sup>2</sup>, Elizabeth J. Brauer<sup>3</sup>, and Yusuf Leblebici<sup>1</sup>

<sup>1</sup> Microelectronic Systems Lab. (LSM), Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland

<sup>2</sup> Information Eng. Dept., University of Siena, 53100 Siena, Italy

<sup>3</sup> Electrical Eng. Dept., Northern Arizona Univ., Flagstaff AZ 86011, USA

**Abstract.** Subthreshold source-coupled logic (STSCL) circuits can be used in the design of low-voltage and ultra-low-power digital systems. This article introduces and analyzes new techniques for implementing complex digital systems using STSCL gates with an improved power-delay product (PDP) based on source-follower output stages. A test chip has been manufactured in a conventional digital $0.18\,\mu m$ CMOS technology to evaluate the performance of the proposed STSCL circuit, and speed and PDP improvements by a factor of up to 2.4 were demonstrated.

## 1 Introduction

The demand for very low power integrated circuits is making subthreshold circuit design techniques increasingly attractive [1]. Applications such as sensor networks [2], [3], portable battery-powered systems [4], [5], and implantable circuits for biological applications [6] need very low power consumption as well as low sensitivity to the supply voltage and its variations. It has already been shown that by properly biasing CMOS logic circuits in the subthreshold regime, it is possible to achieve very low power consumption [7]-[10]. However, the dependence of the maximum speed of operation $(f_{op})$ and the power consumption $(P_{diss})$ of CMOS logic circuits on the supply voltage makes such circuits very sensitive to supply voltage variations. Therefore, a precise supply voltage with low variation is required.
Using smart voltage regulators with high power supply rejection capability and low power consumption is a popular approach to providing a suitable supply voltage for CMOS digital circuits. However, the power and area overhead associated with this technique is generally very high. This underlines the need for power-efficient circuits with low sensitivity to supply voltage variations.

Due to their fully differential topology, source-coupled logic (SCL) circuits exhibit a very low sensitivity to the supply voltage and its variations [11]. In addition, they inject less noise into the supply and substrate and hence exhibit less crosstalk. These properties make this topology very attractive for high-speed mixed-signal applications [12]. Some recent developments have shown that it is possible to use this topology for ultra-low-power applications [13], [14]. Subthreshold SCL (STSCL) circuits can operate with a very low bias current per cell (down to a few pA) and still provide a low sensitivity to the supply voltage.

In this article, after a brief overview of subthreshold source-coupled logic circuits, some new techniques for improving their performance in terms of power-delay product (PDP) are described. Using a structural approach, it is possible to provide the basis for creating complicated digital circuits using STSCL gates.

**Fig. 1.** Generic source-coupled logic circuit and replica bias circuit to control the output voltage swing.

## 2 Ultra Low Power Source-Coupled Logic

Source-coupled logic circuits are well known mainly for their superior performance at high frequencies compared to CMOS logic gates [11]. Figure 1 shows a generic SCL circuit in which the NMOS differential pair network performs the logic operation. To operate with ultra-low power consumption, the tail bias current of an SCL gate $(I_{SS})$ can be reduced considerably without degrading the logic operation of the NMOS switching network.
Indeed, as long as the leakage currents in the circuit are negligible compared to the tail bias current and the voltage swing at the output is large enough ($V_{SW} > 4n_nkT/q$, where $n_n$ is the subthreshold slope factor of the NMOS devices, $k$ is the Boltzmann constant, $q$ is the electron charge, and $T$ is the temperature in kelvin), the differential NMOS network operates properly as a logic switching network. On the other hand, maintaining the required output voltage swing at reduced tail bias current values calls for higher load resistance values $(R_L)$. Since $R_L = V_{SW}/I_{SS}$, very high resistivity load devices are required to reduce the bias current to the nano-ampere range or below.

Recently, a very effective approach has been proposed for implementing very high resistivity load devices [13], based on bulk-drain connected PMOS transistors. Figure 2 shows the circuit diagram of a simple SCL buffer stage with high resistance load devices. The bulk and drain of the PMOS load devices are connected to each other to extend the voltage swing range over which the resistivity of the subthreshold PMOS load devices is controllable, up to approximately 400 mV, which is sufficient to switch the NMOS devices of the following stages successfully.

**Fig. 2.** Subthreshold source-coupled logic circuit.

Using this topology, it can be shown that since the devices are in the subthreshold regime, the voltage gain of each stage is $A_{v0} = n_p/(n_n(n_p-1))$, where $n_p$ and $n_n$ are the subthreshold slope factors of the PMOS load devices and the NMOS differential pair devices, respectively [13]. By properly choosing the output voltage swing, the stage gain is high enough to allow using this circuit as a logic stage. Different test structures have been implemented to verify the proper operation of STSCL circuits based on the load device concept shown in Fig. 2. Measurements show that the tail bias current of each cell can be selected in the range of $10\,pA < I_{SS} < 200\,nA$ with a supply voltage as low as 350 mV [14].
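
The relations in this section lend themselves to a quick numerical check. The following is a minimal sketch, not part of the original work: the function names are hypothetical, the slope factors $n_n = n_p = 1.3$ are assumed purely for illustration, and $V_{SW} = 0.2$ V is borrowed from the measurement section.

```python
import math

K_BOLTZMANN = 1.380649e-23    # Boltzmann constant, J/K
Q_ELECTRON = 1.602176634e-19  # elementary charge, C


def thermal_voltage(temp_kelvin=300.0):
    """Thermal voltage kT/q in volts."""
    return K_BOLTZMANN * temp_kelvin / Q_ELECTRON


def min_output_swing(n_n, temp_kelvin=300.0):
    """Lower bound on V_SW from the condition V_SW > 4 * n_n * kT/q."""
    return 4.0 * n_n * thermal_voltage(temp_kelvin)


def load_resistance(v_sw, i_ss):
    """Load resistance R_L = V_SW / I_SS in ohms."""
    return v_sw / i_ss


def stage_gain(n_n, n_p):
    """Stage gain A_v0 = n_p / (n_n * (n_p - 1)) as stated in the text."""
    return n_p / (n_n * (n_p - 1.0))


if __name__ == "__main__":
    n_n = n_p = 1.3  # assumed slope factors, illustration only
    v_sw = 0.2       # V, swing used in the measurement section
    print(f"minimum swing: {min_output_swing(n_n) * 1e3:.0f} mV")
    for i_ss in (10e-12, 1e-9, 200e-9):  # bias currents in A
        r_l = load_resistance(v_sw, i_ss)
        print(f"I_SS = {i_ss:.0e} A -> R_L = {r_l:.1e} ohm")
    print(f"stage gain: {stage_gain(n_n, n_p):.2f}")
```

With $V_{SW} = 0.2$ V, the 10 pA end of the reported bias range corresponds to a load resistance of 20 GΩ and the 200 nA end to 1 MΩ, which illustrates why a conventional resistor is impractical as the load device.
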
## 3 Performance of STSCL Circuits

In contrast to CMOS gates, in which there is no static power consumption (neglecting the leakage current), each STSCL gate draws a constant bias current of $I_{SS}$ from the supply (Fig. 1). Therefore, the power consumption of each STSCL gate is

$$P = V_{DD}I_{SS}. \tag{1}$$

Meanwhile, the time constant at the output node of each STSCL gate, i.e.

$$\tau_{STSCL} = R_L \cdot C_L \approx (V_{SW}/I_{SS}) \cdot C_L \tag{2}$$

is the main speed-limiting factor in this topology. Based on (2), one can choose the proper $I_{SS}$ value to operate in the required frequency range. From (1), it can be concluded that the power consumption is constant and independent of the operating frequency. Therefore, the STSCL circuit must always be operated at its maximum activity rate to achieve the maximum possible efficiency.

**Fig. 3.** A buffered SCL gate to reduce the capacitive loading effect on the core.

The other possibility for improving the power-delay product (PDP) of STSCL circuits is to use the minimum possible tail current for the logic operation and to place a buffer between the gate and the load capacitance $(C_L)$. Based on (1) and (2), and taking the gate delay as $t_d \approx \ln(2)\,\tau_{STSCL}$, the PDP of each gate is approximately

$$PDP_{STSCL} \approx \ln(2) \times V_{DD}V_{SW}C_L \tag{3}$$

which is directly proportional to the load capacitance. Therefore, it is possible to improve the PDP using a simple buffer stage at the output of each STSCL gate. Figure 3 shows a simple topology that uses two source follower buffers (SFBs) at the complementary outputs of an STSCL gate to isolate the load capacitance $C_L$ from the core circuit. In this case, the total capacitive load seen by the core STSCL circuit is reduced to the input capacitance of the SFB stage $(C_B)$. Because the SFB stage operates with very low bias currents, its devices can be very small, and hence this stage has a very small loading effect on the STSCL core.
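
Equations (1) to (3) can be checked numerically with a short sketch. The helper names below are hypothetical, the 100 fF load is an assumed value chosen only for illustration, and $V_{DD} = 0.7$ V and $V_{SW} = 0.2$ V are taken from the measurement section.

```python
import math


def gate_power(v_dd, i_ss):
    """Static power of one STSCL gate, eq. (1): P = V_DD * I_SS."""
    return v_dd * i_ss


def time_constant(v_sw, i_ss, c_load):
    """Output time constant, eq. (2): tau = (V_SW / I_SS) * C_L."""
    return (v_sw / i_ss) * c_load


def gate_delay(v_sw, i_ss, c_load):
    """Gate delay taken as ln(2) * tau, the factor that appears in eq. (3)."""
    return math.log(2) * time_constant(v_sw, i_ss, c_load)


def pdp(v_dd, v_sw, c_load):
    """Power-delay product, eq. (3): ln(2) * V_DD * V_SW * C_L."""
    return math.log(2) * v_dd * v_sw * c_load


if __name__ == "__main__":
    v_dd, v_sw = 0.7, 0.2  # V, values from the measurement section
    c_load = 100e-15       # F, assumed load for illustration
    for i_ss in (1e-9, 10e-9):
        p = gate_power(v_dd, i_ss)
        t_d = gate_delay(v_sw, i_ss, c_load)
        # The product p * t_d does not depend on I_SS.
        print(f"I_SS={i_ss:.0e} A  P={p:.2e} W  t_d={t_d:.2e} s  PDP={p * t_d:.2e} J")
    print(f"eq. (3) directly: {pdp(v_dd, v_sw, c_load):.2e} J")
```

The sketch makes the trade-off explicit: raising $I_{SS}$ by a factor of ten shortens the delay by the same factor and raises the power by the same factor, so the PDP depends only on $V_{DD}$, $V_{SW}$ and the capacitance seen by the gate.
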
With the load isolated in this way, the dominant time constant of the circuit topology shown in Fig. 3 is

$$\tau_{SFB} \approx C_L/g_{m3} \tag{4}$$

which is valid for small signal variations. In a real case, where the output swing is on the order of a few hundred mV, this equation is no longer valid. Indeed, at each rising edge, more current flows through the common-drain (source follower) device M3. Hence, in this case the time constant of the node is even smaller than the value predicted by (4). On the other hand, for falling transitions, the common-drain transistor is turned off and the only path for discharging the output node is $I_B$. Therefore, the output slews down with a slope of $I_B/C_L$. This means that the improvement predicted by (4) can be expected only at the rising edges. Neglecting the delay of the STSCL core and assuming typical conditions ($V_{SW} = 200$ mV at room temperature), it can be shown that the slew mode increases the total delay to approximately

$$t_{d,SFB} \approx 1.6C_L/g_{m3}. \tag{5}$$

Here, it is assumed that M3 turns off very quickly at the falling edges. This assumption is acceptable when the time constant at the output of the STSCL gate is much smaller than the time constant at the output of the SFB stage. Including the delay of the STSCL core in the total delay and assuming $\tau_{STSCL-SFB} \approx \tau_{STSCL} + \tau_{SFB}$, the delay ratio becomes

$$\gamma_d = \frac{t_{d,STSCL}}{t_{d,STSCL-SFB}} \approx \frac{\gamma_I}{(1+\gamma_I)\frac{3.2\gamma_I U_T}{\ln(2)V_{SW}} + \gamma_C} \tag{6}$$

in which $\gamma_C = C_B/C_L$ ($C_B$ is the input capacitance of the SFB stage) and $\gamma_I = I_{SS,C}/(2I_B)$, where $I_{SS,C}$ is the tail bias current of the STSCL core and $I_B$ is the bias current of each SFB. Here, it is assumed that the total bias current in both topologies is equal. This equation also implies that by properly choosing $\gamma_I$ with respect to $\gamma_C$, it is possible to achieve a balanced design for different load capacitance values.
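
To see how (6) behaves as the load grows, the expression can be evaluated directly. This is a rough sketch only: the function name is hypothetical, the grouping of terms follows (6) exactly as printed, and the thermal voltage $U_T \approx 25.85$ mV is an assumed room-temperature value, so the absolute numbers it produces need not match the figures quoted in the text.

```python
import math


def delay_ratio(gamma_i, gamma_c, v_sw=0.2, u_t=0.02585, slew_coeff=3.2):
    """Delay ratio gamma_d of eq. (6), evaluated as printed.

    gamma_i: I_SS,C / (2 * I_B), core current over total SFB current
    gamma_c: C_B / C_L, SFB input capacitance over load capacitance
    u_t:     assumed thermal voltage in volts (illustrative value)
    """
    a = slew_coeff * u_t / (math.log(2) * v_sw)
    return gamma_i / ((1.0 + gamma_i) * a * gamma_i + gamma_c)


if __name__ == "__main__":
    gamma_i = 0.3
    # Sweep from light loads (gamma_c large) to heavy loads (gamma_c small).
    for gamma_c in (1.0, 0.3, 0.1, 0.03, 0.01, 0.0):
        ratio = delay_ratio(gamma_i, gamma_c)
        print(f"gamma_C = {gamma_c:5.2f} -> gamma_d = {ratio:.2f}")
```

For $\gamma_C \to 0$ the expression reduces to $\ln(2)V_{SW}/(3.2U_T(1+\gamma_I))$, so the benefit of the buffer saturates for heavy loads and shrinks as a larger share of the current is spent in the core.
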
The ability to balance the design through the choice of $\gamma_I$ is especially useful in the design of digital library cell elements, as will be explained in Section 4. It is also interesting to notice that for very large load capacitances, $\gamma_d \approx 2.25/(1+\gamma_I) \approx 2.25$ (for small values of $\gamma_I$). Therefore, using SFBs, it is possible to improve the PDP of STSCL circuits by a factor of approximately 2.25.

## 4 Results and Discussion

### 4.1 Circuit Performance

Figure 4 shows the total delay improvement obtained with an SFB stage at the output of STSCL gates compared to a simple STSCL gate, under the assumption that both circuit solutions dissipate the same amount of power. The comparison is shown for different load capacitances and for different ratios of the bias currents ($\gamma_I = I_{SS,C}/(2I_B)$). For low load capacitances (less than 20 fF), the simple STSCL gate without the SFB stage shows a smaller total delay. However, as the load capacitance increases, the topology shown in Fig. 3 exhibits less delay than a simple STSCL gate. In complex digital systems where the output load is dominated by interconnect capacitance, an improvement in the PDP by a factor of approximately 2.5 can be observed. Note that the amount of delay improvement also depends on the ratio $\gamma_I$ of the core (logic block) current to the SFB stage current. Generally, a larger delay improvement can be expected for smaller $\gamma_I$ ratios, i.e. where the SFB bias current is much larger than the core current.

**Fig. 4.** Total delay improvement using a source-follower buffer at the output of a subthreshold source-coupled logic circuit at equal total power consumption, based on transistor-level simulations.

The choice of the output buffer topology also reflects a careful balance between circuit complexity and performance. With a more complex output stage, more improvement can be achieved. For example, a class A output stage would reduce the sensitivity to the load capacitance even further.
However, with a class A output stage the circuit complexity would increase rapidly, and controlling the power consumption and voltage swing would be very difficult. Using a class A output stage can also increase the sensitivity to supply voltage variations.

The simple SFB output buffer technique can simplify the design of library cells. Based on this approach, to provide different driving strengths for a specified logic operation, it is sufficient to design a single logic cell and provide the required driving strength by using different SFB stages. As illustrated in Fig. 5, a single STSCL gate together with SFB stages of different bias or driving capabilities can provide the required specifications. In this approach, $I_{SS,C}$ is constant for all STSCL gates while N can be changed to achieve different driving capabilities. Since all devices are biased in the subthreshold regime, it is sufficient to change the bias current in the SFB stage without changing the size of the source follower devices (i.e., $(W/L)_{SF}$ remains constant) to implement different driving strengths. Therefore, the only required modification is changing the size of the tail bias transistors in the output buffer stage.

**Fig. 5.** Design of standard library cells with different driving strengths using the STSCL-SFB topology.

It is possible to use (6) to determine the proper bias current for the SFB stage with respect to the load capacitance $(C_L)$. Indeed, by solving $\partial \gamma_d/\partial \gamma_I=0$, it can be shown that the optimum value of $\gamma_I$ for a given $\gamma_C$ is

$$\gamma_I = \sqrt{\frac{\ln(2)V_{SW}}{2.75U_T}\gamma_C} \tag{7}$$

which indicates that for larger load capacitances (i.e., a smaller $\gamma_C$), a smaller current should be dissipated in the STSCL core (i.e., a smaller $\gamma_I$ should be selected).
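
The square-root dependence in (7) can be made concrete with a small sketch. The function name is hypothetical, the coefficient 2.75 is used exactly as printed in (7), and the thermal voltage is an assumed room-temperature value.

```python
import math


def optimum_gamma_i(gamma_c, v_sw=0.2, u_t=0.02585, coeff=2.75):
    """Optimum gamma_I of eq. (7) for a given gamma_C = C_B / C_L.

    u_t is an assumed thermal voltage in volts; coeff is the numerical
    factor printed in eq. (7).
    """
    return math.sqrt(math.log(2) * v_sw * gamma_c / (coeff * u_t))


if __name__ == "__main__":
    base = optimum_gamma_i(0.1)  # C_L = 10 * C_B
    for m in (1, 4, 16):
        # Increasing the load capacitance by M divides gamma_C by M.
        gamma_i = optimum_gamma_i(0.1 / m)
        print(f"M = {m:2d}: gamma_I = {gamma_i:.3f}, base / gamma_I = {base / gamma_i:.1f}")
```

Increasing the load capacitance by a factor of $M$ lowers the optimum $\gamma_I$ by only $\sqrt{M}$, so with a fixed core current the SFB bias current has to grow by $\sqrt{M}$ rather than by $M$.
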
From (7), it can also be concluded that to increase the driving capability of the gate by a factor of M, it is sufficient to increase the bias current of the SFB stage by a factor of $\sqrt{M}$, which is always smaller than M for M>1. Using this optimum value for $\gamma_I$, simulation results show that STSCL gates with a source follower buffer perform better for $C_L > 10C_B$. With minimum-size devices and a compact layout, it is possible to reduce $C_B$ to about 1 fF to 3 fF. Therefore, with a careful design strategy the STSCL-SFB topology can offer superior performance for load capacitances as low as 10 fF to 30 fF. For $C_L < 10C_B$ (approximately 10 fF to 30 fF), the simple STSCL topology exhibits comparable or better performance. However, it is not possible to have a mixed design consisting of simple STSCL gates and STSCL-SFB gates, mainly because of the voltage drop of the source follower stage. Since this limit on the load capacitance is relatively low, it is expected that even in low-complexity designs the proposed topology provides considerable advantages in the power-delay product.

### 4.2 Measurement Results

A test chip has been fabricated in a conventional $0.18\,\mu m$ CMOS technology to verify the performance of STSCL gates with and without source-follower buffers in each stage. For this purpose two ring oscillators have been implemented: one uses simple STSCL MUX (multiplexer) gates configured as buffer stages, and the other uses the same configuration with each MUX gate followed by a source-follower buffer.

**Fig. 6.** Photomicrograph of the test chip implemented in $0.18\,\mu m$ technology.

Each ring oscillator has a capacitor bank that can change the loading capacitance at all intermediate nodes of the oscillator. In this way, it is possible to study the delay of the cells for different load capacitance values. Both oscillators have eight delay stages.
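
One way to turn ring-oscillator frequencies into per-stage delays is sketched below. It assumes the common textbook relation $f_{osc} = 1/(2Nt_d)$ for an $N$-stage ring, which the text does not state explicitly; the function names are hypothetical and the frequencies are made-up illustrative numbers, not measured data.

```python
def stage_delay(f_osc_hz, n_stages=8):
    """Per-stage delay, assuming f_osc = 1 / (2 * N * t_d)."""
    return 1.0 / (2.0 * n_stages * f_osc_hz)


def delay_ratio_from_frequencies(f_stscl_hz, f_sfb_hz):
    """gamma_d = t_d,STSCL / t_d,STSCL-SFB.

    With the same number of stages in both rings this reduces to the
    ratio of the oscillation frequencies.
    """
    return f_sfb_hz / f_stscl_hz


if __name__ == "__main__":
    f_plain = 1.0e3  # Hz, hypothetical frequency of the simple STSCL ring
    f_sfb = 2.0e3    # Hz, hypothetical frequency of the STSCL-SFB ring
    print(f"t_d (simple STSCL): {stage_delay(f_plain):.3e} s")
    print(f"t_d (STSCL-SFB):    {stage_delay(f_sfb):.3e} s")
    print(f"gamma_d:            {delay_ratio_from_frequencies(f_plain, f_sfb):.2f}")
```

Because both rings have the same number of stages, the delay ratio follows from the two frequencies alone, and any uncertainty in the exact ring relation cancels.
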
The chip photomicrograph is shown in Fig. 6. The measured oscillation frequency of the first ring oscillator (which uses simple SCL gates) is shown in Fig. 7(a). The measured oscillation frequency agrees very well with the post-layout simulation results. The results shown in Fig. 7(a) have been used to estimate the exact value of the internal capacitances in the capacitor bank.

Figure 7(b) shows the measured delay ratio ($\gamma_d$) of the two ring oscillators for total bias currents of 1 nA and 10 nA per stage (i.e., the total current consumption of each ring oscillator is 8 nA and 80 nA, respectively). Both oscillators are connected to the same supply voltage and consume the same amount of power. In these measurements, $V_{DD} = 0.7$ V, $V_{SW} = 0.2$ V, and the total power consumption (excluding the replica bias circuit) is 5.6 nW and 56 nW for $I_{SS} = 1$ nA and 10 nA, respectively. The figure shows the results for three different $\gamma_I$ values ($\gamma_I = 0.1, 0.3, 0.5$). It can be seen that in all cases, the SFB output stage offers a clear speed improvement for large output capacitance values ($C_L > 100$ fF to 200 fF). The speed gain can be as high as a factor of 2.4 and is consistently independent of the bias current level.

**Fig. 7.** Measurement results: (a) Oscillation frequency of the simple SCL-based circuit in comparison to the simulation results, (b) Total delay improvement for a total bias current per stage of $I_{TOT} = I_{SS} = I_{SS,C} + 2I_B = 1$ nA and 10 nA. Each ring oscillator consists of 8 delay cells.

## 5 Conclusion

It is shown that the power-delay product of subthreshold source-coupled logic circuits can be improved by utilizing an output source-follower buffer stage. A test chip has been implemented in digital $0.18\,\mu m$ CMOS technology to verify the proposed concept.
Based on the simulation and measurement results, the power-delay product of the circuit can be improved by a factor of up to 2.4 using the SFB output buffers.

## Acknowledgment

The authors would like to thank F. K. Gurkaynak and S. Badel for their valuable help in the test chip design and S. Hauser for preparing the test setup.

## References

1. E. Vittoz, "Weak inversion for ultimate low-power logic," in *Low-Power Electronics Design*, C. Piguet, Ed., CRC Press, 2005.
2. G. Gielen, "Ultra-low-power sensor networks in nanometer CMOS," in *Int. Symp. on Sig., Circ. and Sys. (ISSCS)*, vol. 1, Jul. 2007, pp. 1-2.
3. B. A. Warneke and K. S. J. Pister, "An ultra-low energy microcontroller for smart dust wireless sensor networks," in *IEEE Int. Solid State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2004, pp. 316-317.
4. M. Horowitz et al., "Low-power digital design," in *IEEE Int. Symp. Low Power Electron. Design*, 1994, pp. 8-11.
5. D. Suvakovic and C. A. T. Salama, "A low $V_t$ CMOS implementation of an LPLV digital filter core for portable audio applications," *IEEE Trans. on Circ. and Syst.-II: Analog and Digital Sig. Processing*, vol. 47, no. 11, pp. 1297-1300, Nov. 2000.
6. L. S. Wong et al., "A very low-power CMOS mixed-signal IC for implantable pacemaker applications," *IEEE J. Solid-State Circuits*, vol. 39, no. 12, pp. 2446-2456, Dec. 2004.
7. B. H. Calhoun, A. Wang, and A. Chandrakasan, "Modeling and sizing for minimum energy operation in subthreshold circuits," *IEEE J. Solid-State Circuits*, vol. 40, no. 9, pp. 1778-1786, Sep. 2005.
8. B. H. Calhoun and A. Chandrakasan, "Ultra-dynamic voltage scaling (UDVS) using sub-threshold operation and local voltage dithering," *IEEE J. Solid-State Circuits*, vol. 41, no. 1, pp. 238-245, Jan. 2006.
9. R. Amirtharajah and A. Chandrakasan, "A micropower programmable DSP using approximate signal processing based on distributed arithmetic," *IEEE J. Solid-State Circuits*, vol. 39, no. 2, pp. 337-347, Feb. 2004.
10. H. Soeleman, K. Roy, and B. C. Paul, "Robust subthreshold logic for ultra-low power operation," *IEEE Trans. Very Large Scale Integ. (VLSI) Syst.*, vol. 9, no. 1, pp. 90-99, Sep. 2001.
11. S. Badel and Y. Leblebici, "Breaking the power-delay tradeoff: design of low-power high-speed MOS current-mode logic circuits operating with reduced supply voltage," in *Proc. IEEE Int. Symp. on Circ. and Syst. (ISCAS)*, May 2007, pp. 1871-1874.
12. A. Tajalli, P. Muller, and Y. Leblebici, "A power-efficient clock and data recovery circuit in 0.18 $\mu$m CMOS technology for multi-channel short-haul optical data communication," *IEEE J. of Solid-State Circuits*, vol. 42, no. 10, pp. 2235-2244, Oct. 2007.
13. A. Tajalli, E. Vittoz, Y. Leblebici, and E. J. Brauer, "Ultra low power subthreshold MOS current mode logic circuits using a novel load device concept," in *Proc. of Eur. Solid-State Cir. Conf. (ESSCIRC)*, Munich, Germany, Sep. 2007, pp. 281-284.
14. A. Tajalli, E. J. Brauer, Y. Leblebici, and E. Vittoz, "Sub-threshold source-coupled logic circuit design for ultra low power applications," to appear in *IEEE J. of Solid-State Circuits*, vol. 43, no. 7, Jul. 2008.