
# Approaches to Run-time and Standby Mode Leakage Reduction in Global Buses

Rahul Rao, Kanak Agarwal, Dennis Sylvester, Richard Brown, Kevin Nowka\* and Sani Nassif\*

Dept. of EECS, University of Michigan, Ann Arbor, MI, US 48109

\*Austin Research Laboratories, IBM, Austin, TX, US 78758

{rmrao,agarwalk,dennis,brown@eecs.umich.edu}

\*{nowka,nassif@us.ibm.com}

## ABSTRACT

In this paper, we present various design approaches to leakage minimization in global repeaters. We demonstrate the applicability of the MTCMOS scheme to global repeaters for leakage reduction. We then analyze two design approaches called *Duplicated Skewed Buses* and *Skewed Pulsed Buses*. We show that significant reduction in standby leakage power can be obtained using these approaches while providing significant improvements in performance. We also illustrate the use of these proposed techniques with the MTCMOS approach to obtain further savings in leakage power. Simulation results in a 90nm process show that skewed pulsed buses with MTCMOS can provide 20% improvement in performance with over 25% reduction in active mode leakage and nearly 100X reduction in standby mode leakage.

## **Categories and Subject Descriptors**

B.7.1 [Hardware]: Types and Design Styles.

## **General Terms**

Performance and Design.

## **Keywords**

Repeaters, leakage, pulsed buses, MTCMOS.

## **1. INTRODUCTION**

The control of device leakage currents while maintaining adequate performance and reliability has emerged as a critical challenge to the continued scaling of CMOS devices [1]. With every new technology generation, device dimensions and threshold voltages are scaled to enable higher complexity and better performance. However, such scaling has resulted in a large increase in leakage currents. Leakage power, which was less than 5% of the total power consumption a few years ago, has grown exponentially to become a significant fraction (over 30%) of the total chip power consumption for both low-power and high-performance designs in current nanometer technologies [1].

Technology scaling has also resulted in an era of interconnect dominated designs. In recent years, the number and length of global signal lines necessary to communicate between different modules in a design have increased significantly.


*ISLPED'04*, August 9–11, 2004, Newport Beach, California, USA. Copyright 2004 ACM 1-58113-929-2/04/0008...\$5.00.

This trend has resulted in an increased number of repeaters required to optimally buffer these global signals and reduce their interconnect delay [2-4]. Optimally sized repeaters are quite large and can burn significant leakage power, especially if they remain in an idle state for a significant percentage of time. In some designs, repeaters contribute nearly 50% of the total chip leakage power [5]. The exponential increase in leakage currents coupled with the increasing number of repeaters makes them a critical bottleneck in meeting the power budget of the system.

There have been some attempts to address the problem of increasing power consumption in repeaters, but most of these have focused primarily on dynamic or switching power [6-8]. In [9], the authors emphasized the need to include leakage power in the repeater insertion methodology for global interconnects and presented an optimization framework for the same. However, they do not present any technique for leakage power reduction in global buses. Though there has been a significant amount of work in the analysis and reduction of leakage currents for logic blocks [10-21], most of these techniques exploit the stacking effect and the presence of noncritical paths in the design to minimize leakage power. However, these techniques are inapplicable to repeaters due to the inherent lack of transistor stacks in global repeaters. Also, the performance penalty associated with these leakage reduction techniques may not always be acceptable for repeaters in interconnect delay dominated designs. Hence, there is a need to look at approaches to both runtime and standby leakage reduction in repeaters that can be achieved with negligible effect on interconnect delay.

In this paper, we investigate the applicability of MTCMOS to global repeaters. We propose various design approaches that enable considerable reduction in leakage power while providing significant improvements in performance. We combine the proposed techniques with MTCMOS to obtain further savings in leakage power. We show that the proposed configuration provides increased leakage savings over MTCMOS at much improved performance.

The rest of the paper is organized as follows. In Section 2, we discuss various leakage reduction techniques in global repeaters. This section also contains a discussion of the proposed new signaling approaches. The experimental results and the conclusions are discussed in Sections 3 and 4 respectively.

## 2. LOW LEAKAGE REPEATER DESIGNS

In this section, we discuss various design approaches for leakage reduction in global repeaters. A simple global bus containing repeaters is shown in Fig. 1. The approaches discussed in this paper are aimed at achieving lower leakage power than this configuration without trading off performance.

#### 2.1 MTCMOS for Traditional Buses

The MTCMOS scheme exploits the spatial proximity of devices in a design to share the sleep transistor between multiple levels of logic, thereby minimizing the associated area penalty. This also reduces the performance degradation due to the added sleep device, since the various levels of logic switch at different times, minimizing the IR drop across the sleep device. However, sharing sleep devices across repeaters on a single wire is infeasible due to their physical separation. For example, for the global bus shown in Fig. 1, adjacent repeaters can be significantly far apart. Sharing a sleep transistor among them would require routing a virtual ground line over a long distance, resulting in significant performance degradation and noise issues.

Fig. 1. Global Signal line with repeaters (low threshold voltage inverters) placed at optimal distances.

In many cases, repeaters on global buses are buffered regularly at nearly identical distances. This physical proximity between repeaters on different bus lines can be used to share the sleep devices between them as shown in Fig. 2. These sleep devices, which are sized for the peak current drawn through them, will need to be larger than those used for normal combinational blocks. This is because all of the repeaters may switch at the same time, increasing the peak current and the noise injected on the virtual ground lines. Increasing the size of the sleep devices reduces these ill-effects but also lowers the leakage savings in sleep mode.

#### 2.2 Duplicated Skewed Buses

Global lines are usually performance critical and hence require the use of low threshold (LVT) devices in the repeaters. For example, the global line shown in Fig. 1 contains inverters made up of low threshold PMOS and NMOS devices. In this configuration, both NMOS and PMOS are LVT devices, so it is not possible to put this repeater chain into a low leakage state during idle mode since in either state (0 or 1), the sub-threshold leakage is primarily through low threshold voltage devices. However, this problem can be overcome by the use of two lines containing skewed inverters [22]. The repeaters are skewed by the use of alternate high threshold (HVT) and low threshold devices in the pull-up and pull-down trees as shown in Fig. 3. In this case, the upper line is skewed for a rising transition at the input since the first repeater has an LVT NMOS and an HVT PMOS device. Similarly, the lower line is skewed for a falling transition as the first repeater has LVT PMOS and HVT NMOS devices. At the far end, the output of the two lines can be fed to decoding logic (an Xor gate). For any transition at the input, the difference in the propagation delay along the two lines generates a rising pulse at the Xor output. The rising edge of this pulse, which is generated by the faster of the two lines, can be used to toggle the state of the receiver latch.

Fig. 2. MTCMOS applied to Static Buses. The sleep transistor is shared between repeaters on different data lines.
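As a rough behavioral illustration of the pulse generation described for the duplicated skewed lines, the following Python sketch models each line as an ideal delay that differs for rising and falling transitions, and takes the Xor of the two far-end levels. The delay values (80 and 120 time units), the function names and the assumption that both lines deliver the same polarity at the far end are illustrative choices, not values from the paper.

```python
FAST, SLOW = 80, 120  # hypothetical far-end delays for the favored / unfavored edge


def far_end_level(t, edges, rise_delay, fall_delay, initial=0):
    """Logic level at the far end of one line at time t.

    edges is a list of (time, new_level) input transitions, assumed to be
    spaced further apart than the slow delay.
    """
    level = initial
    for t_edge, new_level in edges:
        delay = rise_delay if new_level == 1 else fall_delay
        if t >= t_edge + delay:
            level = new_level
    return level


def xor_output(t, edges):
    """Xor of the two skewed lines at the far end."""
    upper = far_end_level(t, edges, rise_delay=FAST, fall_delay=SLOW)  # favors rising
    lower = far_end_level(t, edges, rise_delay=SLOW, fall_delay=FAST)  # favors falling
    return upper ^ lower


# One rising and one falling input transition.
edges = [(100, 1), (600, 0)]
for t in (150, 200, 250, 700, 750):
    print(t, xor_output(t, edges))
```

In this idealized model the Xor output is high only during the window between the fast and slow arrivals, for both the rising and the falling input transition, so the pulse width equals the delay difference between the two lines.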

The use of alternating LVT and HVT devices enables us to place both of the lines in their respective low leakage states in standby mode using the simple circuitry at the source of the line as shown in Fig. 3. Here it is assumed that the signal has propagated to the far end of the line before the sleep signal is asserted. To put the repeater into low leakage sleep state the upper line is fed with state 1 and the lower line is fed with state 0. This ensures that sub-threshold leakage is only through HVT devices, resulting in considerable leakage savings in standby mode. The Nand and Nor gates associated with the sleep state logic do not add levels of logic, since they act as inverters in active mode. Further savings in leakage can be obtained by decreasing the sizes of HVT devices. This does not degrade performance because transitions through HVT devices are noncritical from a performance perspective.
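The sleep state logic at the source of the two lines can be sketched at the truth-table level as follows. This is only a minimal sketch: the paper states that a Nand and a Nor gate are used and that they act as inverters in active mode, but the exact signal polarities (a `sleep` signal for the Nor gate and its complement for the Nand gate) and the function names are assumptions made here for illustration.

```python
def nand(a, b):
    return 1 - (a & b)


def nor(a, b):
    return 1 - (a | b)


def source_gates(data, sleep):
    """Return (upper_line_input, lower_line_input) for a data bit and a sleep flag.

    Assumed polarities: the Nand gate sees the complement of sleep,
    the Nor gate sees sleep directly.
    """
    sleep_b = 1 - sleep
    upper = nand(data, sleep_b)  # sleep=1 forces the upper line to state 1
    lower = nor(data, sleep)     # sleep=1 forces the lower line to state 0
    return upper, lower


for sleep in (0, 1):
    for data in (0, 1):
        print("sleep", sleep, "data", data, "->", source_gates(data, sleep))
```

With `sleep = 0` both gates simply invert the data bit, which matches the statement that they add no extra logic level in active mode. With `sleep = 1` the upper line is driven to 1 and the lower line to 0 regardless of the data.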

In addition to the power savings, this configuration also reduces the worst-case delay through the repeater chain. This performance improvement is primarily due to the reduction in driving strength of the opposing tree for the critical transition. Also, in this configuration, both of the lines always switch in the same direction, though they may get skewed towards the far end of the repeater chain. This causes further improvements in delay because same direction switching of coupled wires results in reduced effective coupling capacitance seen by the lines. Further improvements in performance can be obtained since gate capacitances can be reduced by making the devices in the opposing tree smaller. All of these factors contribute to a significant delay improvement of around 20-25%. This performance improvement can be utilized to either minimize the number of repeaters in the chain or reduce the size of the repeaters. Either of these trade-offs will result in further reduction in standby mode leakage power.

Fig. 3. Duplicated Skewed Buses. Each line is skewed for one direction transition using alternate high and low threshold voltage devices.

However, a duplication of global lines will exacerbate the routing congestion, in addition to increasing the active device area. The leakage savings will be reduced due to the increase in the number of repeaters (on the duplicated lines). Also, since two lines are switching for each transition at the input, there is a dynamic power overhead associated with this approach. The increase in dynamic power is lower than 2X due to the reduced size of the HVT devices and reduced effective coupling capacitance.

#### 2.3 Skewed Pulse Buses

Some of the problems associated with the duplicated skewed line approach described in the previous section can be overcome by merging the two lines over much of the length of the bus, as illustrated in Fig. 4. In this case, the two skewed lines are run for a short distance that is sufficient to create a reasonable skew between the signals on the lines. The lines are then fed to an Xor gate that creates a pulse, the width of which is determined by the delay difference between the two paths. This is essentially identical to the approach suggested in [22] which focussed on improving interconnect delay and minimizing dynamic power and peak current but did not consider leakage power. Since the output of the Xor gate is always rising, irrespective of the direction of transition at the input, the rest of the repeater chain can be skewed to favor this transition using alternate HVT and LVT devices in the pull-up and pull-down trees. The width of the pulse increases as it progresses down the repeater chain due to the falling transition propagating more slowly than the rising transition. This allows the initial pulse width to be small.

In standby mode, the sleep state logic circuitry places each of the initial two line segments into their respective low leakage states (similar to the case with duplicated skewed lines). This forces the output of the Xor to logic 1 state thereby minimizing the standby leakage power of the repeater chain, since all of the devices are biased such that the leakage is only through HVT devices.

The performance benefits obtained due to the reduced coupling capacitance and strength of the opposing tree are identical to those with duplicated skewed lines, while the routing congestion and the area penalty in terms of the number of repeaters are considerably reduced. The dynamic power is also comparable, since a pulse (two transitions) is sent down the line for each transition at the input. However, the performance benefits can be traded off for leakage savings by suitably resizing the low threshold devices on the critical transition path. This also enables a further reduction in the size of the high threshold voltage devices and hence both active and standby mode leakages can be reduced.

Fig. 4. Static Pulse Buses. A pulse is sent down the signal line for each input transition. The path is skewed to favor rising transition at the output of Xor.

#### 2.4 Noise and Timing Constraints

The leakage savings obtained for each of the two approaches described in the previous sections depend strongly on the width of the HVT devices. This is because both sub-threshold and gate leakage reduce with a reduction in the width of the devices. However, scaling of the widths of HVT devices is constrained by the following requirements:

1. *Noise Constraints:* HVT devices cannot be made arbitrarily weak because that results in a significant skew in the repeaters which can drastically degrade the noise margins.

2. *Timing Constraints:* Though HVT devices are non-critical to timing, these devices should still meet certain timing constraints to ensure correct operation. If the falling transition is arbitrarily slow, then it may interfere with the rising transition of the next clock cycle, causing a loss of the signal. As can be seen from Fig. 5, the width of the pulse increases as the pulse propagates down the repeater chain. If the falling transition is too slow, the pulse width can grow large enough that the rising edge of the pulse from the next clock cycle can collide with the falling edge of the pulse of the current clock cycle at the far end of the line, resulting in a quiet signal at the receiver latch (shown in Fig. 5). Hence, to ensure correct data reception at the receiver latch, it is necessary to ensure that the propagation delay through the slow path is smaller than the clock period plus the propagation delay of the fast path [23]. This can be expressed as

$$T_{slow} < T_{fast} + T_{clk} - T_{skew} - T_{setup}$$
 (1)

Here $T_{slow}$ is the maximum propagation delay along the repeater chain for the slower transition, $T_{fast}$ is the minimum propagation delay along the repeater chain for the faster transition, $T_{clk}$ is the clock period, $T_{skew}$ is the worst case clock skew and $T_{setup}$ is the setup time of the receiver latch. Equation 1 places a lower bound on the width of the devices in the non-critical path and thus limits the savings in leakage that can be obtained. The above constraint should be satisfied at all process corners and needs to account for variations in process, temperature and supply voltage.
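The bound of Equation 1 is simple arithmetic, and a small Python helper makes it easy to check a candidate HVT sizing against it. The numbers used below (a 500 ps clock period, a 400 ps fast-path delay, 30 ps of skew and 20 ps of setup time) are hypothetical placeholders rather than values reported in the paper, and the function names are likewise illustrative.

```python
def max_slow_delay_eq1(t_fast, t_clk, t_skew, t_setup):
    """Upper bound on the slow-path delay implied by Equation 1 (same unit as inputs)."""
    return t_fast + t_clk - t_skew - t_setup


def meets_eq1(t_slow, t_fast, t_clk, t_skew, t_setup):
    """True if the slow-path delay strictly satisfies Equation 1."""
    return t_slow < max_slow_delay_eq1(t_fast, t_clk, t_skew, t_setup)


# Hypothetical timing numbers in picoseconds.
params = dict(t_fast=400, t_clk=500, t_skew=30, t_setup=20)
bound = max_slow_delay_eq1(**params)
print("slow path must be faster than", bound, "ps")
print("800 ps slow path ok?", meets_eq1(800, **params))
print("900 ps slow path ok?", meets_eq1(900, **params))
```

In practice the check would be repeated at every process, voltage and temperature corner, using the maximum slow-path delay and the minimum fast-path delay at each corner.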

#### 2.5 Detecting Loss of Signal

If the timing constraint specified by Equation 1 is violated, the falling transition will be lost along the repeater chain. However, this loss of signal in the line can be detected at the far end of the line by adding a simple circuit as shown in Fig. 6. The multiplexer causes the state of the latch to be toggled if the far end of the line is at a logic high state. In other words, the far end of the line is now a level sensitive circuit, rather than an edge sensitive circuit. The timing constraint on the slow path propagation delay can now be relaxed to be less than twice the clock cycle period [23]. The constraint will ensure that a falling transition of the pulse will arrive at the far end of the line before the end of the next clock cycle. The timing constraint can now be written as

Fig. 5. Timing constraint on slow path for static pulse buses. Arbitrarily slow falling transition results in loss of signal along the repeater chain and a quiet far end of line.

$$T_{slow} < 2\,T_{clk} - T_{skew} - T_{setup}$$
 (2)

Satisfying this constraint ensures that the far end of the line will always latch the correct state even when the pulses are lost. Consider an input transition in a particular clock cycle. If the input does not make a transition in the next clock cycle (cycles E and F in Fig. 5), then the falling edge of the pulse from cycle E will reach the far end of the line before the end of cycle F. This means that the far end of the line would be at logic high at the end of cycle E and at logic zero at the end of cycle F. Thus, the output of the latch will be toggled at cycle F but not at cycle G, correctly capturing the input transitions. However, if the input makes a transition in two successive clock cycles (cycles C and D), then the falling edge of the pulse of cycle C will be lost in the repeater chain, as it would be overridden by the faster-moving rising edge of the pulse for cycle D. This will cause the far end of the line to remain at logic high at the end of both clock cycles (as shown in Fig. 5). The state of the receiver latch is therefore toggled at the end of clock cycles C and D, correctly capturing both input transitions.

The relaxation in the timing constraint on the slow path enables further reduction in the sizes of the HVT devices resulting in additional leakage savings. However, this constraint can cause the receiver to latch into a meta-stable state, since the slower transition arrives just before the rising edge of the clock. The meta-stable condition can be avoided by adding some slack to the timing constraint on the slow path.
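To make the effect of the added slack concrete, the sketch below evaluates the relaxed bound of Equation 2 with and without a guard band. The slack term and all numeric values (500 ps clock period, 30 ps skew, 20 ps setup, 25 ps slack) are hypothetical; the paper only states that some slack should be added and does not quantify it.

```python
def max_slow_delay_eq2(t_clk, t_skew, t_setup, slack=0):
    """Upper bound on the slow-path delay from Equation 2, minus an optional guard band."""
    return 2 * t_clk - t_skew - t_setup - slack


def meets_eq2(t_slow, t_clk, t_skew, t_setup, slack=0):
    """True if the slow-path delay strictly satisfies the (guard-banded) bound."""
    return t_slow < max_slow_delay_eq2(t_clk, t_skew, t_setup, slack)


# Hypothetical timing numbers in picoseconds.
T_CLK, T_SKEW, T_SETUP = 500, 30, 20
no_slack = max_slow_delay_eq2(T_CLK, T_SKEW, T_SETUP)
with_slack = max_slow_delay_eq2(T_CLK, T_SKEW, T_SETUP, slack=25)
print("bound without slack:", no_slack, "ps")
print("bound with 25 ps slack:", with_slack, "ps")
print("940 ps slow path ok with slack?", meets_eq2(940, T_CLK, T_SKEW, T_SETUP, slack=25))
```

A slow-path delay that sits just under the unguarded bound passes Equation 2 but arrives very close to the sampling edge. Subtracting a slack term tightens the bound, trading some of the extra HVT downsizing for a safer margin against metastability.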

#### 2.6 MTCMOS for Skewed Buses

It has been shown that power gating using MTCMOS is a powerful technique for minimizing the standby leakage power consumption [7]. The MTCMOS scheme can be easily applied to skewed buses by placing the sleep transistors on the noncritical trees in repeater chains as shown in Fig. 7. This enables further reduction in leakage at identical performance, since the critical transition remains unaffected. In the sleep state, the sub-threshold leakage of the circuit is determined by the width of the sleep transistors. Hence the devices on the non-critical path can be made narrow regular threshold voltage devices to reduce their gate leakage. The timing constraints specified in the previous sections can now be easily satisfied. The sleep transistor can be shared across repeaters on adjacent buses in a fashion similar to that used for traditional buses (Section 2.1).



Fig. 6. Detecting loss of signal with a level sensitive latch at the far end of the line.



Fig. 7. Using alternate headers and footers with static pulsed buses on non-critical paths.

Alternate headers and footers placed on the non-critical path result, however, in increased gate leakage. In sleep mode, all blocks with headers have their virtual supply lines near ground potential. Similarly, all blocks with footers have their virtual ground lines near supply potential. This is illustrated in Fig. 8. Since a repeater from a header block (repeater X) drives a repeater in the footer block (repeater Y), the devices experience high gate-to-drain and gate-to-source voltage differences resulting in substantial gate leakage currents (shown in Fig. 8 by dotted arrows) that dominate the total leakage in standby mode. With gate leakage becoming more important, the use of alternate headers and footers may become less attractive.

Conventional MTCMOS can also be used with skewed pulse buses as shown in Fig. 9. In this case, footer devices are inserted at all stages of repeaters, ensuring nearly identical voltage on the virtual ground lines in the footer blocks in standby mode. This results in nearly negligible gate leakage since all of the devices in the repeaters now see identical voltages at the gate, drain and source terminals. Also, since the sub-threshold leakage in sleep state is controlled by the footer device, the devices on the non-critical paths can be small regular threshold voltage devices. Suitably sizing the devices in the skewed inverters can ensure that the timing constraints are met at no performance penalty. This is because the size of the footer devices is determined by the worst case current drawn through them, which occurs when all of the repeater lines are switching at the same time. However, this is not the critical case of repeater delay for skewed pulse buses, since in such a scenario the coupling capacitance is zero as all the signal lines are switching in the same direction, so the impact of the footer device is minimal. The worst case delay occurs when only one line switches, while its neighboring lines are quiet. But in this case, the peak current drawn through the sleep device is small and hence the footer device sized for the peak current transition results in no performance degradation.

Fig. 8. Increased gate leakage due to voltage differences between header and footer blocks.

Fig. 9. Minimizing gate leakage by using only footers with static pulsed buses.

## 3. RESULTS AND ANALYSIS

For our simulation and analysis, we used an advanced industry SOI 90nm process [24]. Simulations were performed at nominal process conditions with a supply voltage of 1.0V and temperature of 85°C. The test setup consisted of an 8mm long 8-bit data bus with repeaters inserted at every 0.5mm (i.e., each bit line had 16 repeaters). At the source, we applied a 50ps rise-time signal with a maximum switching frequency of 2GHz. We started with the traditional bus configuration shown in Fig. 1, which consists of all LVT repeaters. These LVT repeaters were sized such that all bits in the bus have a delay D[i] < Dmax. We use the results of this configuration as our baseline results and compare various approaches to this configuration. Table 1 shows performance, dynamic power in the active mode of operation, leakage power in active and standby modes and total device area for the different repeater configurations. The numbers shown in the table are normalized with respect to the traditional LVT repeater configuration discussed above. The leakage numbers represent total leakage, i.e., the sum of sub-threshold and gate leakage. The line capacitance contributed nearly 70% of the total load capacitance for this baseline configuration. The results for each configuration are discussed below:

1. *MTCMOS for Traditional Buses:* The MTCMOS configuration discussed in Section 2.1 was optimized to allow for a 5% performance degradation relative to the traditional bus configuration, i.e., the worst case delay through any bit line is set at 1.05D. The results show that this MTCMOS configuration results in over 65X reduction in standby leakage with a slight increase in the active mode leakage (due to the sleep devices).

One of the major drawbacks of this configuration remains the 5% performance degradation, in addition to the marginal increase in device area which may limit its applicability in high performance signaling.

2. *Duplicated Skewed Bus Design:* This configuration (Section 2.2) used LVT devices of similar size to those in the traditional bus configuration, while the HVT devices were downsized to satisfy the constraint specified by Equation 1. Such a configuration results in over 20% performance improvement, for the reasons discussed in Section 2.2 (reduced effective coupling capacitance, reduced gate capacitance, and reduced strength of the opposing tree). This configuration also resulted in around 3.5X standby mode leakage reduction. The drawbacks of this configuration are the 2X increase in switching activity, the nearly 50% increase in device area and the nearly 2X increase in routing area. The reduced effective coupling capacitance (which is the dominant capacitive component) limits the switching power penalty to about 45%. Due to these overheads, this configuration is useful only for performance-critical signals that do not switch very often.

3. *Skewed Pulsed Bus Design:* The skewed bus consisted of repeaters optimized to match the performance of the duplicated bus design. However, the number of repeaters was reduced as described in Section 2.3. As a result, the device area was reduced by nearly 15% compared to the traditional bus. This, combined with the nearly 50% reduction in effective coupling capacitance, considerably reduces the dynamic power penalty associated with this approach. The results show that this configuration (with the timing constraint of Equation 1) is excellent in nearly all metrics of interest. It provides over 20% performance improvement, 20% reduction in active mode leakage and over 6X leakage savings in the standby mode.

4. *Skewed Pulsed Bus Design with Signal Loss:* Further improvements in power were obtained when the timing constraint for the HVT devices was relaxed to the constraint specified in Equation 2, which enabled a reduction in the size of the HVT devices. The repeaters were then reoptimized to match the performance of the duplicated bus design. The active device area was reduced by nearly 25% compared to the traditional bus, while the switching activity is also reduced when the input signal switches in consecutive clock cycles, completely cancelling the dynamic power penalty. The reduced device sizes resulted in leakage savings of 25% in active mode and over 7X in the standby mode.

5. *Skewed Pulsed Buses with MTCMOS:* The best results were obtained when MTCMOS was used with skewed pulsed buses. Footers were used at each stage because the standby mode leakage savings while using alternate headers and footers was lower due to the increased gate leakage current between adjacent blocks. The footers and repeaters were optimized to match the performance of the duplicated bus design, while maximizing the leakage savings. The results showed that this configuration provides 20% improvement in performance with 25% reduction in active mode leakage and 99X reduction in standby mode leakage. The active device area is also reduced by nearly 25% in comparison with the traditional bus configuration.

Table 1. Comparison of delay, dynamic and leakage power for various repeater bus configurations in a 90nm technology with 1V supply, nominal process corner and 85°C. The results are normalized to the traditional bus configuration (Fig. 1).

| No | Configuration                     | Delay (Worst Case) | Dynamic Power | Leakage Power (Active Mode) | Leakage Power (Standby Mode) | Active Device Area |
|----|-----------------------------------|--------------------|---------------|-----------------------------|------------------------------|--------------------|
| 1  | Traditional Bus                   | 1.00               | 1.00          | 1.00                        | 1.000                        | 1.00               |
| 2  | MTCMOS for Traditional Bus        | 1.05               | 1.00          | 1.02                        | 0.015                        | 1.06               |
| 3  | Duplicated Skewed Bus             | 0.79               | 1.44          | 0.96                        | 0.273                        | 1.50               |
| 4  | Skewed Pulse Bus                  | 0.78               | 1.08          | 0.80                        | 0.151                        | 0.86               |
| 5  | Skewed Pulse Bus with Signal Loss | 0.78               | 0.98          | 0.75                        | 0.136                        | 0.76               |
| 6  | Skewed Pulse Bus with MTCMOS      | 0.79               | 0.82          | 0.73                        | 0.010                        | 0.76               |

Fig. 10. Contributions of sub-threshold and gate leakage for various repeater configurations in active mode (bars on the left) and standby mode (bars on the right).

The relative contributions of sub-threshold and gate leakage to the total leakage in active and standby mode for the various bus configurations are shown in Fig. 10. For each configuration, the first bar shows the sub-threshold and gate leakage in active mode, while the second bar represents the two leakage components in standby mode. The total leakage in active mode is clearly dominated by sub-threshold leakage while in the standby mode, gate leakage is the dominant component for most of the bus configurations since sub-threshold leakage is controlled by the HVT devices. Fig. 11 shows the input-output characteristics of the nominal (LVT-LVT) repeater in comparison to that of the skewed repeaters. It can be seen that the switching thresholds for the skewed repeaters are within 100mV of the switching threshold of the nominal repeater and that reasonable noise margins have been maintained in the skewed repeaters.
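The "X" reduction factors quoted in the discussion follow from the normalized standby leakage column of Table 1 as the reciprocal of each entry. The short Python sketch below performs that conversion; the dictionary simply transcribes the table values, and the helper name is an illustrative choice. Because the table entries are rounded to three decimals, the computed factors can differ slightly from the figures quoted in the text (for example 0.010 gives a factor of about 100 rather than 99).

```python
# Normalized standby-mode leakage power, transcribed from Table 1.
standby_leakage = {
    "Traditional Bus": 1.000,
    "MTCMOS for Traditional Bus": 0.015,
    "Duplicated Skewed Bus": 0.273,
    "Skewed Pulse Bus": 0.151,
    "Skewed Pulse Bus with Signal Loss": 0.136,
    "Skewed Pulse Bus with MTCMOS": 0.010,
}


def reduction_factor(normalized_value):
    """Reduction factor relative to the baseline (baseline is normalized to 1.0)."""
    return 1.0 / normalized_value


factors = {name: reduction_factor(v) for name, v in standby_leakage.items()}
for name, factor in factors.items():
    print(f"{name}: {factor:.1f}X")
```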

## 4. CONCLUSIONS

We describe various design techniques for leakage reduction in global repeaters that provide improvements in leakage power without trading off performance. In fact, the performance improves significantly compared to the traditional bus design approach for each of these schemes. Our results show that the skewed pulsed buses with MTCMOS can provide nearly 25% reduction in active mode leakage and 99X reduction in standby mode leakage while enabling ~20% improvement in performance. Another advantage of the proposed techniques is that they are simple to implement and have minimal design overhead. The proposed techniques can be very useful as alternative design approaches for performance-critical and power-hungry repeaters.

## REFERENCES

- [1] International Technology Roadmap for Semiconductors, http://public.itrs.net/Files/2001ITRS/Home.htm, 2001.
- [2] D. Sylvester and K. Keutzer, "Getting to the bottom of deep submicron II: the global wiring paradigm," *ISPD*, pp. 193-200, 1999.
- [3] J. Cong, "Challenges and opportunities for design innovations in nanometer technologies," SRC working papers, http://www.src.org/prg\_mgmt/frontier.dgw, 1997.
- [4] V. Adler and E. Friedman, "Repeater design to reduce delay and power in resistive interconnect," *IEEE Trans. on Circuits and Systems II*, vol. 45, pp. 607-618, 1998.
- [5] K. Bernstein, et al., "Design and CAD challenges in sub-90nm CMOS technologies," *ICCAD*, pp. 129-136, 2003.
- [6] P. Saxena, et al., "The scaling challenge: can correct-by-construction design help?," *ISPD*, pp. 51-58, 2003.



Fig. 11. Input-Output characteristics for the nominal and skewed inverters illustrating noise margins.

- [7] A. Nalamalpu and W. Burleson, "A practical approach to DSM repeater insertion: satisfying delay constraints while minimizing area and power," *14th IEEE Intl. ASIC/SOC Conf.*, pp. 152-156, 2001.
- [8] H. Kaul and D. Sylvester, "Transition-aware global signaling (TAGS)," *ISQED*, pp. 53-59, 2002.
- [9] K. Banerjee and A. Mehrotra, "A power-optimal repeater insertion methodology for global interconnections in nanometer designs," *IEEE Trans. on Electron Devices*, vol. 49, no. 11, pp. 2001-2007, 2002.
- [10] S. Mutoh, et al., "A 1-V power supply high-speed digital circuit technology with multi-threshold voltage CMOS," *IEEE JSSC*, vol. 30, no. 8, pp. 847-854, 1995.
- [11] Y. Ye, S. Borkar and V. De, "A new technique for standby leakage reduction in high-performance circuits," *Symp. on VLSI Circuits*, pp. 40-41, 1998.
- [12] D. Duarte, "Evaluating run-time techniques for leakage power reduction," *15th Intl. Conf. on VLSI Design*, pp. 31-38, 2002.
- [13] B. Chatterjee, et al., "Effectiveness and scaling trends of leakage control techniques for sub-130nm CMOS technologies," *Intl. Symp. on Low Power Design*, pp. 122-127, 2003.
- [14] J. Halter and F. Najm, "A gate-level leakage power reduction method for ultra-low-power CMOS circuits," *CICC*, pp. 475-478, 1997.
- [15] M. Johnson, D. Somasekhar and K. Roy, "Leakage control with efficient use of transistor stacks in single threshold CMOS," *Design Automation Conference*, pp. 442-445, 1999.
- [16] T. Kuroda, et al., "A 0.9V, 150-MHz, 10-mW, 4mm2, 2-D Discrete Cosine Transform Core Processor with variable threshold-voltage (VT) scheme," *IEEE JSSC*, vol. 31, no. 11, pp. 1770-1779, 1996.
- [17] L. Wei, et al., "Design and optimization of dual-threshold circuits for low-voltage low-power applications," *IEEE Trans. on VLSI Systems*, vol. 7, no. 1, pp. 16-24, 1999.
- [18] N. Sirisantana, L. Wei, and K. Roy, "High-performance low-power CMOS circuits using multiple channel length and multiple oxide thickness," *ICCD*, pp. 227-232, 2000.
- [19] F. Assaderaghi, et al., "Dynamic threshold-voltage MOS (DTMOS) for ultra-low voltage VLSI," *IEEE Trans. on Electron Devices*, vol. 44, no. 3, pp. 414-422, 1997.
- [20] K. Das, et al., "New optimal design strategies and analysis of ultra-low leakage circuits for nano-scale SOI technology," *ISLPED*, pp. 168-171, 2003.
- [21] R. Rao, et al., "Circuit techniques for gate and sub-threshold leakage minimization in future CMOS technologies," *ESSCIRC*, pp. 313-316, 2003.
- [22] M. Khellah, et al., "Static pulsed buses for on-chip interconnects," *Symp. on VLSI Circuits*, pp. 78-79, 2002.
- [23] L. Cotten, *AFIPS Proc. Spring Joint Comp. Conf.*, vol. 34, pp. 581-586, 1969.
- [24] M. Khare, et al., *IEEE IEDM*, pp. 407-410, 2002.