# Minimum Energy CMOS Design with Dual Subthreshold Supply and Multiple Logic-Level Gates

Kyungseok Kim and Vishwani D. Agrawal Department of ECE, Auburn University, Auburn, AL 36849, USA kyungkim@auburn.edu, vagrawal@eng.auburn.edu

Abstract—This paper presents a method for minimum energy digital CMOS circuit design using dual subthreshold supply voltages. Stringent energy budget and moderate speed requirements of some ultra low power systems may not be best satisfied just by scaling a single supply voltage. Optimized circuits with dual supply voltages provide an opportunity to resolve these demands. The delay penalty of a traditional level converter is unacceptably high when the voltages are in the subthreshold range. In the present work level converters are not used and special multiple logic-level gates are used only when, after accounting for their cost, they offer advantage. Starting from a lowest per cycle energy design whose single supply voltage is in the subthreshold range, a new mixed integer linear program (MILP) finds a second lower supply voltage optimally assigned to gates with time slack. The MILP accounts for the energy and delay characteristics of logic gates interfacing two different signal levels. New types of linearized AND and OR constraints are used in this MILP. We show energy saving up to 24.5% over the best available designs of ISCAS'85 benchmark circuits.

**Keywords**— Ultra-low power design, Subthreshold circuits, Dual voltage design, Mixed integer linear program.

## I. Introduction

Subthreshold circuits offer a promising solution for implementing highly energy-constrained systems for remote or mobile applications. When we scale the power supply voltage  $(V_{dd})$  below the device threshold voltage  $(V_{th})$ , the subthreshold current ever so slowly charges and discharges nodes for the circuit's logic function [20]. The weak driving current limits the performance but minimum energy operation of the circuit is achieved with reduced dynamic and leakage power resulting in long battery life [11].

A successful subthreshold design is possible at low to medium clock frequencies for biomedical and micro-sensor network applications [8], [16], [21]. Ultra dynamic voltage scaling (UDVS) [3] can extend subthreshold circuit design to more applications by switching between a high performance mode at nominal voltage and an energy efficient subthreshold mode according to the system workload. Without a performance requirement, a subthreshold circuit can operate at its minimum energy $(E_{min})$ operating point, which is somewhat above the absolute minimum voltage $(V_{min})$ [22] that would guarantee the correct logic function. Some applications that require moderate speed cannot scale the supply voltage all the way down to the minimum energy point and still maintain performance. Near-threshold circuit design is another choice that covers a wider range of system performance for applications that tolerate an energy increase ($\sim$2X) over $E_{min}$ by scaling $V_{dd}$ to near $V_{th}$ [5]. Technology down-scaling improves the speed of a subthreshold circuit, but greater variability may adversely affect $E_{min}$ at extremely small feature sizes [2].

Utilizing the time slack for dual $V_{dd}$ is a well-known technique for reducing the power consumption of a circuit operating at nominal $V_{dd}$, with a small extra cost in physical design [18], [19]. Operation in the subthreshold voltage region was predicted long ago and has since been verified [20]. Most previous works in subthreshold circuit design used only a single supply voltage, scaled down to reduce the energy consumption, without considering the time slack. The authors of [10] derived a MILP algorithm to minimize the energy consumption of a subthreshold logic circuit using dual $V_{dd}$. Their formulation avoids level converters by imposing topological constraints at the voltage boundaries, which prevents full use of the time slack. As a result, the energy saving of the dual $V_{dd}$ design is smaller than what the available slack would allow.

In the present work, we aim to exploit the full time slack on non-critical paths in a subthreshold circuit using multiple logic-level gates, either to reduce the energy below $E_{min}$ at the original speed or to operate the circuit at a higher speed while holding the energy consumption close to $E_{min}$. Figure 1 shows the benefit of dual voltage design for a 32-bit ripple carry adder in 90nm CMOS technology operating in the subthreshold regime. Energy per cycle for the optimized dual voltage design $(E_{dual})$ is reduced to $\sim 0.67$X of $E_{min}$, where $E_{min}$ is obtained by scaling down a single supply voltage to its minimum energy operating point at $V_{dd}$=0.31V. This 32-bit ripple carry adder can also operate $\sim$7X faster with the same energy as $E_{min}$ in another dual voltage design using $V_{dd}$=0.45V. Finding an optimal lower supply voltage $(V_{DDL})$ for a given higher supply voltage $(V_{DDH})$, together with its assignment to gates, is the main problem in dual voltage design. We formulate a mixed integer linear program (MILP) to solve this problem with multiple logic-level gates considering multiple voltage boundaries.

The paper is organized as follows. Section II introduces the dual voltage design from the literature and considers the cost of level converting in the subthreshold regime. In Section III, we present new mixed integer linear program (MILP) models for dual voltage design with multiple logic-level gates. Section IV reports SPICE simulation results to validate the MILP solution. Finally, conclusion and future work are given in Section V.

Fig. 1. Energy and speed benefits of dual $V_{dd}$ design in subthreshold voltage operation for a 32-bit ripple carry adder in PTM 90nm CMOS (activity factor $\alpha = 0.17$, number of gates = 352).

## II. Dual Voltage Design and Level Converters in Subthreshold Regime

In a dual voltage design, assigning the lower supply voltage $(V_{DDL})$ only to gates on non-critical paths reduces both the dynamic and the static leakage power of the circuit. The higher supply voltage $(V_{DDH}=V_{dd})$ is assigned to gates on critical paths to maintain the overall circuit performance. By using only the available time slack, we ensure that there is no performance loss. However, an asynchronous level converter (ALC) is considered essential to suppress DC leakage current and guarantee the correct switching of a $V_{DDH}$ gate driven by a low voltage input signal. The cost of level converting, in turn, reduces the power saving of the dual $V_{dd}$ scheme.

Clustered voltage scaling (CVS) [18] and extended clustered voltage scaling (ECVS) [19] are the two main heuristic methods of assigning dual supply voltages to gates in a circuit. CVS assigns $V_{DDL}$ to gates with positive time slack, working from primary outputs toward primary inputs, and does not allow $V_{DDL}$ gates to feed directly into $V_{DDH}$ gates. Gates are grouped into $V_{DDH}$ and $V_{DDL}$ clusters, and the $V_{DDH}$ cluster is always located upstream in the signal flow. This topological constraint reduces the potential power saving from full use of the time slack that exists inside a circuit. Asynchronous level converters are not needed inside a combinational circuit block, but level converting flip-flops (LCFF) are needed in sequential elements. CVS therefore has no power or delay overhead from ALCs. To remove the topological constraint of CVS, ECVS inserts an ALC wherever a $V_{DDL}$ gate drives a $V_{DDH}$ gate, so that $V_{DDL}$ can be assigned to more gates with time slack. This gives more power saving than CVS.

(a) Differential cascode voltage switched (DCVS) level converter.

(b) Pass gate (PG) level converter.

Fig. 2. Two traditional level converter schematics [13].

We apply the dual voltage technique to subthreshold supply combinational circuits. To maximize energy saving from the time slack, a level converter is still considered essential. Figure 2 shows two traditional ALCs, a differential cascode voltage switched (DCVS) level converter and a pass gate (PG) level converter. The PG level converter consumes less energy than the DCVS level converter because it has fewer devices and reduced contention [13]. Unlike the delay of a circuit operating at nominal $V_{dd}$, the delay of a subthreshold circuit increases exponentially as the supply voltage $V_{dd}$ is reduced [20]. Consequently, the time slack of gates on non-critical paths is consumed quickly, even when the assigned $V_{DDL}$ is quite close to $V_{DDH}$. With such a delay characteristic, the delay overhead of the ALC becomes more critical for implementing a dual $V_{dd}$ design in the subthreshold regime.
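To illustrate how quickly slack is consumed, the following sketch evaluates a first-order subthreshold delay model, $t_d \propto V_{dd}/I_{on}$ with $I_{on} \propto \exp(V_{dd}/(n\,v_T))$. It is only a rough illustration: the function name, the slope factor `n = 1.5`, and the thermal voltage of about 25.9 mV at 300 K are assumed values, not parameters extracted from the PTM 90nm model used in this paper.

```python
import math

def relative_delay(vdd, vdd_ref, n=1.5, v_thermal=0.0259):
    """Gate delay at vdd divided by gate delay at vdd_ref.

    First-order subthreshold model (an assumption, not a fitted model):
    t_d ~ C * Vdd / I_on, with I_on ~ exp(Vdd / (n * v_thermal)).
    """
    ion_ratio = math.exp((vdd - vdd_ref) / (n * v_thermal))
    return (vdd / vdd_ref) / ion_ratio

# Sweep a few candidate lower supplies against VDDH = 0.30 V.
for vddl in (0.30, 0.27, 0.23, 0.19):
    ratio = relative_delay(vddl, 0.30)
    print(f"VDDL = {vddl:.2f} V -> delay x{ratio:.2f} relative to 0.30 V")
```

Under these assumed parameters, lowering the supply by a few tens of millivolts already multiplies the gate delay several times, which is why only gates with substantial slack can accept $V_{DDL}$.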

We used the HSPICE simulator [7] to size the two ALCs for minimum delay in the subthreshold region, with the Predictive Technology Model (PTM) for 90 nm CMOS [23]. Table I shows that the delay penalty of the two optimized ALCs is in the range of $28\sim 60\times$ INV(FO4) delay, where INV(FO4) is the delay of a standard inverter with a fanout of four. By comparison, the ALC delay at a nominal supply voltage is taken to be $2\times$ INV(FO4) delay [4]. A low voltage microprocessor has $\sim 400\times$ INV(FO4) delay for a single pipeline stage. A microprocessor operating in the subthreshold region would prefer a shallow pipeline to mitigate variability, and a $40\times$ INV(FO4) delay is considered a typical design case [17]. To reduce the delay penalty of level converting, we need to investigate alternative approaches that remove ALCs without imposing topological constraints on the dual $V_{dd}$ design.

TABLE I

Delays of two optimally sized ALCs with a single INV load at $V_{DDL}=230mV$ and $V_{DDH}=300mV$ in PTM 90nm CMOS.

| ALCs | Delay   | Norm. to INV(FO4) |
|------|---------|-------------------|
| DCVS | 79.1 ns | 60.4              |
| PG   | 37.6 ns | 28.7              |

Fig. 3. Multiple logic-level NAND2 gate [4].

As discussed in the literature, two types of logic gate designs can handle multiple logic levels. Of these, the embedded logic level converting circuit [13] may not be a good choice, because integrating the previous ALC structures into logic gates does not reduce the overall delay penalty. A level-shifter free design using dual $V_{th}$ [4] places high $V_{th}$ devices in the pull-up PMOS network of a logic gate to suppress DC static leakage with low input signals, as shown in Figure 3. This increases the rise time of the gate, so the overall delay of the level shifting logic gate is larger than that of a normal gate (PMOS $V_{th}$=0.21V). As shown in Table II, the delay penalty of these multiple logic-level gates is much less than that of standard ALCs in the subthreshold region. Figure 4 shows that, within some range of low input voltages close to $V_{dd}$, a multiple logic-level INV consumes less leakage power than a standard INV, although its leakage increases as the low input voltage goes down. Considering the delay and power overheads, we use the multiple logic-level gates instead of ALCs in our dual voltage design.
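The comparison between ALCs and multiple logic-level gates can be made concrete with a little arithmetic. The sketch below takes the normalized delays from Table I and Table II and expresses each as a fraction of the $40\times$ INV(FO4) cycle cited from [17]. The variable and function names are ours, and note that the Table II entries are total gate delays rather than overheads over a standard gate, so the comparison is conservative for the multiple logic-level gates.

```python
# Delays normalized to INV(FO4), copied from Table I and Table II.
ALC_DELAY_FO4 = {"DCVS": 60.4, "PG": 28.7}
MULTI_LEVEL_GATE_DELAY_FO4 = {"INV": 1.3, "NAND2": 2.3, "NAND3": 3.1, "NOR2": 3.9}

# Typical subthreshold design case quoted in the text from [17].
CYCLE_FO4 = 40.0

def cycle_fraction(delay_fo4, cycle_fo4=CYCLE_FO4):
    """Fraction of one clock cycle consumed by a single cell."""
    return delay_fo4 / cycle_fo4

for name, d in {**ALC_DELAY_FO4, **MULTI_LEVEL_GATE_DELAY_FO4}.items():
    pct = 100 * cycle_fraction(d)
    print(f"{name:6s} {d:5.1f} FO4 = {pct:6.1f}% of a {CYCLE_FO4:.0f}-FO4 cycle")
```

A single DCVS converter exceeds the whole cycle budget and a PG converter takes most of it, whereas each multiple logic-level gate stays below a tenth of the cycle.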

## III. MILP for Dual Voltage Design with Multiple Logic-Level Gates

In this section, we design minimum energy circuits with dual $V_{dd}$ assignments without ALCs using mixed integer linear programming (MILP) [6]. Multiple logic-level gates eliminate the use of ALCs and allow $V_{DDL}$ gates to drive $V_{DDH}$ gates with affordable overheads in delay and leakage power in a combinational circuit. First, the performance requirement (critical path delay $T_c$) of a system is given, and $V_{DDH}$ is determined to satisfy the system speed (or clock cycle time). The MILP then assigns the predetermined $V_{DDH}$ to gates on critical paths to maintain the performance and finds the optimal $V_{DDL}$ for gates on non-critical paths to reduce the total energy consumption (i.e., minimum energy per cycle) by a global optimization. In contrast, CVS and ECVS are heuristic algorithms that tend to be non-optimal, because they traverse backward from primary outputs through gates with time slack when assigning the lower supply voltage $V_{DDL}$.

TABLE II

Multiple logic-level gate delays with a single INV load at $V_{DDL}=230mV$ and $V_{DDH}=300mV$ in PTM 90nm CMOS (High PMOS $V_{th}=0.29V$).

| Multiple logic-level gates | Delay Norm. to INV(FO4) |
|----------------------------|-------------------------|
| INV                        | 1.3                     |
| NAND2                      | 2.3                     |
| NAND3                      | 3.1                     |
| NOR2                       | 3.9                     |

Fig. 4. Multiple logic-level gate leakage power normalized to a standard INV $(V_{dd}=V_{in}=300mV)$ in PTM 90nm CMOS.

Assuming that gates become active once per clock cycle, the total energy per cycle $(E_{tot})$ is given by the following equations [20]:

$$\begin{aligned}
E_{dyn} &= \alpha_{0 \to 1} \cdot C_{load} \cdot V_{dd}^{2} = C_{sw} \cdot V_{dd}^{2} \\
E_{leak} &= I_{off} \cdot V_{dd} \cdot T_{c} = P_{leak} \cdot T_{c} \\
E_{tot} &= E_{dyn} + E_{leak} = C_{sw} \cdot V_{dd}^{2} + P_{leak} \cdot T_{c}
\end{aligned} \tag{1}$$

where $\alpha_{0\to 1}$ is the low to high transition activity for the gate output node and $C_{load}$ is the load capacitance of the gate. In (1), the dynamic energy $(E_{dyn})$ depends quadratically on the supply voltage $V_{dd}$ and linearly on the total switched capacitance $C_{sw}$ of the circuit, while the leakage energy $(E_{leak})$ is proportional to the leakage power $P_{leak}$ drawn over a clock cycle.
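Equation (1) translates directly into a few lines of code. The sketch below is a plain transcription of (1); the function name and the example numbers (100 fF switched capacitance, 50 nW leakage, 100 ns cycle) are hypothetical values chosen only to exercise the formula, not data from the benchmark circuits.

```python
def energy_per_cycle(c_sw, vdd, p_leak, t_c):
    """Return (E_dyn, E_leak, E_tot) in joules, following equation (1).

    c_sw   : total switched capacitance in farads (activity already included)
    vdd    : supply voltage in volts
    p_leak : leakage power in watts at this supply voltage
    t_c    : clock cycle time (critical path delay) in seconds
    """
    e_dyn = c_sw * vdd ** 2
    e_leak = p_leak * t_c
    return e_dyn, e_leak, e_dyn + e_leak

# Hypothetical operating point, for illustration only.
e_dyn, e_leak, e_tot = energy_per_cycle(c_sw=100e-15, vdd=0.30,
                                        p_leak=50e-9, t_c=100e-9)
print(f"E_dyn = {e_dyn * 1e15:.2f} fJ, E_leak = {e_leak * 1e15:.2f} fJ, "
      f"E_tot = {e_tot * 1e15:.2f} fJ")
```

Because $T_c$ grows as $V_{dd}$ is lowered, the leakage term eventually dominates, which is what creates a minimum energy point for a single supply.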

Before we formulate the MILP model of the optimal minimum energy  $V_{DDL}$  assignment, all variables and constant parameters in the MILP model are presented here:

- $V_v$ : supply voltage integer variable that is 1 if candidate voltage v in the span of scaled supply voltages is selected as $V_{DDH}$ or $V_{DDL}$.
- $X_{i,v}$ : voltage assignment integer variable that is 1 for gate i with supply voltage v.
- $F_{i,v}$ : fan-in integer variable that is 1 for gate i having at least one fan-in gate that is powered by supply voltage v.
- $P_{i,v}$ : penalty integer variable that is 1 when gate i is driven by low input voltage v.
- $T_i$ : latest arrival time variable at gate i output from primary input events.
- $\alpha_i$ : low to high transition activity of gate i.
- $V_{dd,v}$ : supply voltage value of v.
- $C_{i,v}$ : load capacitance of gate i with supply voltage v.
- $P_{leak,i,v}$ : leakage power of gate i with supply voltage v.
- $P_{leako,i,v}$ : leakage power overhead of multiple logic-level gate i driven by low input voltage v.
- $td_{i,v}$ : gate delay of gate i with supply voltage v.
- $tdo_{i,v}$ : gate delay overhead of multiple logic-level gate i driven by low input voltage v.
- $N_i$ : number of inputs for gate i.
- $T_c$ : critical path delay of a circuit.
- $G_{tot}$ : total number of gates in a circuit.
- $V_{nom}$ : nominal supply voltage value (1.2V) for 90nm CMOS.

The optimal  $V_{DDL}$  assignment for the minimum energy design is modeled by MILP equations:

$$\text{Minimize} \quad \sum_{i} \sum_{v \in V} \left(\alpha_{i} \cdot C_{i,v} \cdot V_{dd,v}^{2} + P_{leak,i,v} \cdot T_{c}\right) \cdot X_{i,v} + \sum_{i} \sum_{v \in V_{L}} P_{leako,i,v} \cdot T_{c} \cdot P_{i,v}, \quad \forall i \in \text{all gates} \tag{2}$$

$$V_{min} \leq V \leq V_{DDH}, \quad V_{low} \leq V_{L} < V_{DDH}$$

where  $V_{min}$  is the minimum operating voltage for the correct logic function of a gate with subthreshold supply voltage and  $V_{low}$  is the lowest input voltage to keep 10% to 90% output voltage swing for a logic gate when  $V_{DDH}$  is predetermined. The timing constraints are [14]:

$$T_{i} \geq T_{j} + \sum_{v \in V} td_{i,v} \cdot X_{i,v} + \sum_{v \in V_{L}} tdo_{i,v} \cdot P_{i,v} \qquad \forall i \in \text{all gates},\ \forall j \in \text{all fanin gates of gate } i \tag{3}$$

$$T_i \leq T_c \qquad \forall i \in \text{all primary output gates} \tag{4}$$

Penalty condition:

$$\begin{aligned}
\sum_{j} X_{j,v} &\leq N_i \cdot F_{i,v} \\
\sum_{j} X_{j,v} &\geq N_i \cdot F_{i,v} - (N_i - 1)
\end{aligned} \qquad \forall j \in \text{all fanin gates of gate } i,\ \forall i \in \text{all gates},\ \forall v \in V_L \tag{5}$$

$$\begin{aligned}
F_{i,v} + X_{i,V_{DDH}} &\geq 2 \cdot P_{i,v} \\
F_{i,v} + X_{i,V_{DDH}} &\leq 2 \cdot P_{i,v} + 1
\end{aligned} \qquad \forall i \in \text{all gates},\ \forall v \in V_L \tag{6}$$

$$\sum_{v \in V} V_{dd,v} \cdot X_{i,v} \leq \sum_{v \in V} V_{dd,v} \cdot X_{j,v} + \sum_{v \in V_L} V_{nom} \cdot P_{i,v} \qquad \forall j \in \text{all fanin gates of gate } i \tag{7}$$

Dual supply voltages selection:

$$\sum_{v \in V} V_v = 2 \tag{8}$$

$$V_{V_{DDH}} = 1 \tag{9}$$

$$\sum_{v \in V} X_{i,v} = 1 \qquad \forall i \in \text{all gates} \tag{10}$$

$$\sum_{i} X_{i,v} \leq G_{tot} \cdot V_v \qquad \forall i \in \text{all gates},\ \forall v \in V \tag{11}$$

As mentioned before, $T_c$ is given by the performance requirement, so $V_{DDH}$ is fixed by (9) within the span of candidate supply voltages. Under the dual power supply constraints, the MILP chooses only two supply voltages, the given $V_{DDH}$ and the optimal $V_{DDL}$, and each gate in the circuit must be assigned to one of them by (11); here we use a bin-packing technique [1]. The penalty condition tests for the existence of a $V_{DDH}$ gate driven by at least one $V_{DDL}$ fan-in gate using (5) (Boolean OR) and (6) (Boolean AND), which express the nonlinear Boolean functions as linear constraints. When a penalty exists, $P_{i,V_{DDL}}$ becomes 1 and (7) allows low voltage inputs to drive a $V_{DDH}$ gate by replacing it with a multiple logic-level gate. While assigning $V_{DDL}$ to gates with time slack, the MILP checks for timing violations against the clock time using the timing constraints (3) and (4). The cost function (2) balances both the delay and leakage penalties of the multiple logic-level gates.
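The linearized OR and AND constraints can be checked exhaustively for small fan-in. The sketch below enumerates all 0/1 assignments and confirms that (5) admits exactly one value of $F_{i,v}$, the OR of the fan-in variables $X_{j,v}$, and that (6) admits exactly one value of $P_{i,v}$, the AND of $F_{i,v}$ and $X_{i,V_{DDH}}$. The function names are ours and the script is only a sanity check of the inequalities, not part of the MILP tool flow.

```python
from itertools import product

def or_constraints_ok(x_fanin, f):
    """Both inequalities of (5) for one gate: x_fanin holds X_{j,v}, f is F_{i,v}."""
    n = len(x_fanin)  # N_i, the number of inputs of gate i
    s = sum(x_fanin)
    return s <= n * f and s >= n * f - (n - 1)

def and_constraints_ok(f, x_high, p):
    """Both inequalities of (6): f is F_{i,v}, x_high is X_{i,VDDH}, p is P_{i,v}."""
    return f + x_high >= 2 * p and f + x_high <= 2 * p + 1

def check_all(max_fanin=3):
    """True if (5) forces F = OR(X_j) and (6) forces P = AND(F, X_high)."""
    for n in range(1, max_fanin + 1):
        for x in product((0, 1), repeat=n):
            feasible_f = [f for f in (0, 1) if or_constraints_ok(x, f)]
            if feasible_f != [int(any(x))]:
                return False
    for f, x_high in product((0, 1), repeat=2):
        feasible_p = [p for p in (0, 1) if and_constraints_ok(f, x_high, p)]
        if feasible_p != [f & x_high]:
            return False
    return True

print("constraints (5) and (6) behave as OR and AND:", check_all())
```

The same enumeration idea can be used to test any other linearization before it is handed to a solver.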

## IV. Results

All simulation results are from SPICE using PTM 90nm CMOS at room temperature (300K). The CMOS device threshold voltages are $V_{th,pmos} = 0.21 \text{V}$ and $V_{th,nmos} = 0.29 \text{V}$ at nominal $V_{dd} = 1.2 \text{V}$. For simplicity, we use only four types of basic standard cells, namely, INV, NAND2, NAND3, and NOR2, to synthesize the ISCAS'85 benchmark circuits. Accordingly, only four types of multiple logic-level gates are used, each with a high PMOS threshold voltage ($V_{th,pmos} = 0.29 \text{V}$) assigned to the pull-up PMOS network of the basic cell.

We assume that randomly generated input signals at the high voltage $V_{DDH}$ drive all primary inputs of the circuit. The two subthreshold supply voltages, $V_{DDH}$ and $V_{DDL}$, can be provided by a voltage scalable DC to DC converter [15]. We also assume that the combinational benchmark circuits place no restriction on the primary output voltage level, which may be either $V_{DDH}$ or $V_{DDL}$. In practice, level converting flip-flops (LCFF) [18] can be placed at low voltage primary outputs as the sequential elements of the design.

(a) Single $V_{dd}$ design at $V_{dd}=0.24$V.

(b) Dual $V_{dd}$ design at $V_{DDH}$=0.24V and $V_{DDL}$=0.19V.

Fig. 5. Gate slack distribution (number of gates vs. slack) for minimum energy per cycle c880; slacks obtained by static timing analysis using gate delays for PTM 90nm CMOS.

The MILP algorithm of Section III is applied to find the optimal $V_{DDL}$ for the benchmark circuits with given performance (i.e., $V_{DDH}$) in the subthreshold region. Table III shows SPICE simulation results for the single $V_{dd}$ total energy per cycle as a reference and the dual $V_{dd}$ optimized energy per cycle with the optimal $V_{DDL}$ selection. Activity $\alpha$ is the average number of low to high transitions at circuit nodes and $V_{DDL}$ is the optimal low voltage supply corresponding to $V_{DDH}$. Multiple logic-level gates were not required for c432, c499 and c1355, so the optimized circuits contain no $V_{DDH}$ gates driven by $V_{DDL}$ gates and are the same as in [10]. Using (7), the MILP algorithm automatically determines whether or not a multiple logic-level gate is to be used, based on the benefit in energy saving. For c3540, the energy saving of the dual $V_{dd}$ circuit is 15.7 percentage points higher than in [10] (19.5% versus 3.8%). Multiple logic-level gates remove topological constraints and allow $V_{DDL}$ gates to drive $V_{DDH}$ gates. Thus, the MILP can assign $V_{DDL}$ to more gates on non-critical paths and further increase energy saving as expected. Among the ISCAS'85 circuits designed with dual $V_{dd}$ and multiple logic-level gates, the best case is about 24.5% energy reduction for c880 (8-bit ALU). Another circuit, c6288 (a 16×16 multiplier), has only 3.8% reduction. There is little benefit from dual $V_{dd}$ design for c432, c499, and c1355, where most paths are balanced. The optimized circuits show an energy saving of 14.0% on average, even though this figure includes the path balanced circuits. Figure 5 shows the gate slack distributions obtained from static timing analysis [9] of the single $V_{dd}$ and dual $V_{dd}$ designs of c880. Clearly, it is the large number of gates with large slack in the single $V_{dd}$ design that allows many low $V_{dd}$ assignments.
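As a rough illustration of how the gate slacks behind Figure 5 can be obtained, the sketch below runs a minimal static timing analysis on a toy netlist: a forward pass computes the latest arrival time at each gate output, a backward pass computes the required time, and the slack is their difference. The five-gate circuit, its unit delays, and the function name are hypothetical and are not taken from c880 or from the tool used in [9].

```python
def gate_slacks(fanins, delay, t_c):
    """Slack of every gate for clock period t_c.

    fanins : dict gate -> list of fan-in gates, listed in topological order
             (gates fed only by primary inputs have an empty list)
    delay  : dict gate -> gate delay
    """
    # Forward pass: latest arrival time at each gate output.
    arrival = {}
    for g in fanins:
        arrival[g] = delay[g] + max((arrival[j] for j in fanins[g]), default=0.0)

    # Backward pass: required time starts at t_c and is tightened by fanouts.
    required = {g: t_c for g in fanins}
    for g in reversed(list(fanins)):
        for j in fanins[g]:
            required[j] = min(required[j], required[g] - delay[g])

    return {g: required[g] - arrival[g] for g in fanins}

# Hypothetical toy circuit: g1 -> g3 -> g4 is the critical path, g5 has slack.
FANINS = {"g1": [], "g2": [], "g3": ["g1", "g2"], "g4": ["g3"], "g5": ["g1"]}
DELAY = {"g1": 1.0, "g2": 1.0, "g3": 2.0, "g4": 1.0, "g5": 1.0}

slacks = gate_slacks(FANINS, DELAY, t_c=4.0)
for gate, slack in slacks.items():
    print(f"{gate}: slack = {slack}")
```

Gates with zero slack must stay at $V_{DDH}$, while gates with positive slack, such as `g5` here, are the candidates for $V_{DDL}$.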

The energy saving from dual voltage design depends on the time slacks of gates. In subthreshold region it is also affected by the number of  $V_{DDL}$  gates driven by  $V_{DDH}$  gates. Leakage current of PMOS devices in a  $V_{DDL}$  gate is suppressed by high voltage input signal from a  $V_{DDH}$  gate, because the source to gate voltage,  $V_{sg}$ , in PMOS devices is negative. The leakage energy is comparable to dynamic energy in subthreshold region. This leakage reduction is another benefit of dual voltage design for low voltage circuits. The dual voltage technique for a nominal voltage circuit is mainly applied for dynamic power saving, while leakage power saving is considered negligible [12].

## V. Conclusion and Future Work

This paper presents dual voltage design in the subthreshold regime. Level converters are eliminated and special multiple logic-level gates are used instead. This approach is particularly beneficial for subthreshold voltage operation. A new MILP is devised to find an optimal low supply voltage below a given subthreshold supply voltage. The given supply voltage is chosen for the minimum energy per cycle for any single voltage. When paired with the lower voltage from the MILP, the energy is further reduced. The MILP optimally selects the boundaries between the supply voltage domains to position multiple logic-level gates. With this MILP, ISCAS'85 benchmark circuits could save up to 24.5% energy per cycle. Notably, the energy per cycle for these designs is always less than the absolute minimum energy point for the circuit for single voltage operation. Alternatively, the MILP can trade energy reduction for speed increase without letting the energy rise. For large circuits, the MILP may suffer from an unacceptably long run-time, as the optimization algorithm for dual $V_{dd}$ design has exponential-time complexity. Gate slack analysis [9] provides an opportunity to reduce the time complexity to linear instead of the exponential-time MILP for assigning $V_{DDH}$ and $V_{DDL}$ to gates. We plan to develop a time-efficient MILP algorithm for dual $V_{dd}$ design. A subthreshold circuit is susceptible to process variation, which affects the delay of gates. For validating dual voltage design in the subthreshold regime, we are investigating that aspect in our ongoing research.

TABLE III

Total energy per cycle with optimal $V_{DDL}$ for given $V_{DDH}$ and performance of ISCAS'85 benchmark circuits and 32-bit ripple carry adder.

| Benchmark circuit | Total gates | Activity $\alpha$ | $V_{DDH}$ (V) | $V_{DDL}$ (V) | $V_{DDL}$ gates (%) | Multiple logic-level gates | $E_{single}$ (fJ) | $E_{dual}$ (fJ) | Reduc. (%) | Reduc. [10] (%) | Freq. (MHz) |
|------------|-------|----------|-----------|-----------|-----------|-----------------|--------------|------------|--------|------------|-------|
| c432       | 154   | 0.19     | 0.25      | 0.23      | 5.2       | 0               | 7.9          | 7.8        | 1.1    | 1.1        | 14.4  |
| c499       | 493   | 0.21     | 0.22      | 0.18      | 9.7       | 0               | 20.2         | 19.8       | 2.0    | 2.0        | 11.9  |
| c880       | 360   | 0.18     | 0.24      | 0.19      | 56.7      | 23              | 14.4         | 10.9       | 24.5   | 22.2       | 13.6  |
| c1355      | 469   | 0.21     | 0.21      | 0.18      | 10.2      | 0               | 19.5         | 19.0       | 2.5    | 2.5        | 9.8   |
| c1908      | 584   | 0.20     | 0.24      | 0.21      | 27.6      | 71              | 26.5         | 23.2       | 12.4   | 5.8        | 11.8  |
| c2670      | 901   | 0.16     | 0.25      | 0.19      | 40.2      | 41              | 32.8         | 26.9       | 18.1   | 14.8       | 17.4  |
| c3540      | 1270  | 0.33     | 0.23      | 0.16      | 40.8      | 69              | 88.0         | 70.8       | 19.5   | 3.8        | 7.2   |
| c5315      | 2077  | 0.26     | 0.24      | 0.19      | 60.5      | 62              | 116.8        | 92.2       | 21.1   | 16.1       | 9.8   |
| c6288      | 2407  | 0.28     | 0.29      | 0.19      | 4.7       | 20              | 165.4        | 159.1      | 3.8    | 2.1        | 9.4   |
| c7552      | 2823  | 0.20     | 0.25      | 0.21      | 51.6      | 201             | 131.7        | 112.1      | 14.9   | 11.1       | 13.6  |
| 32-bit RCA | 352   | 0.17     | 0.31      | 0.18      | 52.3      | 11              | 21.2         | 14.1       | 33.5   | 31.3       | 16.7  |
| Average    |       |          |           |           | 32.7      |                 |              |            | 14.0   | 10.2       |       |

## References

- [1] M. Anis, S. Areibi, M. Mahmoud, and M. Elmasry, "Dynamic and Leakage Power Reduction in MTCMOS Circuits using an Automated Efficient Gate Clustering Technique," in *Proc. 39th Design Automation Conf.*, 2002, pp. 480–485.
- [2] D. Bol, D. Kamel, D. Flandre, and J. Legat, "Nanometer MOSFET Effects on the Minimum-Energy Point of 45nm Subthreshold Logic," in *Proc. 14th International Symp. Low Power Electronics and Design*, ACM, 2009, pp. 3–8.
- [3] B. H. Calhoun and A. P. Chandrakasan, "Ultra-Dynamic Voltage Scaling (UDVS) Using Sub-Threshold Operation and Local Voltage Dithering," *IEEE Journal of Solid-State Circuits*, vol. 41, no. 1, pp. 238–245, 2006.
- [4] A. U. Diril, Y. S. Dhillon, A. Chatterjee, and A. D. Singh, "Level-Shifter Free Design of Low Power Dual Supply Voltage CMOS Circuits Using Dual Threshold Voltages," *IEEE Trans. on VLSI Systems*, vol. 13, no. 9, pp. 1103–1107, Sept. 2005.
- [5] R. G. Dreslinski, M. Wieckowski, D. Blaauw, D. Sylvester, and T. Mudge, "Near-Threshold Computing: Reclaiming Moore's Law Through Energy Efficient Integrated Circuits," *Proc. IEEE*, vol. 98, no. 2, pp. 253–266, Feb. 2010.
- [6] R. Fourer, D. M. Gay, and B. W. Kernighan, AMPL: A Mathematical Programming Language. Brooks/Cole-Thomson Learning, 2003.
- [7] http://www.synopsys.com. HSPICE User Guide: Simulation and Analysis.
- [8] C. H. I. Kim, H. Soeleman, and K. Roy, "Ultra-Low-Power DLMS Adaptive Filter for Hearing Aid Applications," *IEEE Trans. on VLSI Systems*, vol. 11, no. 6, pp. 1058–1067, 2003.
- [9] K. Kim and V. D. Agrawal, "Dual Voltage Design for Minimum Energy Using Gate Slack," in *IEEE International Conference on Industrial Technology & 43rd Southeastern Symposium on System Theory*, Mar. 2011.
- [10] K. Kim and V. D. Agrawal, "True Minimum Energy Design Using Dual Below-Threshold Supply Voltages," in *Proceedings of 24th International Conference on VLSI Design*, Jan. 2011.

- [11] M. Kulkarni and V. D. Agrawal, "A Tutorial on Battery Simulation Matching Power Source to Electronic System," in Proc. 14th IEEE VLSI Design and Test Symp., July 2010.
- [12] S. H. Kulkarni, A. N. Srivastava, and D. Sylvester, "A New Algorithm for Improved VDD Assignment in Low Power Dual VDD Systems," in *Proc. International Symp. Low Power Electronics and Design*, IEEE, 2004, pp. 200–205.
- [13] S. H. Kulkarni and D. Sylvester, "High Performance Level Conversion for Dual  $V_{DD}$  Design," *IEEE Trans VLSI Systems*, vol. 12, no. 9, pp. 926–936, 2004.
- [14] T. Raja, V. D. Agrawal, and M. L. Bushnell, "Minimum Dynamic Power CMOS Circuit Design by a Reduced Constraint Set Linear Program," in Proc. 16th International Conf. VLSI Design, Jan. 2003, pp. 527–532.
- [15] Y. K. Ramadass and A. P. Chandrakasan, "Voltage Scalable Switched Capacitor DC-DC Converter for Ultra-Low-Power On-Chip Applications," in *Proc. Power Electronics Specialists Conference*, 2007, pp. 2353–2359.
- [16] M. Seok, S. Hanson, Y. S. Lin, Z. Foo, D. Kim, Y. Lee, N. Liu, D. Sylvester, and D. Blaauw, "The Phoenix Processor: a 30pW Platform for Sensor Applications," in *Proc. IEEE Symposium on VLSI Circuits*, 2008, pp. 188–189.
- [17] M. Seok, D. Sylvester, and D. Blaauw, "Optimal Technology Selection for Minimizing Energy and Variability in Low Voltage Applications," in Proc. 13th International Symp. on Low Power Electronics and Design, ACM, 2008, pp. 9–14.
- [18] K. Usami and M. Horowitz, "Clustered Voltage Scaling Technique for Low-Power Design," in Proc. International Symp. on Low Power Design, 1995, pp. 3–8.
- [19] K. Usami, M. Igarashi, F. Minami, T. Ishikawa, M. Kanzawa, M. Ichida, and K. Nogami, "Automated Low-Power Technique Exploiting Multiple Supply Voltages Applied to a Media Processor," *IEEE Journal of Solid-State Circuits*, vol. 33, no. 3, pp. 463–472, 1998.
- [20] A. Wang, B. H. Calhoun, and A. P. Chandrakasan, Sub-Threshold Design for Ultra Low-Power Systems. Springer, 2006.
- [21] A. Wang and A. Chandrakasan, "A 180mV FFT Processor Using Subthreshold Circuit Techniques," in *IEEE International Solid-State Circuits Conf. Digest of Technical Papers*, 2004, pp. 292–529.
- [22] B. Zhai, D. Blaauw, D. Sylvester, and K. Flautner, "Theoretical and Practical Limits of Dynamic Voltage Scaling," in *Proc. 41st Design Automation Conf.*, 2004, pp. 868–873.
- [23] W. Zhao and Y. Cao, "New Generation of Predictive Technology Model for Sub-45 nm Early Design Exploration," *IEEE Trans. Electron Devices*, vol. 53, no. 11, pp. 2816–2823, Nov. 2006.