PAPER Special Section on Analog Circuits and Related SoC Integration Technologies

# A Forward/Reverse Body Bias Generator with Wide Supply-Range down to Threshold Voltage

Norihiro KAMAE<sup>†\*a)</sup>, Student Member, Akira TSUCHIYA<sup>†</sup>, Member, and Hidetoshi ONODERA<sup>†</sup>, Fellow

**SUMMARY** A forward/reverse body bias generator (BBG) which operates under wide supply-range is proposed. Fine-grained body biasing (FGBB) is effective to reduce variability and increase energy efficiency on digital LSIs. Since FGBB requires a number of BBGs to be implemented, simple design is preferred. We propose a BBG with charge pumps for reverse body bias and the BBG operates under wide supply-range from 0.5 V to 1.2 V. Layout of the BBG was designed in a cell-based flow with an AES core and fabricated in a 65 nm CMOS process. Area of the AES core is 0.22 mm<sup>2</sup> and area overhead of the BBG is 2.3%. Demonstration of the AES core shows a successful operation with the supply voltage from 0.5 V to 1.2 V which enables the reduction of power dissipation, for example, of 17% at 400 MHz operation.

key words: body bias generator, dynamic voltage frequency scaling, low supply voltage, analog-assisted digital, switched-capacitor circuits

## 1. Introduction

Performance tuning and variability compensation for digital LSIs are emerging technologies.

The variability can be categorized into wafer-to-wafer, die-to-die, within-die location correlated, and within-die random variations. In Ref. [1], within-die location correlated variability is measured on an 80-core processor. The reported silicon data show that the maximum operating frequency of the cores spreads by 62% at $V_{DD} = 0.8$ V.

A concept of a fine-grained body bias (FGBB) is shown in Fig. 1(a) to compensate location correlated variability. We call each grain "substrate island" here. It is assumed that a digital LSI is partitioned into substrate islands and each of them has a body bias generator (BBG) to compensate variability.

Delay reduction and die-to-die variability compensation can be achieved by body biasing. Reference [2] shows that the body bias is an important technique for variability compensation. According to a theoretical study [3], dynamic FGBB technique is expected to increase performance by 7–16% from zero body bias.

In FGBB, precise resolution, small area, and small power overhead of the BBG are required. In Ref. [4], operating frequency and leakage current are measured under body bias, and it is reported that the resolution of the BBG should be smaller than 100 mV to meet their frequency and leakage constraints.

**Fig. 1** (a) fine-grained body bias, (b) conventional body bias generator with additional supply voltage, and (c) the wide supply-range body bias generator.

Desired features for the BBG are summarized below.

- Wide supply-range without requiring additional supply lines: It increases cost to add supply lines for BBG as shown in Fig. 1(b). It is preferred to reduce number of supply lines as shown in Fig. 1(c). The BBG should operate under the same supply line with the digital circuits under control by the BBG. The supply of the digital circuits could take a wide range down to threshold voltage (*V*<sub>th</sub>) of MOSFETs.
- Wide output-range that exceeds the range of supply voltage: For variability compensation, output voltage range of the BBG should be wide enough. For instance, assuming the variation in threshold voltage of 40 mV and the backgate conductance of 1/5, the voltage range of the body bias should be at least 200 mV. It is more preferable to output both forward body bias (FBB) and reverse body bias (RBB). FBB will enhance timing and RBB will reduce leakage current.
- Automated design: A lot of substrate islands will be implemented in one chip and each substrate island has different requirements for its BBG hence design cost of BBGs should be low enough.
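The 200 mV figure in the second item follows from dividing the threshold-voltage variation by the body-effect sensitivity. A minimal sketch of that arithmetic is given below; the function name is hypothetical, and reading the "backgate conductance" as the ratio dVth/dVbody is an interpretation rather than a definition taken from the text.

```python
from fractions import Fraction

def required_bias_range_mv(vth_variation_mv, body_factor):
    """Body-bias range (mV) needed to offset a given threshold-voltage variation.

    body_factor is the assumed sensitivity dVth/dVbody, read here as the
    text's "backgate conductance" (an interpretation, not a definition
    taken from the paper).
    """
    return vth_variation_mv / body_factor

# Values quoted in the text: 40 mV variation, factor of 1/5.
bias_range_mv = required_bias_range_mv(40, Fraction(1, 5))
print(f"required body-bias range: {float(bias_range_mv):.0f} mV")
```

Exact fractions are used so that the 40 mV and 1/5 inputs reproduce the 200 mV requirement without floating-point rounding.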

In our previous work [5], we achieved wide supply-range and design automation. This work offers three advances over the previous work. First, to satisfy the demand for lower supply voltage, we introduce a constant common mode scheme which further lowers the minimum supply voltage from the near-threshold voltage of our previous work [5] down to the threshold voltage. Second, to output both FBB and RBB, two types of charge pumps are employed. One charge pump is designed to feed back an accurate RBB voltage by synchronizing to the DAC conversion phase. The other is designed to acquire enough current to drive a large load by utilizing a high frequency clock. Third, to verify applicability of the cell-based design flow, the BBG has been embedded into an application circuit, an AES cipher. Measured results of the BBG are presented briefly in Ref. [6]. In this paper, we show the detailed operation of the BBG with comprehensive measurement results. We also present an experiment on an AES cipher core controlled by the BBG.

Manuscript received October 19, 2014.

Manuscript revised February 3, 2015.

<sup>†</sup>The authors are with the Department of Communications and Computer Engineering, Graduate School of Informatics, Kyoto University, Kyoto-shi, 606-8501 Japan.

<sup>\*</sup>Presently, with SanDisk, Yokohama-shi, 108-0075 Japan.

a) E-mail: kamae@vlsi.kuee.kyoto-u.ac.jp

DOI: 10.1587/transele.E98.C.504

with very narrow input swing.

(a) Conventional differential to single-end amplifier.

Fig. 2 Near threshold voltage amplifier design.

Fig. 3 The forward/reverse BBG with wide supply voltage range.

The rest of this paper is organized as follows. In Sect. 2, we describe a circuit implementation of the BBG. In Sect. 3, we show a physical implementation of the BBG with an AES core. In Sect. 4, we show measured results of the BBG and AES core under body bias generated by the BBG, and conclude in Sect. 5.

## 2. Key Feature and Implementation of the BBG

## 2.1 BBG with Wide Supply-range

To eliminate a dedicated supply line for a BBG, the BBG should operate under the same supply line as the digital circuits, whose voltage can be as low as the threshold voltage. For wide supply-range operation, the amplifier, especially its input differential to single-end stage, is one of the critical parts. In our previous work [5], we proposed a scheme to reduce the supply voltage down to near-threshold voltage by keeping the input common mode voltage at a constant level in a conventional differential pair shown in Fig. 2(a). It is, however, difficult to maintain saturation for all three stacked transistors when the supply voltage is lowered down to the threshold voltage. In this study, we adopt the constant common mode scheme in [5] and further eliminate the current source M0 as shown in Fig. 2(b), which allows the supply to be reduced to around the threshold voltage. With the constant common mode scheme, the voltage at node P does not fluctuate because the common mode voltage is fixed; therefore we can eliminate M0 while keeping Fig. 2(a) and (b) equivalent. The constant common mode voltage is generated by a diode-connected MOSFET as shown in Fig. 3. This scheme helps designers ensure that every MOSFET operates in the saturation region while maintaining wide supply-range operation. However, under low supply voltage, the impact of process-voltage-temperature (PVT) variability becomes large. The circuit should be designed such that correct operation is guaranteed at all corner conditions.

## 2.2 Forward and Reverse Body Bias Generation

For the design automation of the BBG, we utilize a cell-based design flow described in [7]. Since every element in the BBG is implemented in a small cell in this flow, we avoid using poly resistors and thick-gate-oxide transistors, which require a clearance larger than the cell size. Instead, we use MOS capacitors, metal fringe capacitors, and core MOSFETs.

The BBG must consist of core MOSFETs even though its output range should cover both forward and reverse body bias conditions. Since the core MOSFETs have a thin gate oxide and cannot tolerate a voltage higher than the core voltage, careful design is required for RBB. The voltage for RBB is negative and goes below $V_{SS}$ when biasing NMOSFETs in a P-well. In the same way, when biasing PMOSFETs in an N-well, the voltage for RBB exceeds $V_{DD}$. To simplify the discussion below, we only describe a BBG for NMOSFETs. The other BBG for PMOSFETs has a complementary structure.

Our strategy to generate RBB is to isolate a core voltage region from a negative voltage region as shown in Fig. 3. Charge pumps CP1 and CP2 and capacitor Cc connect the two regions, and all the transistors are designed not to exceed their voltage limitations. While generating FBB, the bypass switches are closed and the charge pumps are disabled. Those blocks that do not require a negative voltage are implemented in the core voltage region: DACs, amplifiers, current bias, and sequence controller.

The core voltage region operates as follows. A diode-connected MOSFET M5 generates a constant voltage $V_{\text{bias}}$, which corresponds to a fixed common-mode voltage, for the positive input of the amplifier. An adder drives the negative input of the amplifier $V_{\text{in-}}$ according to Eq. (1).

$$V_{\rm in-} = V_{\rm bias} + V_{\rm fb} - V_{\rm da}.\tag{1}$$

Assuming virtual connection between two inputs, we get

$$V_{\rm fb} = V_{\rm da}.\tag{2}$$

When FBB is required, the charge pumps are disabled and bypass switches are closed. The voltage  $V_{\rm fb}$  is equal to the output voltage  $V_{\rm PW}$  and we get

$$V_{\rm PW} = V_{\rm da}.\tag{3}$$

When RBB is required, the charge pumps are enabled and  $V_{\rm fb}$  is pumped up by  $V_{\rm DD}$  as follows.

$$V_{\rm fb} = V_{\rm PW} + V_{\rm DD}.\tag{4}$$

Therefore, the output voltage is given by,

$$V_{\rm PW} = V_{\rm da} - V_{\rm DD}.\tag{5}$$

The adder and CP2 are explained in more detail in Sect. 2.3.1.

## 2.3 Circuit Design

This section describes details of the circuit blocks. Fig. 3 shows an NMOSFET body bias ($V_{PW}$) generator. By isolating a core voltage region from a negative voltage region by two charge pumps CP1 and CP2, a wide range of output voltage is generated from a core supply line without thick-gate-oxide transistors. For this scheme, two signaling techniques are required to control the charge pumps: a clock level shifter and a sequence signal level shifter. These techniques are described in Sect. 2.3.4 and Sect. 2.3.5, respectively.

## 2.3.1 DAC and feedback charge pump

When the output voltage is RBB, the charge pump CP2 in Fig. 3 works as a $V_{DD}$ level shifter to feed the negative voltage back into the core voltage region.

(a) Biasing phase for RBB. The DAC is in final phase.

(b) Amplification phase for RBB. The DAC is in conversion phase.

(c) Biasing phase for FBB. The DAC is in final phase.

(d) Amplification phase for FBB. The DAC is in conversion phase.

Fig. 4 DAC and feedback charge pump phases.

To feed back the body bias voltage, a switched capacitor charge pump, an adder, and a DAC are utilized as shown in Fig. 4. When RBB is requested, the charge pump and the adder operate in two phases: a biasing phase as in Fig. 4(a), and an amplification phase as in Fig. 4(b). In phase (a), capacitors C1 and C2 are charged to the intended voltages as in Eqs. (6) and (7).

$$V_{\rm C1} = V_{\rm DD},\tag{6}$$

$$V_{\rm C2} = V_{\rm bias} - V_{\rm da}.\tag{7}$$

In phase (b), the input voltage of the amplifier  $V_{in-}$  satisfies Eq. (9) in a steady state.

$$V_{\rm in-} = V_{\rm C1} + V_{\rm C2} + V_{\rm PW},\tag{8}$$

$$= V_{\rm bias} - V_{\rm da} + V_{\rm DD} + V_{\rm PW}.\tag{9}$$

Finally, assuming virtual connection between amplifier inputs, we have Eq. (10).

$$V_{\rm PW} = V_{\rm da} - V_{\rm DD}.\tag{10}$$

When FBB is requested, the charge pump is disabled and the circuit uses two other phases: a biasing phase as in Fig. 4(c), and an amplification phase as in Fig. 4(d). In phase (c), capacitor C2 is charged to the intended voltage as in Eq. (11).

$$V_{\rm C2} = V_{\rm bias} - V_{\rm da}.\tag{11}$$

In phase (d), the input voltage of the amplifier  $V_{in-}$  satisfies Eq. (13) in a steady state.

$$V_{\rm in-} = V_{\rm C2} + V_{\rm PW},\tag{12}$$

$$= V_{\rm bias} - V_{\rm da} + V_{\rm PW}.\tag{13}$$

Finally, assuming virtual connection, we have Eq. (14).

$$V_{\rm PW} = V_{\rm da}.\tag{14}$$
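The two-phase derivation in Eqs. (6) to (14) can be checked numerically. The sketch below assumes ideal capacitors, switches, and amplifier; the function name and the bias values are hypothetical and chosen only for illustration. It shows that $V_{\rm bias}$ cancels out of the result in both modes.

```python
def settled_v_pw(v_da, v_dd, v_bias, reverse):
    """Steady-state V_PW from the two-phase operation, ideal components assumed."""
    # Biasing phase: the capacitors sample their voltages.
    v_c1 = v_dd if reverse else 0.0   # Eq. (6); C1 is not in the path for FBB
    v_c2 = v_bias - v_da              # Eq. (7) for RBB, Eq. (11) for FBB
    # Amplification phase: V_in- = V_C1 + V_C2 + V_PW (Eqs. (8) and (12)).
    # Virtual connection forces V_in- = V_bias, so solve for V_PW.
    return v_bias - v_c1 - v_c2

for v_bias in (0.3, 0.5):             # hypothetical bias levels
    rbb = settled_v_pw(0.8, 1.2, v_bias, reverse=True)
    fbb = settled_v_pw(0.4, 1.2, v_bias, reverse=False)
    print(f"V_bias={v_bias:.1f} V  RBB: {rbb:+.3f} V  FBB: {fbb:+.3f} V")
```

Both bias levels give the same outputs, matching Eq. (10) for RBB and Eq. (14) for FBB.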

To obtain sufficient linearity between RBB and FBB, the design of CP2 needs two considerations. First, the switching frequency must be slow enough to charge the capacitors to an accurate voltage. If the frequency is too high, especially under low $V_{DD}$, a capacitor cannot be fully charged, which degrades the differential nonlinearity between RBB and FBB. Second, the capacitor layout must be carefully designed. Node F in Fig. 4 is the most sensitive to parasitic capacitance. Node F is "flying" when switching from phase (a) to phase (b). In phase (a), the voltage of node F becomes $V_{DD}$. When switching to phase (b), node F is driven from $V_{DD}$ to $V_{PW} + V_{C1}$ by capacitor C1. The parasitic capacitance on node F is charged to different values in phase (a) and phase (b), which results in an error in the output voltage. In contrast, no other node has a voltage difference between phase (a) and phase (b) at steady state, so the other nodes are not sensitive to parasitic capacitance.
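The sensitivity of node F can be illustrated with a first-order charge-conservation model. This is a simplified sketch under assumptions not stated in the paper: the parasitic capacitance `c_par` is tied to $V_{SS}$, node F is driven only by C1 in phase (b), and the loop settles when node F equals $V_{\rm da}$. The capacitor values are hypothetical.

```python
def node_f_voltage(v_pw, v_dd, c1, c_par):
    """Voltage of floating node F in phase (b), from charge conservation."""
    # Phase (a): C1 and the parasitic both hold V_DD, referenced to V_SS = 0.
    q_f = (c1 + c_par) * v_dd
    # Phase (b): q_f = c1 * (v_f - v_pw) + c_par * v_f, solved for v_f.
    return (q_f + c1 * v_pw) / (c1 + c_par)

def v_pw_with_parasitic(v_da, v_dd, c1, c_par):
    """RBB output at which node F settles to v_da (the loop's target)."""
    return (v_da - v_dd) * (1.0 + c_par / c1)

v_dd, v_da = 1.2, 0.8
c1, c_par = 100e-15, 1e-15            # hypothetical 100 fF and 1 fF
ideal = v_da - v_dd                   # Eq. (10)
actual = v_pw_with_parasitic(v_da, v_dd, c1, c_par)
print(f"ideal {ideal * 1e3:.1f} mV, with parasitic {actual * 1e3:.1f} mV")
```

Under these assumptions the error scales with the ratio `c_par / c1`, which is one way to see why the layout of C1 and node F matters. The sign and size of the error in a real layout depend on where the parasitic is actually connected.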

## 2.3.2 Output charge pump

**Fig. 6** Clock signals for the output charge pump.

To output RBB voltage, which exceeds the range of the supply voltage, the other charge pump CP1 is utilized as shown in Fig. 3. A detailed schematic is shown in Fig. 5. The charge pump boosts the output voltage of the amplifier according to Eq. (15), where $V_{\rm amp}$ is the output voltage of the amplifier.

$$V_{\rm PW} = V_{\rm amp} - V_{\rm DD}.\tag{15}$$

To acquire large enough current to drive load with small area penalty, we utilize a high frequency clock. This point is different from the charge pump CP2.

The charge pump has two phases. In the first phase, C4 is charged to $V_{\rm DD}$. In the second phase, $\phi_0$ becomes active and M2 turns on to boost the voltage as described in Eq. (15). The timing chart for the input clock signals is shown in Fig. 6. The charge pump operates with a pair of non-overlapping clocks $\phi_0$ and $\phi_2$. It also requires their inverted clocks $\phi_1$ and $\phi_3$, respectively. Because the output is a negative voltage, the voltage swing for the gate of M2 should be between $V_{\rm PW}$ and $V_{\rm PW} + V_{\rm DD}$. We define $V_{\rm DD,aux} = V_{\rm PW} + V_{\rm DD}$. Clock signals with negative voltages are generated by a clock level shifter described in Sect. 2.3.4. On the other hand, the voltage swing for the gate of M4 should be between $V_{PW}$ and $V_{DD}$. This swing can be larger than the allowable voltage range for a core transistor when operating under the nominal supply voltage; hence we utilize stacked pairs of two MOSFETs to reduce the stress. The core voltage clock $\phi_1$ drives M4.1 and the negative voltage clock $\phi_{n1}$ drives M4.2. M4.3 and M4.4 protect M4.1 and M4.2 from the high voltage, respectively.
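The voltage ranges discussed above can be tabulated for a given operating point. The following sketch assumes that the core transistor limit equals the nominal 1.2 V supply and uses an illustrative $V_{PW}$ of -0.4 V; both values and the function name are assumptions made for this example.

```python
CORE_LIMIT_V = 1.2   # assumption: core devices tolerate the nominal supply

def gate_swings(v_dd, v_pw):
    """Return (V_DD,aux, M2 gate swing, M4 gate swing) in volts."""
    v_dd_aux = v_pw + v_dd            # definition given in the text
    m2_swing = v_dd_aux - v_pw        # M2 gate: V_PW .. V_PW + V_DD
    m4_swing = v_dd - v_pw            # M4 gate: V_PW .. V_DD
    return v_dd_aux, m2_swing, m4_swing

v_dd_aux, m2_swing, m4_swing = gate_swings(v_dd=1.2, v_pw=-0.4)
print(f"V_DD,aux = {v_dd_aux:.2f} V")
print(f"M2 swing = {m2_swing:.2f} V, within limit: {m2_swing <= CORE_LIMIT_V + 1e-9}")
print(f"M4 swing = {m4_swing:.2f} V, within limit: {m4_swing <= CORE_LIMIT_V + 1e-9}")
```

At this operating point the M2 gate swing stays at $V_{DD}$ while the M4 gate swing exceeds it, which is the case the stacked MOSFET pairs are meant to handle.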

To output FBB voltage, we stop the charge pump and bypass it through M5. An inverter consisting of M5.1 and M5.2 drives the gate voltage of M5. Because this pull-up circuit for M5 does not require any clock signals, $V_{PW}$ never becomes floating even in a reset or a start-up state. When generating RBB, M5.3 pulls the gate down to $V_{PW}$. A gate signal for M5.3 is produced by a latched level shifter (LLS) described in Sect. 2.3.5.

**Fig. 8** 4-phase clock level shifter and $V_{DD,aux}$ generator.

## 2.3.3 Phase compensation

A phase compensation capacitor Cc is added to keep stability of the feedback loop as shown in Fig. 7. A dominant pole is established by Cc. Details of the compensation method can be found in Ref. [5]. In order to suppress switching noise from charge pump CP1, the capacitor terminal is not connected to the output of the amplifier but to the charge pump output ( $V_{PW}$ ).

## 2.3.4 Clock level shifter and auxiliary voltage generator for negative voltage region

To drive charge pumps CP1 and CP2, 8 clock signals and an auxiliary voltage are generated by a clock level shifter and an auxiliary voltage generator, respectively. First, non-overlapping clocks $\phi_0$, $\phi_2$ and inverted clocks $\phi_1$, $\phi_3$ shown in Fig. 6 are generated from a single logic clock [8]. These clock signals and $V_{PW}$ are fed to the clock level shifter. Clock signals $\phi_{n0}-\phi_{n3}$ shown in Fig. 6 are the outputs of the clock level shifter and $V_{DD,aux}$ is the output of the auxiliary voltage generator.

Second, to drive MOSFETs in the negative voltage region, a clock level shifter shown in Fig. 8(a) is utilized. Four cross-coupled NMOSFETs keep the lower level  $V_{PW}$ . While  $\phi_{n2}$  is at the higher level, M1 and M4 define the lower levels of  $\phi_{n0}$  and  $\phi_{n3}$ , respectively. While  $\phi_{n0}$  is at the higher level, M2 and M3 define the lower levels of  $\phi_{n1}$  and  $\phi_{n2}$ , respectively.

Third,  $V_{DD,aux}$ , which is a power line for the negative voltage region, is generated by two cross-coupled PMOS-



Fig. 9 Schematic of latched level shifter.

FETs as shown in Fig. 8(b). Since the clocks swing between  $V_{PW}$  and  $V_{PW} + V_{DD}$ , auxiliary voltage is  $V_{DD,aux}$  =  $V_{\rm PW} + V_{\rm DD}$ . While  $\phi_{\rm n3}$  is at the lower level, M5 conducts  $\phi_{\rm n1}$ to  $V_{\text{DD,aux}}$ . While  $\phi_{n1}$  is at the lower level, M6 conducts  $\phi_{n3}$ to  $V_{DD,aux}$ .
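The four core-level clocks $\phi_0$ to $\phi_3$ can be modeled behaviorally to check the non-overlap property. The sketch below is a discrete-time illustration only, not the circuit of Ref. [8]; the function name, the sampled-clock representation, and the `dead_time` parameter are assumptions.

```python
def four_phase_clocks(clk, dead_time):
    """Derive phi0..phi3 from one sampled clock (list of 0/1).

    phi0 goes high only after clk has been high for dead_time extra samples,
    phi2 only after clk has been low that long, so the two never overlap.
    """
    phi0, phi2 = [], []
    for i in range(len(clk)):
        window = clk[max(0, i - dead_time): i + 1]
        full = len(window) == dead_time + 1
        phi0.append(int(full and all(window)))
        phi2.append(int(full and not any(window)))
    phi1 = [1 - v for v in phi0]      # inverted phi0
    phi3 = [1 - v for v in phi2]      # inverted phi2
    return phi0, phi1, phi2, phi3

clk = ([0] * 6 + [1] * 6) * 2
phi0, phi1, phi2, phi3 = four_phase_clocks(clk, dead_time=2)
for name, wave in (("clk ", clk), ("phi0", phi0), ("phi2", phi2)):
    print(name, "".join(str(v) for v in wave))
```

The printed waveforms show a gap of `dead_time` samples between one phase falling and the other rising, which is the property the charge pump switches rely on.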

## 2.3.5 Latched level shifter

A latched level shifter shown in Fig. 9 sends a control signal to the negative voltage region. The latched level shifters are utilized to drive the switches in CP2 and the bypass switches.

The signal D propagates into the negative voltage region via coupling capacitors. When CK is low, regardless of D, the voltage of the nodes E and EB becomes $V_{\rm NW}$, which is the low level in the negative voltage region. Transiently the voltage of E and EB can be lower than $V_{\rm NW}$, but the cross-coupled NMOSFETs stabilize the voltage of E and EB to $V_{\rm NW}$. When CK is high, according to D, either E or EB becomes $V_{DD,aux}$, which is the high level in the negative voltage region. The other node is pulled down by the cross-coupled NMOSFET. The voltages of the nodes E and EB are amplified and held by the RS-latch, and we get the level-shifted signal at Q and the inverted signal at QB. We have designed a NOR gate with separated power terminals for this circuit.
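The sequence described above can be summarized as a logic-level model. This is a behavioral sketch only: it ignores the coupling capacitors and transient undershoot, assumes the latch starts at 0, and uses hypothetical voltage levels for the shifted domain.

```python
class LatchedLevelShifter:
    """Logic-level model: CK gates D onto nodes E/EB, an RS latch holds Q."""

    def __init__(self):
        self.q = 0                     # assumed initial latch state

    def step(self, ck, d):
        if ck:
            e, eb = (1, 0) if d else (0, 1)   # one node is pulled to the high level
        else:
            e, eb = 0, 0                      # both nodes rest at the low level
        if e:
            self.q = 1                 # set
        elif eb:
            self.q = 0                 # reset
        return self.q, 1 - self.q      # (Q, QB); unchanged when E = EB = 0

def level(bit, v_low, v_high):
    """Map a logic value to a voltage in the shifted domain."""
    return v_high if bit else v_low

lls = LatchedLevelShifter()
v_low, v_high = -0.4, 0.8              # hypothetical low level and V_DD,aux
for ck, d in [(1, 1), (0, 0), (1, 0), (0, 1)]:
    q, qb = lls.step(ck, d)
    print(f"CK={ck} D={d} -> Q={level(q, v_low, v_high):+.1f} V")
```

The model shows the key behavior: Q follows D only while CK is high and keeps its value while CK is low.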

## 3. Cell-based Design of the BBG Embedded in an Application Core

## 3.1 Cell-based Physical Design of the BBG

Physical layout of the BBG is designed under a cell-based design flow described in Ref. [7]. The netlist of the BBG is disjoint from that of the target circuit except for power supply, clock, and reset signals. They are merged and the unified netlist is used for generating a completed layout through the cell-based design flow. The BBG is split into 760 analog cells including metal fringe capacitors, transmission gate for switched capacitors, and amplifiers. Every cell is single height and compatible with digital design flow for place and route. The auxiliary power lines are routed as signal nets since the lines only feed eight NOR gates with maximum current of  $2 \mu A$ .

## 3.2 AES

An AES cipher core was implemented for demonstration of the BBG. Synthesis constraints and characteristics are summarized in Table 1.

**Table 1** Synthesis constraints and characteristics of the AES cipher.

| Parameter       | Value   |
|-----------------|---------|
| Supply voltage  | 1.2 V   |
| Frequency       | 400 MHz |
| Number of cells | 28k     |

Fig. 10 Chip photograph, AES block, and placement.

The AES cipher and its function test block are shown in Fig. 10(b). The function test block consists of a pseudo random number generator with a linear feedback shift register (LFSR) and a comparator. Results of comparison can be measured outside the chip. The substrate island consists of BBG, AES cipher, AES inverse cipher, process monitor [9], controller [10] and ring oscillators.

## 3.3 Layout

The BBG was designed to control an AES cipher core in a 65 nm low power CMOS process with threshold voltage around 0.5 V.

Physical layout of the BBG and AES cipher core is designed by an EDA tool for cell-based design as shown in Fig. 10. Figure 10(a) shows a placement result by the EDA tool. In this design, we assume that the substrate island is small enough to ignore the variation inside the island. A body biasing mesh network on metal layers maintains a uniform body bias voltage on one substrate island. Thus the placement of the BBG and the monitor circuit is not important. The EDA tool has automatically placed the BBG cells on white space of the AES cipher. If the island is not small and the variation inside the island is not negligible, the island should be split into smaller islands. If the island cannot be split for some reason, the number and the placement of the monitor circuits should be considered.

Fig. 11 Measured output voltage, DNL, and INL from reverse body bias to forward body bias.

Total area of the circuit is  $0.22 \text{ mm}^2$ , which includes the AES cipher, the process monitor and the BBG but excludes a function test block. It is calculated by subtracting the function test block area of  $0.02 \text{ mm}^2$  from the total area of  $0.6 \text{ mm} \times 0.4 \text{ mm} = 0.24 \text{ mm}^2$  as shown in Fig. 10(c). Area of the BBG is 0.0052 mm<sup>2</sup>, which is the sum of total cell area for the BBG. Thus area overhead of the BBG is 2.3%.
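The area figures above can be reproduced with a few lines of arithmetic. The sketch uses only the rounded values quoted in the text; the variable names are illustrative.

```python
die_area = 0.6 * 0.4                   # mm^2, from Fig. 10(c)
test_block_area = 0.02                 # mm^2, function test block
bbg_area = 0.0052                      # mm^2, sum of BBG cell areas

core_area = die_area - test_block_area
overhead = bbg_area / core_area
print(f"core area: {core_area:.2f} mm^2")
print(f"BBG overhead: {overhead * 100:.2f} %")
```

With these rounded inputs the ratio comes out between 2.3% and 2.4%, in line with the reported overhead up to rounding of the published areas.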

## 4. Measured Results

The BBG in the AES cipher is measured under its test mode. Input code for the BBG is controlled externally and body bias voltages ( $V_{\rm NW}$  and  $V_{\rm PW}$ ) are measured.

Figure 11(a)–(c) and (d)–(f) show the output voltage of the BBG, differential nonlinearity (DNL), and integral nonlinearity (INL) at $V_{DD} = 1.2$ V and 500 mV, respectively. The horizontal axis is the input code for the BBG. The most significant bit (MSB) of the input code switches between FBB and RBB. The other bits are the input of the DAC. Negative values for input codes correspond to RBB and positive values correspond to FBB. The DNL is less than 0.5 LSB at every code. Thanks to the feedback charge pump, no significant DNL is found at input code 0, the changing point between RBB and FBB. The INL is at most 85 mV at the nominal voltage $V_{DD} = 1.2$ V, which is possibly caused by leakage of the switch or offset of the amplifier. If the BBG is integrated in a feedback system using the monitor circuit [11], the impact of the INL can be suppressed.
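For readers who want to reproduce such curves from their own sweep, the sketch below computes DNL and INL from a list of output voltages ordered by input code. It assumes the common end-point definition (the LSB is the average step between the first and last codes), which the paper does not state explicitly, and the sample voltages are synthetic, not measured data.

```python
def dnl_inl(voltages):
    """Return (DNL in LSB, INL in volts) for outputs ordered by input code."""
    steps = len(voltages) - 1
    lsb = (voltages[-1] - voltages[0]) / steps      # end-point average step
    dnl = [(voltages[k + 1] - voltages[k]) / lsb - 1.0 for k in range(steps)]
    inl = [voltages[k] - (voltages[0] + k * lsb) for k in range(steps + 1)]
    return dnl, inl

# Synthetic samples for illustration only, not measured data.
v_out = [0.000, 0.019, 0.040, 0.057, 0.076]
dnl, inl = dnl_inl(v_out)
print("max |DNL| =", round(max(abs(x) for x in dnl), 3), "LSB")
print("max |INL| =", round(max(abs(x) for x in inl) * 1e3, 1), "mV")
```

DNL is returned in LSB and INL in volts so that the results can be compared directly with the units used in the text.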

Transient responses at the lowest and nominal $V_{DD}$ were measured to verify stability of the feedback system as shown in Fig. 12. No ringing appeared under either condition.

Fig. 13 Measured transient responses of body bias voltage and 2-input NAND gate delay.

Fig. 14 Operating ranges of the AES.

The transition of gate delay is measured in order to take into account the RC-delay due to well resistance and capacitance. Figure 13(a) shows the transition of the 2-input NAND gate delay measured by a 29-stage ring oscillator, controlling from -0.4 V reverse body bias (RBB) to +0.4 V forward body bias (FBB). The transition time to 90% is 0.16 µs and the RC-delay in the well is not dominant in this experiment. Figure 13(b) shows the transition of the 2-input NAND gate delay, controlling from FBB to RBB. Due to the charge pump, the transition is slower than in (a).
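A per-gate delay can be extracted from a ring oscillator with the standard relation that one oscillation period spans two traversals of the ring. The sketch below applies it to the 29-stage case; the oscillation frequency is a hypothetical value, not a measured one, and the function name is illustrative.

```python
def gate_delay_from_ring(frequency_hz, stages):
    """Average per-stage delay of a ring oscillator: T = 2 * stages * t_d."""
    return 1.0 / (2.0 * stages * frequency_hz)

f_osc = 100e6                          # hypothetical oscillation frequency
t_d = gate_delay_from_ring(f_osc, stages=29)
print(f"gate delay: {t_d * 1e12:.1f} ps")
```

Tracking this value over time while the body bias is switched gives a delay transition of the kind plotted in Fig. 13.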

Operating ranges of the AES are shown in Fig. 14. Without body bias, the minimum supply voltage is 1.15 V at a frequency of 400 MHz as shown in Fig. 14(a) and the power consumption is 484 mW. With 0.6 V forward body bias for both NMOSFETs and PMOSFETs, the minimum supply voltage is 0.975 V as shown in Fig. 14(b) and the power consumption is 400 mW including that of the BBG. The power consumption of the BBG is at most 0.6 mW according to its simulation. The results show that forward body bias reduces power consumption by 17% without penalty in operating speed. As the supply voltage becomes lower, the power of the target circuit becomes smaller and the power overhead of the BBG becomes relatively large. On the other hand, the impact of variation compensation is expected to be large when the supply voltage becomes lower. Thus it is not clear at which supply voltage the BBG overhead and the power reduction balance.
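The 17% figure and the relative weight of the BBG follow directly from the numbers quoted above. A short sketch of the arithmetic, with illustrative variable names:

```python
p_no_bias_mw = 484.0                   # 1.15 V, 400 MHz, zero body bias
p_fbb_mw = 400.0                       # 0.975 V, 400 MHz, 0.6 V FBB, BBG included
p_bbg_mw = 0.6                         # simulated upper bound for the BBG

saving = (p_no_bias_mw - p_fbb_mw) / p_no_bias_mw
bbg_share = p_bbg_mw / p_fbb_mw
print(f"power saving: {saving * 100:.1f} %")
print(f"BBG share of biased power: {bbg_share * 100:.2f} %")
```

At this operating point the BBG accounts for a small fraction of the total, and that fraction grows as the supply voltage and the core power decrease.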

| Parameter         | this work              | prev. work [5]         | [12]                 | [13]                  | [2]                |
|-------------------|------------------------|------------------------|----------------------|-----------------------|--------------------|
| Process           | 65 nm                  | 65 nm                  | 90 nm                | 90 nm                 | 65 nm              |
| Supply            | 0.5 V to 1.2 V         | 0.6 V to 1.2 V         | 1.2 V                | 2.5 V & -1 V          | > 1.1 V            |
| Core supply       | 0.5 V to 1.2 V         | 0.6 V to 1.2 V         | 0.8 V to 1.2 V       | 1.0 V                 | 1.2 V              |
| Output            | FBB/RBB                | FBB                    | FBB                  | FBB/RBB               | FBB                |
| Resolution        | 19 mV †                | 38 mV                  | 8 mV                 | -                     | binary             |
| BBG circuit area  | 0.0052 mm <sup>2</sup> | 0.0023 mm <sup>2</sup> | 0.03 mm <sup>2</sup> | 0.006 mm <sup>2</sup> | -                  |
| Core circuit area | 0.22 mm <sup>2</sup>   | 0.1 mm <sup>2</sup>    | 1 mm <sup>2</sup>    | 0.15 mm <sup>2</sup>  | 15 mm <sup>2</sup> |
| Area overhead     | 2.3%                   | 2.3%                   | 3%                   | 4%                    | < 3%               |
| Layout design     | automated              | automated              | modular              | custom                | automated          |
| Power consumption | 0.6 mW                 | 0.12 mW                | 0.21 mW              | 1.5 mW                | -                  |
| Response time     | 2 µs ‡                 | 2 µs                   | 5 µs                 | 70 ns                 | -                  |

**Table 2** Summary and performance of the BBG.

<sup>†</sup> is a maximum value at  $V_{\text{DD}} = 1.2 \text{ V}$  and is proportional to  $V_{\text{DD}}$ .

<sup>‡</sup> is at  $V_{\text{DD}} = 1.2 \text{ V}$  and it also depends on input values.

The performance summary and comparison of the BBG are given in Table 2. The output range of the BBG covers both FBB and RBB without requiring an additional supply. The proposed BBG operates under a wide range of supply voltage from 0.5 V to 1.2 V.
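Because the resolution in Table 2 is stated to be proportional to $V_{\rm DD}$, the step size at other supply voltages can be estimated by linear scaling from the 19 mV value at 1.2 V. The sketch below does this; the function name and the intermediate 0.8 V point are illustrative assumptions.

```python
def resolution_mv(v_dd, ref_resolution_mv=19.0, ref_v_dd=1.2):
    """Maximum step size scaled linearly with supply, per the table footnote."""
    return ref_resolution_mv * v_dd / ref_v_dd

for v_dd in (1.2, 0.8, 0.5):
    print(f"V_DD = {v_dd:.1f} V -> resolution about {resolution_mv(v_dd):.1f} mV")
```

These are scaled estimates under the proportionality stated in the footnote, not additional measured values.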

## 5. Conclusion

A forward/reverse body bias generator for fine-grained body bias has been presented. The correct operation of the BBG is verified under wide range of supply voltage from 0.5 V to the nominal voltage of 1.2 V and the output voltage range covers both forward and reverse body bias. The BBG is integrated into an AES cipher using a cell-based design flow. A demonstration with the AES cipher shows that the BBG can work correctly for performance tuning under the supply voltage of 1.2 V down to 0.5 V, which can be exploited for reducing power dissipation. A 17% power saving is observed at a constant operating frequency of 400 MHz.

## Acknowledgments

The authors would like to thank T. Ishihara, A. Islam, S. Kim, and S. Nishizawa for design and lab support.

This work is supported by VLSI Design and Education Center (VDEC), the University of Tokyo in collaboration with Synopsys, Inc., Cadence Design Systems, Inc., and Mentor Graphics, Inc.

The chip in this study has been fabricated at Fujitsu Ltd. through the fabrication program of VDEC in collaboration with STARC, e-Shuttle, Inc.

A part of this research is funded by JSPS KAKENHI Grant Number 25280014.

## References

- [1] S. Dighe, S. Vangal, P. Aseron, S. Kumar, T. Jacob, K. Bowman, J. Howard, J. Tschanz, V. Erraguntla, N. Borkar, V. De, and S. Borkar, "Within-die variation-aware dynamic-voltage-frequency scaling core mapping and thread hopping for an 80-core processor," Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2010 IEEE International, pp.174–175, Feb. 2010.
- [2] F. Tachibana, H. Sato, T. Yamashita, H. Hara, T. Kitahara, S. Nomura, F. Yamane, Y. Tsuboi, K. Seki, S. Matsumoto, Y. Watanabe, and M. Hamada, "A process variation compensation scheme using cell-based forward body-biasing circuits usable for 1.2 V design," IEEE Custom Integrated Circuits Conference, 2008. CICC 2008, pp.29–32, 2008.

- [3] R. Teodorescu, J. Nakano, A. Tiwari, and J. Torrellas, "Mitigating parameter variation with dynamic fine-grain body biasing," Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture, Washington, DC, USA, pp.27–42, 2007.
- [4] J. W. Tschanz, J. T. Kao, S. G. Narendra, R. Nair, D. A. Antoniadis, A. P. Chandrakasan, and V. De, "Adaptive body bias for reducing impacts of die-to-die and within-die parameter variations on microprocessor frequency and leakage," IEEE J. Solid-State Circuits, vol.37, no.11, pp.1396–1402, Nov. 2002.
- [5] N. Kamae, A. Tsuchiya, and H. Onodera, "A body bias generator with low supply voltage for within-die variability compensation," IEICE Trans. Fundamentals, vol.E97-A, no.3, pp.734–740, 2014.
- [6] N. Kamae, A. Islam, A. Tsuchiya, and H. Onodera, "A body bias generator with wide supply-range down to threshold voltage for within-die variability compensation," 2014 IEEE Asian Solid-State Circuits Conference (A-SSCC), pp.53–56, Nov. 2014.
- [7] N. Kamae, A. Tsuchiya, and H. Onodera, "A body bias generator compatible with cell-based design flow for within-die variability compensation," 2012 IEEE Asian Solid State Circuits Conference (A-SSCC), pp.389–392, Nov. 2012.
- [8] B. R. Gregoire, "A compact switched-capacitor regulated charge pump power supply," IEEE J. Solid-State Circuits, vol.41, no.8, pp.1944–1953, Aug. 2006.
- [9] A. Islam and H. Onodera, "On-chip detection of process shift and process spread for post-silicon diagnosis and model-hardware correlation," IEICE Trans. Inf. & Syst., vol.E96-D, no.9, pp.1971–1979, 2013.
- [10] A. Islam, N. Kamae, T. Ishihara, and H. Onodera, "A built-in self-adjustment scheme with adaptive body bias using P/N-sensitive digital monitor circuits," 2012 IEEE Asian Solid State Circuits Conference (A-SSCC), pp.101–104, Nov. 2012.
- [11] A. Islam and H. Onodera, "Characterization and compensation of performance variability using on-chip monitors," Proceedings of Technical Program-2014 International Symposium on VLSI Technology, Systems and Application (VLSI-TSA), pp.1–4, Apr. 2014.
- [12] M. Meijer, J. de Gyvez, B. Kup, B. van Uden, P. Bastiaansen, M. Lammers, and M. Vertregt, "A forward body bias generator for digital CMOS circuits with supply voltage scaling," Proceedings of 2010 IEEE International Symposium on Circuits and Systems (ISCAS), pp.2482–2485, June 2010.
- [13] D. Levacq, M. Takamiya, and T. Sakurai, "Backgate bias accelerator for sub-100 ns sleep-to-active modes transition time," IEEE J. Solid-State Circuits, vol.43, no.11, pp.2390–2395, Nov. 2008.



Norihiro Kamae received the B.E. degree in electrical and electronics engineering in 2010 and M.E. degree in communications and computer engineering in 2012, both from Kyoto University, Kyoto, Japan. He is currently pursuing the Ph.D. degree with Kyoto University. He is a student member of IEEE and IEICE. His research interests include low power design methodology for integrated circuits and analog-assisted digital circuits.



Akira Tsuchiya received the B.E., M.E., and Ph.D. degrees in communications and computer engineering from Kyoto University, Kyoto, Japan, in 2001, 2003, and 2005, respectively. Since 2005, he has been an Assistant Professor with the Department of Communications and Computer Engineering, Graduate School of Informatics, Kyoto University. His research interests include modeling and design of on-chip passive components of high-frequency CMOS, and high-speed analog circuit design. Dr. Tsuchiya is a member of the Institute of Electrical, Information and Communication Engineers (IEICE), Japan, and IEEE.



Hidetoshi Onodera received the B.E., M.E., and Dr. Eng. degrees in Electronic Engineering, all from Kyoto University, Kyoto, Japan. He joined the Department of Electronics, Kyoto University, in 1983, and is currently a Professor in the Department of Communications and Computer Engineering, Graduate School of Informatics, Kyoto University. His research interests include design technologies for Digital, Analog, and RF LSIs, with particular emphasis on low-power design, design for manufacturability, and design for dependability.

Dr. Onodera served as the Program Chair and General Chair of ICCAD and ASP-DAC. He was the Chairman of the IPSJ SIG-SLDM (System LSI Design Methodology), the IEICE Technical Group on VLSI Design Technologies, the IEEE SSCS Kansai Chapter, and the IEEE CASS Kansai Chapter. He is currently the Chairman of IEEE Kansai Section. He served as the Editor-in-Chief of IEICE Transactions on Electronics and IPSJ Transactions on System LSI Design methodology.