# Exploration of On-Chip Switched-Capacitor DC-DC Converter for Multicore Processors Using a Distributed Power Delivery Network

Pingqiang Zhou, Dong Jiao, Chris H. Kim and Sachin S. Sapatnekar Department of Electrical and Computer Engineering University of Minnesota, Minneapolis, MN 55455, USA

Abstract—In this paper, we explore the design of on-chip switched-capacitor (SC) DC-DC converters in the context of multicore processors, using an accurate power grid simulator. Results show that a distributed design of SC converters can reduce the IR drop by up to 74% compared to the lumped design, with improved supply voltage. We also demonstrate the use of SC converters for multi-domain power supply.

## I. INTRODUCTION

The roadmap for future multicore-based computing shows more and more processor cores placed on the same die to build chip multiprocessors (CMPs). CMPs provide the ability to perform multiple tasks in parallel. However, the power demands of various cores on the same die can be different, and can change with time, depending on the applications that they may run. Dynamic voltage scaling (DVS) is one of the most effective means to achieve energy-efficient design in CMPs. The varying power demands of all cores can be best met if DVS is supported by providing multiple independent on-chip power supplies: this can support per-core or per-cluster (where multiple cores are driven by the same supply) power management in CMPs.

A voltage regulator is an essential component of the power delivery network. Most DVS systems are based on off-chip voltage regulators driving on-chip power grids, which comes at the cost of additional complexity and area, since voltage regulators are traditionally built at the board level with large inductors or capacitors. The costs and sizes of such bulky modules severely limit their use for multiple power domain regulation. To enable per-cluster or per-core DVS, it is essential to develop fully integrated on-chip DC-DC converters for each power domain, which can significantly improve load regulation and eliminate load-transient spikes caused by the inductance of the package and the global power grid [1], [2].

The key challenge associated with realizing such on-chip integrated converters is the difficulty in achieving high efficiency at the high power densities required by high-performance CMPs. Historically, on-chip DC-DC converters are limited to low power applications [3], [4] due primarily to the lack of dense, high-quality-factor energy storage elements. In typical CMOS processes, on-die capacitors have significantly higher Q and energy density and lower cost than on-die inductors, leading to several recent efforts in exploring fully integrated switched-capacitor (SC) DC-DC converters [4]–[6]. In [6], the authors have demonstrated the application of embedded deep trench capacitors in a switched-capacitor DC-DC voltage converter to provide an on-chip energy storage device of extreme density (~200nF/mm<sup>2</sup>, current density of 2.3A/mm<sup>2</sup>), high efficiency (90%) and minimal parasitic losses.

Prior work has not adequately studied the layout implications of on-chip power supply design. It is well known that power delivery is most efficient if the power sources are close to the utilization points (it is for this reason that decoupling capacitors, which deliver power based on stored charge, are placed close to large noise sources). In this work, we explore the application of on-chip SC DC-DC converters in the context of CMPs. When integrating SC converters into the on-chip power delivery network, we can build them in either lumped or distributed form, as shown in Fig. 1. For the lumped case, a large central converter delivers power to all the blocks in the DVS cluster or the whole chip. In contrast, for the distributed case, several smaller converters are distributed across the chip and each load can absorb current from the nearby converter. Although an independent closed-loop control unit is needed for each distributed converter [3], the benefits are significant. First, since the load current in CMPs is typically on the order of amperes [7], distributed converters can significantly reduce the voltage droop seen by the local loads by providing more localized power distribution. Second, a distributed design of converters provides the flexibility to support multiple power delivery domains, and we can apply DVS to each local converter to achieve better power management.

Fig. 1. Lumped vs. distributed on-chip DC-DC converters.

Existing design tools do not provide adequate support for analyzing multicore power grids. Therefore, we develop an accurate on-chip power grid simulator which incorporates on-chip SC DC-DC converters and supports multiple power domains. We then quantitatively compare the lumped and distributed designs of on-chip SC converters using realistic current profiles from CMP applications. We also demonstrate the application of SC converters for multi-domain power supply.

## II. SWITCHED-CAPACITOR DC-DC CONVERTER

A switched-capacitor (SC) DC-DC converter (also known as a charge pump) is a network of charge-transfer capacitors (also called pumping capacitors) and switches that operates in two or more phases, converting an input voltage  $V_{in}$  to an output voltage of  $V_{out}$ . If  $V_{out}$  is higher than  $V_{in}$ , the conversion is called a "step-up" conversion. Vice versa, if  $V_{out}$  is lower than  $V_{in}$ , the conversion is called a "step-down" conversion. In this work we focus on step-down conversion.

A representative SC DC-DC converter operates in two non-overlapping phases: a charging phase $\phi_1$ and a discharging phase $\phi_2$ (in reality, many more phases are used to control ripple; in this paper we use 16 phases, but the essential idea is the same as for two phases). During phase $\phi_1$, a group of capacitors in the network is connected to the input to be charged, while in phase $\phi_2$ this group of capacitors is connected to the output to discharge. There are several different ways to configure the connection of capacitors in each phase, and each configuration has its own characteristics. In this work, we explore the simple "Series-Parallel" configuration. Fig. 2 shows four different kinds of series-parallel step-down SC DC-DC converters as proposed in [3]. This method uses the same total capacitance of $12C_B$ and provides multiple output voltage levels from the same converter block through various series-parallel reconfigurations of this total capacitance.



Fig. 2. Configurations of SC DC-DC converters with different gains.



Fig. 3. Equivalent circuit in charging and discharging phases for G1BY2.

For example, consider configuration G1BY2, with 2:1 gain (conversion ratio). Two capacitors, each with a capacitance of $6C_B$, and five switches are connected in a network, and the switches are controlled by two signals $\phi_1$ and $\phi_2$ (Fig. 2(d)). In the charging phase, $\phi_1$ turns ON two switches, connecting the two charge-transfer capacitors in series (Fig. 3(a)). Since both capacitors have the same capacitance $6C_B$, each will be charged to $V_{in}/2$ if enough time is provided for the capacitors to be fully charged. In the second phase, the three switches controlled by $\phi_2$ turn ON, while the ones controlled by $\phi_1$ turn OFF (Fig. 3(b)). This connects both capacitors in parallel with the output load, resulting in an output voltage $V_{out} = V_{in}/2$. As current starts to flow into the load, the charge stored in the capacitors depletes and the output voltage drops to $V_{out} = V_{in}/2 - \Delta V$ at the end of this phase, before the capacitors are recharged in the next phase.
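The droop $\Delta V$ in the discharging phase can be estimated with a simple charge-balance sketch. The snippet below assumes ideal switches, a constant load current, and a discharging phase lasting half of the switching period; the function name and all numeric values are illustrative assumptions and are not taken from the paper.

```python
def g1by2_output_droop(v_in, c_b, i_load, f_s):
    """Estimate the G1BY2 output at the start and end of the discharging phase.

    Assumptions (not from the paper): ideal switches, constant load current,
    and a discharging phase that lasts half of the switching period.
    """
    c_total = 2 * (6 * c_b)                    # two 6*C_B capacitors in parallel
    t_discharge = 0.5 / f_s                    # half of one switching period
    delta_v = i_load * t_discharge / c_total   # dV = dQ / C
    v_start = v_in / 2                         # series stack splits V_in evenly
    return v_start, v_start - delta_v


# Hypothetical values chosen only to make the arithmetic easy to follow.
v_start, v_end = g1by2_output_droop(v_in=1.2, c_b=1e-9, i_load=0.12, f_s=100e6)
print(f"start = {v_start:.3f} V, end = {v_end:.3f} V")
```

With these placeholder numbers the output starts at $V_{in}/2 = 0.6$ V and droops by 50 mV before the next charging phase; a larger $C_B$ or a higher switching frequency shrinks the droop proportionally.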

The power that such an SC DC-DC converter can deliver is

$$P_L = (\alpha \cdot C_B \cdot V_{in} \cdot \Delta V) \cdot f_s \cdot \eta \tag{1}$$

where $\alpha$ is a coefficient determined by the particular topology, $f_s$ is the switching frequency of the clock signals $\phi_1$ and $\phi_2$, and $\eta$ is the efficiency of the converter. For further details, the reader is referred to [3].
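As a quick numerical illustration of Eq. (1), the sketch below evaluates the deliverable power and inverts the expression to find the switching frequency needed for a target load. The function names are hypothetical, and every numeric value (including $\alpha = 12$) is a placeholder assumption rather than a parameter reported in the paper.

```python
def sc_load_power(alpha, c_b, v_in, delta_v, f_s, eta):
    """Deliverable load power from Eq. (1): P_L = alpha*C_B*V_in*dV*f_s*eta."""
    return alpha * c_b * v_in * delta_v * f_s * eta


def required_switching_frequency(p_load, alpha, c_b, v_in, delta_v, eta):
    """Invert Eq. (1) to get the f_s needed for a target load power."""
    return p_load / (alpha * c_b * v_in * delta_v * eta)


# All numbers below are hypothetical placeholders, not values from the paper.
p = sc_load_power(alpha=12, c_b=1e-9, v_in=1.2, delta_v=0.05, f_s=100e6, eta=0.9)
f = required_switching_frequency(p, alpha=12, c_b=1e-9, v_in=1.2,
                                 delta_v=0.05, eta=0.9)
print(f"P_L = {p * 1e3:.1f} mW, f_s = {f / 1e6:.1f} MHz")
```

Because Eq. (1) is linear in $C_B$, $\Delta V$ and $f_s$, a higher load power can be met by increasing any one of them; which knob to turn is a design trade-off between area, ripple and switching loss.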

## III. SIMULATION PLATFORM

Fig. 4 presents a detailed model of the power delivery network for the CMP. The package and C4 bump contacts are modeled as RL pairs. The on-board power supply is modeled as a DC voltage source. The on-chip power delivery network consists of a global VDD grid, on-chip DC-DC converters, local power grids, a global GND grid, core or decoupling capacitors and current loads. The global sparse VDD grid distributes voltage to on-chip SC converters. Each local power grid belongs to a power domain, and its voltage is controlled by the corresponding on-chip SC converters. Each power domain can have a group of SC converters. The power grids are generated according to an industrial 32nm technology.



Fig. 4. Model of power delivery network.

In our work, we consider multicore applications which require multiple power delivery domains for best energy efficiency. Existing power grid simulators, which are focused on simulating a single voltage domain, are excellent for today's CMPs that use a single off-chip voltage regulator. However, they do not provide adequate support for simulating large power grid networks driven by SC DC-DC converters, incorporating factors such as the regulator efficiency under time-varying loads. Therefore, we build an accurate power grid simulator incorporating on-chip SC DC-DC converters.





TABLE I SUMMARY OF SC DC-DC CONVERTERS

We consider a test chip with four identical cores. Fig. 5 shows the chip floorplan. In our simulator, each core can be modeled as either a lumped or a distributed set of time-varying current sources. In our simulations, we model each core as a lumped current source and generate the current profiles by simulating several SPEC OMP2001 [8] workloads using GEMS [9], an accurate full-system multicore simulator. We observed two typical types of current traces from these workloads: in one, which we call *trace1* and show in Fig. 6, there are many short current pulses early in the simulation, while the other, which we call *trace2*, is of the nature shown in Fig. 7.



Fig. 7. Trace2. The apparent periodicity is caused by a loop in the execution.

For the SC converters, we use the structures shown in Fig. 2. The switches are modeled as resistors when they are turned on. As is common practice, 16-phase interleaving (within each converter, 16 cells working in parallel) is used to reduce the output ripple of the converters. The digital-capacitance-modulation scheme [4] is integrated into our simulator; it controls the amount of capacitance that takes part in the charge transfer process.

Further, we explore the choice of $C_B$. Depending on the current demands, a larger or smaller $C_B$ may be used. We organize the $C_B$ capacitors into banks so that each $C_B$ can have four different sizes: 1X, 2X, 4X and 8X, and any $C_B$ capacitors that are not used (e.g., for a low current demand) can be power-gated to reduce leakage. It should be noted that the maximum available $C_B$ differs between the lumped and distributed designs of the SC converters: since distributed converters are smaller, more numerous, and must satisfy lower local power demands, they may use smaller $C_B$ values.
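The bank organization can be illustrated with a small selection routine. This is a minimal sketch under assumptions: the rule of picking the smallest size that covers the demand, the fallback to the largest size, and the names `BANK_SIZES` and `select_cb_size` are all hypothetical and do not describe the control scheme actually used in the simulator.

```python
BANK_SIZES = (1, 2, 4, 8)  # the 1X, 2X, 4X and 8X options named in the text


def select_cb_size(required_units, sizes=BANK_SIZES):
    """Return the smallest bank size (in 1X units) that covers the demand.

    `required_units` is the capacitance demand expressed as a multiple of
    the 1X size; how it is derived from the load is outside this sketch.
    Falls back to the largest size if the demand exceeds every option.
    """
    for size in sorted(sizes):
        if size >= required_units:
            return size
    return max(sizes)


for demand in (0.5, 3.0, 8.0, 20.0):
    print(demand, "->", f"{select_cb_size(demand)}X")
```

Any capacitance above the selected size would then be a candidate for power gating, which is where the leakage saving mentioned above comes from.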

The parameters for the SC converters studied in this work are summarized in Table I, and the other parameters for the power grid and the CMP are listed in Table II.

TABLE II SIMULATION CONFIGURATION

| Parameter         | Value                                                                            |
|-------------------|----------------------------------------------------------------------------------|
| DC voltage source | $V_{dd} = 1.2\,\mathrm{V}$ ($V_{core} = 0.6\,\mathrm{V} \sim 1.2\,\mathrm{V}$)   |
| Package           | $L_{pkg} = 15\,\mathrm{pH}$, $R_{pkg} = 1\,\mathrm{m}\Omega$                     |
| C4 bump           | count = 768, $L_{bump} = 7.2\,\mathrm{pH}$, $R_{bump} = 1.5\,\mathrm{m}\Omega$   |
| Core load         | capacitance = 1 nF, core frequency = 750 MHz                                     |

## IV. SIMULATION RESULTS

### A. Lumped vs. Distributed On-Chip SC DC-DC Converters

In this section, we compare the lumped and distributed designs of on-chip SC converters. For this experiment, we assume that all four cores shown in Fig. 5 work in one power domain, and the G3BY4 structure (Fig. 2(b)) with a 4:3 conversion ratio and a nominal Vdd of 0.9V is used to deliver power to the cores. For the lumped design, we place a single SC converter in the center of the test chip, and it delivers power to all four cores; for the distributed design, we place four individual SC converters evenly distributed on the chip, so that each core can absorb current from its local converter. For a fair comparison, the same total available charge-transfer capacitance is used for the lumped and distributed cases.
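The intuition behind this comparison, that a shorter conduction path means less IR drop, can be seen in a toy DC model. The sketch below is not the paper's simulator: it assumes a 1-D resistive chain instead of a 2-D grid, ideal voltage sources in place of SC converters, equal constant loads, and made-up resistance and current values. It does not reproduce the numbers reported in this section.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def chain_voltages(n_nodes, r_seg, i_load, v_nom, sources):
    """Node voltages of a resistive chain with ideal sources at `sources`."""
    g = 1.0 / r_seg
    a = [[0.0] * n_nodes for _ in range(n_nodes)]
    b = [0.0] * n_nodes
    for k in range(n_nodes):
        if k in sources:              # ideal source pins the node voltage
            a[k][k] = 1.0
            b[k] = v_nom
            continue
        for nb in (k - 1, k + 1):     # KCL: sum g*(V_k - V_nb) = -i_load
            if 0 <= nb < n_nodes:
                a[k][k] += g
                a[k][nb] -= g
        b[k] = -i_load
    return solve_linear(a, b)


N, R_SEG, I_LOAD, V_NOM = 9, 0.01, 1.0, 0.9   # hypothetical toy values
lumped = chain_voltages(N, R_SEG, I_LOAD, V_NOM, sources={4})
distributed = chain_voltages(N, R_SEG, I_LOAD, V_NOM, sources={1, 3, 5, 7})
drop_lumped = V_NOM - min(lumped)
drop_distributed = V_NOM - min(distributed)
print(f"worst-case drop: lumped {drop_lumped * 1e3:.1f} mV, "
      f"distributed {drop_distributed * 1e3:.1f} mV")
```

In this toy chain the single central source must push the current of four loads through the first segment on each side, so the drop accumulates toward the ends, whereas with four sources no load is more than one segment away from a supply point.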

We exercised these two designs by applying the two types of current traces shown in Figs. 6 and 7. The *trace1* current profile serves as the low-load case, and *trace2* as the high-load case. The results are shown in Figs. 8 and 9. From Fig. 8 we can see that, for a nominal voltage of 900mV and *trace1*, the distributed design improves the minimum voltage seen by the cores from 757mV to 811mV compared to the lumped design, and reduces the maximum IR drop by 74%. The corresponding numbers for *trace2* are an improvement in minimum voltage from 637mV to 729mV and a 71% reduction in maximum IR drop, as shown in Fig. 9.





Fig. 8. Comparison of lumped and distributed designs of SC converter using current profile *trace1*. Recoverable panel titles: (d) four distributed SC converters, Core2, min. voltage = 815mV; (e) four distributed SC converters, Core3, min. voltage = 826mV.

Efficiency is an important metric for SC DC-DC converters. The principal contributors to efficiency loss in an SC DC-DC converter are conduction loss arising from charging a capacitor through a switch, loss due to parasitic capacitors, gate-drive loss due to switching the gate capacitance of the charge-transfer switches, and power loss in the control circuitry [3]. Simulation results show that the parasitic capacitance of the deep trench capacitors is less than 1% of the total charge-transfer capacitance. The size of the switches is negligible compared to the cores in our test chip, so we can ignore the gate-drive loss. The loss in the control circuitry is of specific concern only when delivering ultra-low load power levels (on the order of $\mu$W) [3]. Therefore, in our simulations the loss of the SC converters comes mainly from the conduction loss. The results show that, for the lumped and distributed converters, the average efficiencies when simulating the current profiles range from 92.39% to 95.38%.
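When conduction loss dominates, a commonly used first-order bound on SC converter efficiency is the ratio of the loaded output voltage to the no-load output voltage. The sketch below applies that bound; it is an illustration under assumptions (ideal charge transfer, no parasitic or gate-drive loss), the function name and the operating point are hypothetical, and the simulator described here may account for losses differently.

```python
def linear_efficiency(v_out, v_in, gain):
    """Upper bound on efficiency when only conduction loss is counted.

    For an ideal SC stage the input charge is `gain` times the output charge,
    so eta = (V_out * Q) / (V_in * gain * Q) = V_out / (gain * V_in).
    """
    v_no_load = gain * v_in
    if not 0 < v_out <= v_no_load:
        raise ValueError("v_out must lie in (0, gain * v_in]")
    return v_out / v_no_load


# Hypothetical operating point: G3BY4 from a 1.2 V input, loaded output 0.85 V.
eta = linear_efficiency(v_out=0.85, v_in=1.2, gain=3 / 4)
print(f"eta = {eta:.1%}")
```

Under this bound, keeping the loaded output close to the no-load level (that is, keeping the droop small) is what keeps the efficiency high.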

In summary, although the distributed design requires an independent closed-loop control unit for each individual converter, its benefits are prominent. First, in the distributed design, the cores can absorb current from local SC converters, and the current does not need to flow through a long conduction path from the converter to the core load as in the lumped case. Therefore, the distributed design of SC converters benefits from less IR noise, since each converter can regulate its local supply voltage. Second, each distributed converter delivers much less power than the lumped one; with smaller capacitors, it can respond much faster to changes in the local core load, which leads to smaller voltage swings as seen by the loads. Finally, distributed SC converters have the flexibility to manage the charge-transfer capacitors at a fine granularity: when a local core is idle during execution, the corresponding local converter can power-gate its unused charge-transfer capacitors to reduce leakage power.

Fig. 9. Comparison of lumped and distributed designs of SC converter using current profile *trace2*. Recoverable panel titles: (d) four distributed SC converters, Core2, min. voltage = 746mV; (e) four distributed SC converters, Core3, min. voltage = 735mV.

### B. Multiple Power Deliveries Using On-Chip SC DC-DC Converters

In this section, we explore the use of on-chip SC converters for multi-domain power delivery. For the test chip shown in Fig. 5, we design four power domains: Core0 works in domain0, served by one lumped G1BY1 converter with nominal Vdd of 1.2V, Core1 works in domain1, served by one lumped G3BY4 converter with nominal Vdd of 0.9V, Core2 works in domain2, served by one lumped G2BY3 converter with nominal Vdd of 0.8V, and Core3 works in domain3, served by one lumped G1BY2 converter with nominal Vdd of 0.6V.

We then exercised these four power domains with the corresponding current traces presented in Section III. Figs. 10 and 11 show the simulation results. We can see that all four domains work well. In fact, given a single Vdd supply, we can further dynamically reconfigure the converter in each domain (see Fig. 2) to deliver a wide range of load voltages; therefore, DVS can be applied to each domain to achieve better power management.
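The mapping from configuration to nominal voltage used for the four domains, and the idea of reconfiguring a converter for DVS, can be sketched as follows. The gains come from the configuration names and the nominal voltages stated above; the selection rule in `pick_config` (lowest gain whose no-load output still meets the target) and the function names are assumptions made for illustration, not the paper's control policy.

```python
from fractions import Fraction

# Gains of the four series-parallel configurations named in the paper.
GAINS = {
    "G1BY1": Fraction(1, 1),
    "G3BY4": Fraction(3, 4),
    "G2BY3": Fraction(2, 3),
    "G1BY2": Fraction(1, 2),
}


def nominal_voltage(config, v_in):
    """No-load output voltage of a configuration for a given input."""
    return float(GAINS[config] * Fraction(str(v_in)))


def pick_config(v_target, v_in):
    """Pick the lowest-gain configuration whose no-load output >= v_target.

    This selection rule is an assumption made for illustration.
    """
    feasible = [c for c in GAINS if nominal_voltage(c, v_in) >= v_target]
    if not feasible:
        raise ValueError("target exceeds the input voltage")
    return min(feasible, key=lambda c: GAINS[c])


for name in GAINS:
    print(name, nominal_voltage(name, 1.2), "V")
print("0.7 V target ->", pick_config(0.7, 1.2))
```

With a 1.2 V input, the four configurations give the 1.2 V, 0.9 V, 0.8 V and 0.6 V nominal levels assigned to domain0 through domain3.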

## V. CONCLUSION

In this paper, we have explored the design of on-chip SC DC-DC converters with an accurate power grid simulator. Simulation results based on realistic multicore current profiles show that distributed SC converters can reduce the IR drop by up to 74% compared to the lumped design, with improved supply voltage. We also present the idea of using SC converters for multi-domain power supply.

## ACKNOWLEDGMENT

The authors gratefully acknowledge Jieming Yin at the University of Minnesota for providing the current profiles for the multicore chip. We would also like to thank Bongjin Kim at the University of Minnesota for verifying the parasitic capacitance of the deep trench capacitors.

Fig. 11. Simulation results of four power domains using trace2.

## REFERENCES

- [1] G. Patounakis, Y. Li, and K. Shepard, "A fully integrated on-chip DC-DC conversion and power management system," *IEEE Journal of Solid-State Circuits*, vol. 39, no. 3, pp. 443–451, Mar. 2004.
- [2] Z. Zeng, X. Ye, Z. Feng, and P. Li, "Tradeoff analysis and optimization of power delivery networks with on-chip voltage regulation," in *ACM/EDAC/IEEE Design Automation Conference*, Jun. 2010, pp. 831–836.
- [3] Y. Ramadass and A. Chandrakasan, "Voltage scalable switched capacitor DC-DC converter for ultra-low-power on-chip applications," in *IEEE Power Electronics Specialists Conference*, Jun. 2007, pp. 2353–2359.
- [4] Y. Ramadass, A. Fayed, B. Haroun, and A. Chandrakasan, "A 0.16mm<sup>2</sup> completely on-chip switched-capacitor DC-DC converter using digital capacitance modulation for LDO replacement in 45nm CMOS," in *IEEE International Solid-State Circuits Conference*, Feb. 2010, pp. 208–209.
- [5] H.-P. Le, M. Seeman, S. Sanders *et al.*, "A 32nm fully-integrated reconfigurable switched-capacitor DC-DC converter delivering 0.55 W/mm<sup>2</sup> at 81% efficiency," in *IEEE International Solid-State Circuits Conference*, Feb. 2010, pp. 210–211.
- [6] L. Chang, R. Montoye, B. Ji *et al.*, "A fully-integrated switched-capacitor 2:1 voltage converter with regulation capability and 90% efficiency at 2.3A/mm<sup>2</sup>," in *IEEE Symposium on VLSI Circuits*, Jun. 2010, pp. 55–56.
- [7] S. Bell, B. Edwards, J. Amann *et al.*, "TILE64 processor: A 64-core SoC with mesh interconnect," in *IEEE International Solid-State Circuits Conference*, Feb. 2008, pp. 88–598.
- [8] "SPEC OMP2001," Available at http://www.spec.org/omp/.
- [9] M. M. K. Martin, D. J. Sorin, B. M. Beckmann *et al.*, "Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset," *SIGARCH Computer Architecture News*, vol. 33, pp. 92–99, 2005.