### UC Santa Cruz UC Santa Cruz Previously Published Works

## Title

Harmonic Resonant Clocking

Permalink https://escholarship.org/uc/item/0p27n58t

**ISBN** 9781467326582

**Authors** Skinner, H Blake Hu, Xuchu Guthaus, Matthew

**Publication Date** 2012-10-01

**DOI** 10.1109/vlsi-soc.2012.7332077

Peer reviewed

## Harmonic Resonant Clocking

H. Blake Skinner, Xuchu Hu, Matthew Guthaus Department of Computer Engineering University of California Santa Cruz Santa Cruz, CA 95064 {hskinner, hxcu, mrg}@soe.ucsc.edu

Abstract: Distributed inductor-capacitor (LC) resonant clocking is a recent, promising technique to reduce the energy consumption in Clock Distribution Networks (CDNs) by recycling the energy on-chip. Even though the majority of power is saved, resonant clocks distribute a sinusoidal clock signal with a 25% slew, which increases short-circuit power in the sequential elements compared to traditional buffered clocks. In this work, we present the first harmonic resonant clock circuit that adds a third harmonic to the fundamental frequency in order to increase the slew rate of resonant clocks and reduce the short-circuit power in the sequential elements. We present two different methods of tuning a secondary tank circuit in a Colpitts oscillator to minimize the slew: one based on the frequency response of the circuit and the other based on matching an ideal square wave clock signal. Both methods provide benefits in slew reduction at a modest cost of components.

#### I. INTRODUCTION

Clock distribution networks (CDNs) can consume between 30% and 70% of total chip power in high-performance processors. Many techniques have been proposed to minimize the dynamic power, including minimal wirelength routing, buffer/wire sizing, low-voltage-swing, dynamic voltage and frequency scaling, and clock gating. Many of these techniques exploit inactivity or reduced workloads to reduce power, but the fundamental limit during active modes is a severe challenge in future highly-threaded many-core and GPU processors.

Resonant clock distributions are another alternative to save power that can recycle the energy on-chip. Various resonant techniques have been proposed including standing wave [11], rotary/salphasic [3], [17], monolithic inductor-capacitor (LC) resonant [10], [14], [19], and distributed-LC resonant [1], [2], [5]. There have been automated methods for synthesizing distributed-LC trees [4], [7], [12] and to synthesize rotary clocks [15], [18]. Distributed LC tanks have garnered recent industrial interest [6], [13] due to their similarity to existing clocking strategies and promise of high performance. Many of the design concerns in high-performance grids have been addressed in design automation methodologies [8], [9].

While resonant clocks work well at high frequencies, the slew rate of the sinusoidal clock poses a problem below around 2 GHz. This could limit the applicability of resonant clocks to the mobile and low-power devices found in many modern systems. As an example, if we examine some clock distributions that drive buffered clock inputs with a  $4 \times$  output-to-input capacitance ratio, we see at least a  $10 \times$  increase in buffer short-circuit power in resonant grids compared to non-resonant grids, as shown in Figure 1. This short-circuit power constitutes more than 20% of the total power of a resonant grid and occurs because the slew rate of a sinusoid is 25% of the clock period. In buffered clock distributions, the slew rate is typically constrained to about 10% of the clock period to reduce the impact of short-circuit power.

Fig. 1. Dynamic power in resonant clocks (2nd column) is dramatically reduced compared to buffered clocks (1st column); however, the total contribution of short-circuit power in the sinks becomes a significant portion of the remaining power.

Note that in this paper we use the term *slew rate* somewhat loosely to refer to the amount of slew present in the signal. In other words, we use it to refer to the percentage of the signal's total period taken up by the rising or falling edge, making it somewhat analogous to the term *slew time*.

We address the previous slew problem in resonant clocks by introducing a harmonic resonant oscillator. While we only examine the feasibility of a single monolithic oscillator, the same concept can be used to reduce the short-circuit power in distributed-LC resonant clocks. A harmonic resonant oscillator adds an additional LC tank circuit to the on-chip oscillator to add a third harmonic and create a "more square" wave. To do this, the values of each circuit element must be carefully balanced with respect to the load of the clock tree. We propose a simulated annealing optimization problem with two different cost functions to find the best circuit configuration and then compare with previous buffered and resonant clock distributions.

In particular, this is the first paper to:

- Introduce a harmonic resonant oscillator to solve the sinusoidal slew problem in distributed-LC resonant clocks.
- Propose a simulated annealing methodology to optimize slew in resonant oscillators.
- Quantify the short-circuit power savings using harmonic resonant oscillators.

The remainder of the paper is organized as follows: Section II discusses the theoretical aspects of our proposed harmonic resonant clock oscillator, Section III presents our methodology to optimize and evaluate the circuit, Section IV discusses our results, and Section V concludes the paper.

#### II. HARMONIC RESONANT THEORY

From Fourier theory, we know that an ideal square wave is the sum of an infinite number of sinusoids. Adding only the third harmonic according to

$$\alpha \sin(2\pi f_0 x) + \beta \sin(2\pi f_3 x) \tag{1}$$

can be done at the cost of only one additional tank circuit. If the proper weighting of the fundamental and third harmonic,  $\alpha$  and  $\beta$  respectively, is chosen, this can significantly decrease the overall slew. For example, Figure 2 shows a comparison of purely the fundamental and the fundamental with the third harmonic using coefficients,  $\alpha = 0.55$  and  $\beta = 0.15$ . In this case, the slew rate is decreased from 25% of the clock period to approximately 15% of the clock period.
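The effect of the weights in Equation 1 can be checked numerically. The following is a minimal sketch, not the authors' measurement code: it assumes that slew is timed between 10% and 90% of the waveform's own peak-to-peak swing and expressed as a fraction of the period. The function name `rise_fraction` and the threshold choice are assumptions, so the absolute values need not coincide with the percentages quoted above.

```python
import numpy as np

def rise_fraction(alpha, beta, lo=0.1, hi=0.9, n=200_001):
    """Rising-edge duration of alpha*sin(x) + beta*sin(3x) as a fraction of the period.

    Assumption: the edge is timed between the `lo` and `hi` fractions of the
    peak-to-peak swing (10%-90% by default).
    """
    # Sample the rising half period, centred on the zero crossing at theta = 0.
    theta = np.linspace(-np.pi / 2, np.pi / 2, n)
    v = alpha * np.sin(theta) + beta * np.sin(3 * theta)
    v_min, v_max = v.min(), v.max()
    v_lo = v_min + lo * (v_max - v_min)
    v_hi = v_min + hi * (v_max - v_min)
    # Last sample at or below the low threshold, first at or above the high one.
    start = theta[np.nonzero(v <= v_lo)[0][-1]]
    stop = theta[np.nonzero(v >= v_hi)[0][0]]
    return (stop - start) / (2 * np.pi)

sine = rise_fraction(1.0, 0.0)
harmonic = rise_fraction(0.55, 0.15)
print(f"fundamental only:  {100 * sine:.1f}% of the period")
print(f"with 3rd harmonic: {100 * harmonic:.1f}% of the period")
```

Measuring the thresholds against the waveform's own swing keeps the comparison fair, because adding the third harmonic also changes the peak amplitude.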

The major limitations on the magnitudes  $\alpha$  and  $\beta$  are voltage overshoot and the noise margins of CMOS logic. If  $\alpha$  is too large, the clock voltage will overshoot the supply voltage. While this does not cause a logical failure, repeated overshoot stress can introduce oxide reliability issues. The magnitude of  $\beta$ , however, can pose a more significant challenge. If the sub-peaks rise above  $V_{thn}$  or fall below  $V_{DD} - V_{thp}$ , where  $V_{thn}$  and  $V_{thp}$  are the NMOS and PMOS threshold voltages, respectively, an extraneous clock edge could trigger the sequential elements. This places a hard limit on the magnitude of the third harmonic. We have found, however, that the third harmonic is low-pass filtered in typical clock applications. As a result, the sub-peaks are typically not visible and this limit is not significant in practice.

We utilize a modified Colpitts oscillator with a secondary tank circuit to accomplish the third harmonic as shown in Figure 3. The load capacitance  $(C_L)$  represents the collective load of the clock network and sequential elements. The drain inductance  $(L_D)$  and the inductance and capacitance of the tank circuit  $(L_T \text{ and } C_T)$  can be tuned.

The annealing was designed around the premise of  $L_D$  and  $C_L$  forming the fundamental frequency

$$f_0 = \frac{1}{2\pi\sqrt{L_D C_L}} \tag{2}$$

which is the first term of Equation 1. The tank circuit parameters,  $C_T$  and  $L_T$  form the third harmonic which is ideally

$$f_3 = \frac{1}{2\pi\sqrt{L_T C_T}} \tag{3}$$

and is the second term of Equation 1. In practice, however, all elements must be tuned in relation to each other, because the tank circuit has properties of inductance and capacitance which will affect the resonant frequency of the outer oscillation.
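As a worked instance of Equations 2 and 3, assume a target fundamental of  $f_0 = 1$ GHz and a clock load of  $C_L = 2$ pF (the values used in the experiments of Section III) and ignore the interaction between the two resonators. The numbers below are idealized starting points under those assumptions, not the tuned values:

$$L_D = \frac{1}{(2\pi f_0)^2 C_L} = \frac{1}{(2\pi \cdot 10^{9}\ \text{Hz})^2 \cdot 2\ \text{pF}} \approx 12.7\ \text{nH}, \qquad L_T C_T = \frac{1}{(2\pi \cdot 3 f_0)^2} \approx 2.81 \times 10^{-21}\ \text{s}^2$$

Equation 3 fixes only the product  $L_T C_T$ , which is why the split between tank inductance and tank capacitance remains a free choice for the optimization.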



Fig. 2. A standard sine wave, and a sine wave with the 3rd harmonic added. The harmonic oscillation lowers the slew from 25% to approximately 15% of the clock period.



Fig. 3. A tank circuit on the output of a Colpitts oscillator can add the third harmonic to the sine wave, creating a "more square" wave.

Similarly, the drain inductance and capacitive load of the clock tree affect the oscillation of the tank circuit as well.

#### III. METHODOLOGY

#### A. Parameter Tuning

The circuit elements  $L_D$ ,  $L_T$ , and  $C_T$  must be carefully tuned to create a waveform with a minimal amount of slew but also to resonate at the correct fundamental frequency  $f_0$  to save power. The clock load  $C_L$  represents the collective load of the wires and sinks and cannot be changed. The values of the inductors and capacitors, however, should also be constrained within reasonable ranges. Depending on these constraints, it may not be possible to get a valid waveform using this method if the required size of a circuit element is too large or small to be implemented on-chip.

The constraints of the design and the precision limitations of discrete components mean that the possible configurations of the design form a finite solution set, one that would be hard to characterize analytically due to the interactions between dielectric forces. For such a problem, we chose simulated annealing due to its flexibility and because it could be run with definite time constraints. Two methods of simulated annealing were used to find the ideal values of the circuit elements. Both generated new configurations by allowing the drain inductance and tank circuit elements to move randomly within the constraints of the design. The resonant frequency of the tank circuit was kept within 30% of the third harmonic, but the drain inductance was allowed to move freely.
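One way to picture the move step described above is the sketch below. It is an illustration under stated assumptions rather than the authors' code: the multiplicative random step, the rejection loop, the helper names `tank_frequency` and `propose`, and the example bounds are all hypothetical. Only the 30% window around the third harmonic and the bounded random movement of the three elements come from the text.

```python
import math
import random

def tank_frequency(l_t, c_t):
    """Ideal resonant frequency of the secondary tank, 1 / (2*pi*sqrt(L_T*C_T))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_t * c_t))

def propose(state, bounds, f0, rng, step=0.1, tol=0.30, max_tries=1000):
    """Return a random neighbour of `state` (dict with keys L_D, L_T, C_T).

    Each element is scaled by a random factor and clamped to its bounds
    (hypothetical move rule). Candidates whose tank frequency is more than
    `tol` away from 3*f0 are rejected and redrawn.
    """
    f3 = 3.0 * f0
    for _ in range(max_tries):
        cand = {}
        for name, value in state.items():
            lo, hi = bounds[name]
            scaled = value * math.exp(rng.uniform(-step, step))
            cand[name] = min(max(scaled, lo), hi)
        if abs(tank_frequency(cand["L_T"], cand["C_T"]) - f3) <= tol * f3:
            return cand
    return dict(state)  # no valid move found; stay in place

# Placeholder starting point and bounds, for illustration only.
rng = random.Random(0)
state = {"L_D": 12.0e-9, "L_T": 1.4e-9, "C_T": 2.0e-12}
bounds = {"L_D": (0.5e-9, 300e-9), "L_T": (0.5e-9, 300e-9), "C_T": (0.5e-12, 300e-12)}
print(propose(state, bounds, f0=1.0e9, rng=rng))
```

Rejecting invalid candidates inside the move function, instead of penalizing them in the cost, means the annealer never spends a circuit simulation on a configuration outside the allowed window.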

Each method used a different cost function. The first was based on the frequency domain analysis of the circuit. Our goal was to produce a waveform with resonant frequencies on the first and third harmonic. The annealer cost function compared the first two peaks of the frequency spectrum. Specifically, it compared how close the second observed peak was to the third harmonic of the wave. It is possible for a waveform that is far off from the intended frequency to receive a low energy if it looks "more square". The generated circuits did stay near the target frequency due to the fact that the move function restricted the resonant frequency of the tank circuit around the desired third harmonic.
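A frequency-domain cost of this kind might look like the sketch below. This is an interpretation, not the published cost function: it assumes a uniformly sampled waveform, takes the two strongest spectral peaks, and scores the relative distance between the upper peak and three times the lower one. The name `harmonic_mismatch` and the normalization are assumptions.

```python
import numpy as np

def harmonic_mismatch(v, dt):
    """Relative distance of the second spectral peak from 3x the first.

    Assumptions: `v` is sampled uniformly with step `dt`, and the two
    strongest local maxima of the spectrum are the peaks of interest.
    """
    v = np.asarray(v, dtype=float)
    mag = np.abs(np.fft.rfft(v - v.mean()))      # drop the DC component
    freqs = np.fft.rfftfreq(len(v), d=dt)
    interior = np.arange(1, len(mag) - 1)
    is_peak = (mag[interior] > mag[interior - 1]) & (mag[interior] >= mag[interior + 1])
    peaks = interior[is_peak]
    if len(peaks) < 2:
        return float("inf")                       # no second peak to compare
    top_two = peaks[np.argsort(mag[peaks])[-2:]]  # two largest-magnitude peaks
    f_low, f_high = np.sort(freqs[top_two])
    return abs(f_high - 3.0 * f_low) / (3.0 * f_low)

# Synthetic check: a 1 GHz tone plus a tone at 2.7 GHz instead of 3 GHz.
dt = 1e-11
t = np.arange(1000) * dt
wave = 0.55 * np.sin(2 * np.pi * 1.0e9 * t) + 0.15 * np.sin(2 * np.pi * 2.7e9 * t)
print(harmonic_mismatch(wave, dt))
```

Because the score depends only on the ratio of the two peaks, it does not penalize a shift of the fundamental itself, which matches the caveat in the paragraph above.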

The second cost function was based on the transient response of the circuit. The annealer compared the output waveform with a perfect square wave of the desired frequency  $f_0$ . The start and end times of the simulation were set to guarantee that the simulation would start at the beginning of the high duty cycle of the period and end after exactly five periods. As there were no elements in this circuit which could cause a shift in phase, the start of a duty cycle could be predicted solely based on the period. A minimum time of 1 ns was provided to give the circuit enough time to charge. The energy is calculated by averaging the difference of squares comparison of each time step in Ngspice for each clock period.
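The transient comparison can be sketched as follows. This is a hedged reading of the description, not the authors' implementation: it interprets the comparison as a mean squared difference per sample against an ideal 0-to-$V_{DD}$ square wave with a 50% duty cycle that starts high at `t_start`. The function name `square_wave_cost` and its parameters are assumptions.

```python
import numpy as np

def square_wave_cost(t, v, f0, vdd=1.0, t_start=0.0):
    """Mean squared difference between samples `v` at times `t` and an ideal square wave.

    Assumptions: the ideal clock is high (vdd) during the first half of each
    period beginning at `t_start`, and low (0 V) during the second half.
    """
    t = np.asarray(t, dtype=float)
    v = np.asarray(v, dtype=float)
    phase = ((t - t_start) * f0) % 1.0          # position within the period, 0..1
    ideal = np.where(phase < 0.5, vdd, 0.0)
    return float(np.mean((v - ideal) ** 2))

# Example: score a rail-to-rail sine over five periods of a 1 GHz clock.
f0 = 1.0e9
t = np.linspace(0.0, 5.0 / f0, 5001)
sine = 0.5 + 0.5 * np.sin(2 * np.pi * f0 * t)
print(square_wave_cost(t, sine, f0))
```

A simulator with adaptive time steps returns unevenly spaced samples, so a per-sample average weights densely sampled regions (typically the edges) more heavily than a time-weighted average would.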

We used both annealing cost functions to create oscillators with a target frequency of 1 GHz. The algorithm was written in Python using a simulated annealing library [16]. A range of acceptable values for the circuit elements kept the parameters of the circuit within acceptable bounds; however, if the range is too restrictive, it may not be possible for the annealer to reach an acceptable configuration. For our experiments, we set the range of each inductor so that it, along with the 2 pF load, could reach anywhere between one-fifth and five times the target frequency. For 1 GHz, this range was 506.6 pH to 316.6 nH;  $C_T$  was allowed the same range. The circuit-level simulation was performed using Ngspice, with some additional data analysis also written in Python.
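The inductor range follows from the ideal LC resonance formula of Equation 2. The short sketch below reproduces the arithmetic; the function names are illustrative, and the results are inductances in henries.

```python
import math

def inductance_for(freq_hz, cap_f):
    """Inductance that resonates with `cap_f` at `freq_hz`: L = 1 / ((2*pi*f)^2 * C)."""
    return 1.0 / ((2.0 * math.pi * freq_hz) ** 2 * cap_f)

def inductor_bounds(f0, c_load, span=5.0):
    """Smallest and largest inductance covering f0/span .. span*f0 with `c_load`."""
    return inductance_for(span * f0, c_load), inductance_for(f0 / span, c_load)

l_min, l_max = inductor_bounds(f0=1.0e9, c_load=2.0e-12)
print(f"L_min = {l_min * 1e12:.1f} pH, L_max = {l_max * 1e9:.1f} nH")
```

Since the resonant frequency scales with $1/\sqrt{L}$, a factor of 5 in frequency on either side corresponds to a factor of 25 in inductance on either side.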

Determining an appropriate cooling schedule was difficult and differed for each cost function. The annealing library has a mechanism to auto-explore the temperature ranges and dynamically set a cooling schedule. It first looked for a temperature that resulted in a 98% acceptance and a temperature that resulted in a 0% acceptance. The annealing was then done between these two values using an inverse-log scale.

|              | 10% Slew | 25% Slew |
|--------------|----------|----------|
| Rising Edge  | 0.160 mA | 0.239 mA |
| Falling Edge | 0.204 mA | 0.288 mA |

TABLE I

SHORT-CIRCUIT CURRENT DRAW FOR A 4X LOAD INVERTER WITH SLEW RATES OF 10% AND 25% OF THE CLOCK PERIOD.

#### B. Power Measurement

The power of a resonant clock consists of both the dynamic power and the short-circuit power in the sequential elements,

$$P_{total} = P_{dyn} + P_{sc}. \tag{4}$$

Leakage power is not very significant in clock networks and is therefore not included in the analysis.

The dynamic power  $P_{dyn}$  of our oscillator circuits is directly measured using Ngspice. The dynamic power of a standard oscillator was estimated using

$$P_{dyn,osc} = \frac{3}{4Q} \pi V_{DD}^2 f_0 C_L \tag{5}$$

as done in prior works [5]. The Q of the circuit elements for Equation 5 was assumed to be 4.
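Equation 5 is simple enough to evaluate directly. The sketch below uses the values stated in the paper ($Q = 4$, a 2 pF load, a 1 V supply); the function name is illustrative.

```python
import math

def resonant_dynamic_power(f0, c_load, vdd=1.0, q=4.0):
    """Equation 5: P = 3 / (4Q) * pi * Vdd^2 * f0 * C_L, in watts."""
    return 3.0 / (4.0 * q) * math.pi * vdd ** 2 * f0 * c_load

# Evaluated at the 1.405 GHz comparison frequency used in Table II.
p = resonant_dynamic_power(f0=1.405e9, c_load=2.0e-12)
print(f"{p * 1e3:.3f} mW")
```

At 1.405 GHz this evaluates to roughly 1.65 mW, in line with the dynamic power listed for the resonant sine wave in Table II.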

The dynamic power of a buffered clock network is estimated using

$$P_{dyn,buff} = C_S V_{DD}^2 f_0 \tag{6}$$

where  $C_S$  is the total buffer and clock load capacitance,  $f_0$  is the resonant frequency, and  $V_{DD}$  is the supply voltage. As done in prior works [5], the total capacitance can be estimated based on the stage gain using an exponential horn sizing of buffers according to

$$C_S = C_L \frac{n}{n-1} \tag{7}$$

where n is the number of stages. For a 2pF load, we assumed 5 stages with a stage gain of 4.

The short-circuit power was measured using a model derived from Ngspice simulations. We measured the short-circuit current of a single buffer with a fanout of four using various input slew rates. Each of these buffers is assumed to be a buffered clock input of a sequential element. The total clock capacitance  $C_L$  was chosen as 2pF which represents approximately 500 of these buffers which is reasonable for a small design. The short-circuit current drawn for 10% and 25% of a 1GHz clock period slew rate is listed in Table I. In between these slew rates, linear interpolation is used to reflect the short-circuit current drawn. The total short-circuit power  $P_{sc}$  is scaled by the approximately 500 total buffers.
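The interpolation model can be sketched as below, using the two measured points from Table I. Averaging the rising and falling currents, multiplying by $V_{DD}$, and the default of 500 buffers (the text says "approximately 500") are assumptions of this sketch, as is the function name.

```python
import numpy as np

SLEW_POINTS = np.array([10.0, 25.0])    # slew, % of the clock period
RISING_MA = np.array([0.160, 0.239])    # Table I, rising edge (mA)
FALLING_MA = np.array([0.204, 0.288])   # Table I, falling edge (mA)

def short_circuit_power_mw(rise_slew, fall_slew, vdd=1.0, n_buffers=500):
    """Estimated total short-circuit power in mW.

    Currents are linearly interpolated between the Table I points;
    np.interp clamps slews outside the 10%-25% range to the end values.
    """
    i_rise = np.interp(rise_slew, SLEW_POINTS, RISING_MA)
    i_fall = np.interp(fall_slew, SLEW_POINTS, FALLING_MA)
    i_avg = 0.5 * (i_rise + i_fall)             # mA per buffer
    return float(i_avg * vdd * n_buffers)       # mA * V = mW

print(short_circuit_power_mw(17.8, 18.0))
```

The result scales linearly with the buffer count, so the exact number of sinks matters as much as the slew when comparing against reported totals.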

In the case of the buffered clock network, we must also consider the slew of each buffer in the buffer chain driving the clock load. In this case,  $P_{sc}$  must be estimated by using  $C_S$  instead of  $C_L$ .

#### IV. RESULTS

We performed annealing optimization using both the frequency-domain cost function and the comparison with an ideal square wave. We assume an ideal supply voltage of 1 V. We also compare these results with a buffered clock with 10% slew rates and a fundamental-frequency resonant oscillator in Table II. The slew and power analysis shows that both harmonic oscillators show an improved slew and decreased power usage. Neither method achieved exactly our 1 GHz target frequency, but the frequencies were close.

The final parameters for the frequency-domain cost function and the square wave cost function are listed in Table III. While the values have similar ratios, the magnitudes are quite different. The square wave cost function used significantly smaller inductors and instead had a larger tank capacitance. This would require considerably less silicon area to implement.

Transient simulations of the frequency-domain cost function and the square wave cost function are shown in Figures 4 and 6, respectively. Both results are multi-cyclic but stable. It is also apparent that the slew rates are significantly less than that of a pure sine wave, but that the square wave cost function achieves slightly better results. The frequency spectrum analysis for each in Figures 5 and 7 shows that the third harmonic is just under 5 GHz, which is slightly larger than the ideal third harmonic.

All the generated waves exhibit significant voltage overshoot and undershoot, which is likely the cause of some of their gains in slew. This would also add additional overshoot stress on the chip which could be difficult to mitigate. Most resonant clocks, however, experience signal attenuation [4], [7], [12] due to the high resistance of the clock distribution interconnect. The shape of the output waveform is dependent on the relative positions and amplitudes of its harmonics.

Changes in the circuit elements due to design alterations or process variation could alter the resonant frequencies and cause unpredictable distortions in the wave's shape. The tank capacitance in both generated circuits is on the same order of magnitude as the load capacitance of the clock network, so the overall capacitance of the circuit will be sensitive to changes in the clock tree. Even a comparatively small change could alter the frequency response. Since, unlike a standard resonant clock, there are multiple harmonics, such a change could distort the wave unpredictably. In this way, our oscillator is more fragile than standard resonant frequency oscillators.

We expected the frequency-domain cost function would result in waveforms that deviate further from the target frequency because the cost function exclusively focuses on the third harmonic and it is not penalized for adjusting the fundamental frequency.

#### V. CONCLUSIONS

The power dissipation of clock distribution networks often accounts for 30-70% of total chip power. While resonant clock distributions offer a significant opportunity to reduce this power, they suffer from high slew rates at low operating frequencies. We presented a method for improving the slew rate using harmonic resonant oscillators and showed that a slew rate of 13.2% is achievable using the proposed technique. This enables a significant improvement in short-circuit power when compared to previous resonant oscillators and achieves power results superior to both resonant and buffered clock networks. Some of these gains in slew are the result of voltage overshoot/undershoot that the optimization leveraged.

Fig. 4. Annealing based on the resonant frequency creates a multi-cycle wave with a smaller slew.

Fig. 5. The frequency response shows the ideal point for this analysis was a waveform whose resonant frequency was slightly off the target frequency. The multi-cycle nature of the transient response is shown by the fact that the 2nd peak is not an even multiple of the first.

The slew of a resonant clock is minimized significantly at the cost of relatively little power or area. The results, however, are heavily dependent on the constraints of the design; it may be impossible to generate a useful wave using this method for a given system given limitations on inductor and capacitor sizes. In addition, altering the inductance values from the annealed value could result in waveforms that create local peaks and cause unwanted switching in the sequential elements. Theoretical bounds could be set to limit these unwanted swings during the annealing process using additional penalty terms in the square wave comparison. A detailed sensitivity analysis is left as a topic of future study.

|                              | Actual Frequency (GHz) | Rising Slew | Falling Slew | Avg. $P_{sc}$ (mW) | $P_{dyn}$ (mW) | $P_{total}$ (mW) |
|------------------------------|------------------------|-------------|--------------|--------------------|----------------|------------------|
| Resonant Sine Wave           | 1.405                  | 25.0%       | 25.0%        | 125.4              | 1.65           | 127.05           |
| Buffered Clock               | 1.405                  | 10.0%       | 10.0%        | 114.4              | 2.11           | 116.51           |
| Resonant Frequency Annealing | 1.34                   | 17.8%       | 18.0%        | 107.1              | 1.778          | 108.88           |
| Square Wave Annealing        | 1.47                   | 13.2%       | 13.3%        | 95.08              | 1.620          | 96.7             |

TABLE II

THE SIMULATED RESULTS OF EACH OSCILLATOR. BOTH GENERATED MULTI-CYCLIC WAVES, SO THE SLEW VALUES REPRESENT THE AVERAGE ACROSS SEVERAL PERIODS. THE TARGET FREQUENCY WAS 1 GHz, HOWEVER THE ANNEALING RESULTS WERE COMPARED TO THE AVERAGE OF THEIR VALUES, 1.41 GHz.

|                              | Drain Inductance $(L_D)$ | Tank Inductance $(L_T)$ | Tank Capacitance $(C_T)$ |
|------------------------------|--------------------------|-------------------------|--------------------------|
| Resonant Frequency Annealing | 4.730 nH                 | 1.943 nH                | 1.056 pF                 |
| Square Wave Annealing        | 3.740 nH                 | 1.332 nH                | 1.796 pF                 |

TABLE III

THE IDEAL VALUES OF EACH CIRCUIT ELEMENT ACCORDING TO BOTH ANNEALING ALGORITHMS.

Fig. 6. Annealing by comparing the result to a square wave produced less variation in the peak voltage.

In the future, it is also possible to consider adding a 5th harmonic, but this is likely to be quite expensive since the resonant frequencies are proportional to  $\frac{1}{2\pi\sqrt{LC}}$ . In addition, directly integrating the power considerations into the annealing process could enable the optimization to directly trade short-circuit power and oscillator efficiency more readily. We have also found some modest improvement in the slew by biasing the Colpitts oscillator with a few mA of current. While this increases the oscillator power, it could potentially be a net benefit for a large CDN if the slew rates are further decreased.

#### ACKNOWLEDGEMENT

Thanks to Walter Condley for initial discussions on the topic. This work was supported in part by the National Science Foundation under grant CCF-1053838.

Fig. 7. This waveform is further from the target frequency, but has less variation than the resonant frequency technique.

#### REFERENCES

- [1] S Chan, P Restle, K Shepard, N James, and R Franch. A 4.6GHz resonant global clock distribution network. *International Solid-State Circuits Conference (ISSCC)*, pages 342 – 343, Feb 2004.
- [2] S C Chan, K L Shepard, and P J Restle. Design of resonant global clock distributions. *International Conference on Computer Design (ICCD)*, 2003.
- [3] V Chi. Salphasic distribution of clock signals for synchronous systems. IEEE Transactions on Computers, 43(5):597 – 602, May 1994.
- [4] W. Condley, X. Hu, and M.R. Guthaus. A methodology for local resonant clock synthesis using lc-assisted local clock buffers. In *International Conference on Computer-Aided Design (ICCAD)*, 2011.
- [5] A Drake, K Nowka, T Nguyen, J Burns, and R Brown. Resonant clocking using distributed parasitic capacitance. *IEEE Journal of Solid-State Circuits*, 39(9):1520 – 1528, Sep 2004.
- [6] S.C. Chan et al. A resonant global clock distribution for the cell broadband engine processor. JSSC, 44(1), 2009.
- [7] M. R. Guthaus. Distributed LC resonant clock tree synthesis. In International Symposium on Circuits and Systems (ISCAS), pages 1215– 1218, 2011.

- [8] X. Hu and M. R. Guthaus. Distributed resonant clock grid synthesis (ROCKS). In *Design Automation Conference (DAC)*, pages 516–521, 2011.
- [9] X. Hu and M. R. Guthaus. Library-aware resonant clock synthesis (LARCS). In Design Automation Conference (DAC), 2012.
- [10] E. D. Marsman, R. M. Senger, M. S. McCorquodale, M. R. Guthaus, R. A. Ravindran, G. S. Dasinka, S. A. Mahlke, and R. B. Brown. A 16-bit low-power microcontroller with monolithic MEMS-LC clocking. In *International Symposium on Circuits and Systems (ISCAS)*, pages 624–627, 2005.
- [11] F. O'Mahony, C Yue, M. Horowitz, and S Wong. Design of a 10GHz clock distribution network using coupled standing-wave oscillators. *Design Automation Conference (DAC)*, Jun 2003.
- [12] J Rosenfeld and E Friedman. Design methodology for global resonant h-tree clock distribution networks. *International Symposium on Circuits and Systems (ISCAS)*, 2006.
- [13] V. Sathe, S. Arekapudi, C. Ouyang, M. Papaefthymiou, A. Ishii, and S. Naffziger. Resonant clock design for a power-efficient high-volume x86-64 microprocessor. In *ISSCC*, pages 68–70, 2012.
- [14] R. M. Senger, E. D. Marsman, M. S. McCorquodale, F. H. Gebara, K. L. Kraver, M. R. Guthaus, and R. B. Brown. A 16-bit mixed-signal microsystem with integrated CMOS-MEMS clock reference. In *Design Automation Conference (DAC)*, pages 520–525, June 2003.
- [15] B Taskin, J Demaio, O Farell, M Hazeltine, and R Ketner. Custom topology rotary clock router with tree subnetworks. *Transactions on Design Automation of Electronic Systems (TODAES)*, 14(3), May 2009.
- [16] Richard J. Wagner. Python module for simulated annealing. University of Michigan, 2009.
- [17] J Wood, T C Edwards, and S Lipa. Rotary traveling-wave oscillator arrays: A new clock technology. *IEEE Journal of Solid-State Circuits* (JSSC), 36(11):1654–1664, 2001.
- [18] Z Yu and X Liu. Implementing multiphase resonant clocking on a finite-impulse response filter. *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, 17(11):1593 – 1601, Nov 2009.
- [19] C Ziesler, S Kim, and M Papaefthymiou. A resonant clock generator for single-phase adiabatic systems. *International Symposium on Low-Power Electronics and Design (ISLPED)*, Aug 2001.