# All-Digital Fast-Locked Synchronous Duty-Cycle Corrector

Shao-Ku Kao and Shen-Iuan Liu, Senior Member, IEEE

Abstract—An all-digital fast-locked synchronous duty-cycle corrector is presented. It corrects the duty cycle and synchronizes the input and output clocks in 10 clock cycles. The proposed circuit has been fabricated in a 0.18-$\mu$m CMOS technology. The measured duty-cycle error is between -1.4% and 1.5% for input duty cycles of 40% to 60%. The measured peak-to-peak jitter is 12.9 ps at 1 GHz. The measured operation frequency range is from 0.8 GHz to 1.2 GHz.

Index Terms—All-digital, duty-cycle corrector (DCC), fast-locked.

## I. INTRODUCTION

A clock with 50% duty cycle is very important in many applications such as DDR-SDRAMs and double-sampling analog-to-digital converters. To double the data rate, both the positive and negative transition edges of a clock are utilized. However, duty-cycle distortion of a clock occurs owing to the unmatched rising and falling times in the clocking paths. Thus, a duty-cycle corrector (DCC) for the clock is needed. There are two categories of DCC implementations in the literature: the feedback type [1]–[6] and the nonfeedback type [7]–[9]. Both analog [1]–[3] and digital [4]–[6] feedback DCCs have been presented. The analog DCC [1]–[3] is normally embedded in a phase-locked loop (PLL) [10] or delay-locked loop (DLL) [11] to generate a clock with 50% duty cycle and achieve synchronization. However, complementary clocks and a long locked time are needed.

One of the digital nonfeedback DCCs [7] utilizes complementary clocks and interpolation to generate 50% duty-cycle clocks. Complementary clocks are needed and the interpolator limits the operation frequency range. Other digital nonfeedback DCCs [8], [9] require only a single-ended clock and achieve a fast locked time. In [8], [9], time-to-digital conversion is used to detect the period or duty cycle of the input clock. However, the duty cycle of the input clock is distorted easily in the delay line, which limits the accuracy of the recovered duty cycle. Moreover, the skew between the input and output clocks in [8], [9] is not corrected.

In this brief, an all-digital fast-locked synchronous DCC is presented to correct the duty cycle and skew of the clock simultaneously by using a single-ended clock. The proposed synchronous DCC detects the rising and falling edges of the input clock separately. Compared with [8], [9], the duty cycle of the input clock is not distorted by the delay line. The duty cycle and skew of the clock are corrected in ten clock cycles. With a fast locked time, the circuit can increase the interfacing speed and enable a fast wakeup function, such as in SDRAM. This brief is organized as follows. Section II gives the circuit description. The experimental results are given in Section III. Finally, the conclusions are given in Section IV.

The authors are with the Graduate Institute of Electronics Engineering and Department of Electrical Engineering, National Taiwan University, Taipei 10617, Taiwan, R.O.C. (e-mail: lsi@cc.ee.ntu.edu.tw).

Digital Object Identifier 10.1109/TCSII.2006.885396

Fig. 1. Proposed all-digital synchronous DCC.

Fig. 2. Timing diagram of the interpolator.

## II. CIRCUIT DESCRIPTION

The proposed all-digital synchronous DCC is shown in Fig. 1. It is composed of a clock generator, an interpolator, and a synchronization circuit. The clock generator produces two in-phase clocks, A and B, with complementary duty cycles, as shown in Fig. 2; their period is equal to the period $T_{\rm clk}$ of the input clock.

By using the time-to-digital conversion, the low-level pulsewidth of the input clock is measured. The clock generator produces the clock B with a low-level pulsewidth of $T_L$, which is approximately equal to that of the input clock. Similarly, the clock A is produced with the inverse duty cycle with respect to the clock B. The interpolated clock, "Clk-int," is obtained by interpolating the clocks A and B. After interpolation, the low-level pulsewidth of the interpolated clock is given as

$$T_{L,\text{Clk-int}} = 0.5(T_{\text{clk}} - T_L) + 0.5T_L = 0.5T_{\text{clk}}. \tag{1}$$

Therefore, the interpolated clock with 50% duty cycle is achieved. Furthermore, the synchronization circuit not only generates the output clocks but also reduces the skew between the output and input clocks. The detailed circuit description is given as follows.

Manuscript received December 7, 2005; revised August 4, 2006. This paper was recommended by Associate Editor G. R. Hellestrand.



Fig. 3. (a) Clock generator. (b) Timing diagram for the clock generator.

### A. Clock Generator and Interpolator

The clock generator is composed of one-shot circuits, a divided-by-two circuit, two delay lines, DFFs, latches, and some logic circuits as shown in Fig. 3(a).

When the signal, "Enable," goes high, the clock generator is active. The input clock is divided by two to generate the signal, "TDC-A," for the DFFs. The dummy delay mimics the delay of the DFF that generates the signal, "TDC-A." The falling edge of the input clock passes the one-shot circuit to generate the signal, "Fall." The signal, "Fall," passes through the first delay line to generate the multiphase clocks for the DFFs. The signals, "Start-A" and "Stop-A," are enabled by the rising edge of "Fall" and the falling edge of "TDC-A," respectively. The timing diagram is shown in Fig. 3(b).

The time between the rising edge of "Fall" and the falling edge of "TDC-A" is equal to that between the rising edges of "Start-A" and "Stop-A." Measuring it is equivalent to measuring the low-level pulsewidth $T_L$ of the input clock. This time difference is quantized by the AND gates in the first delay line in Fig. 3(a) and is expressed as

$$T_L \approx n \cdot T_D \tag{2}$$

where $T_D$ is the delay of an AND gate in the first delay line and $n$ ($1 \le n \le 15$) is the measured number of the AND gates. The signal, "Start-A," enables the DFFs and the signal, "Stop-A," latches the measured code. The measured code is decoded by the exclusive-OR (XOR) gates to determine the transition point from logical ONE to logical ZERO among C1-C15. One of C1-C15 will turn on the corresponding transmission gate to generate the signal, "Fall-A."

Fig. 5. (a) Synchronization circuit. (b) Timing diagram of the synchronization circuit.

The dummy DFFs are connected to the second delay line to match the first one. In the next cycle, the signal, "Rise," is also delayed by the same amount to generate the signal, "Rise-A." The rising and falling edges of the clock A are set and reset by "Fall-A" and "Rise-A," respectively. Similarly, the rising and falling edges of the clock B are set and reset by "Rise" and "Fall," respectively. Therefore, the duty cycle of the clock B is equal to that of the input clock. The clock A has the inverse duty cycle with respect to the clock B.
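The quantization in (2), the XOR decoding of C1-C15, and the resulting edges of the clocks A and B can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the 60-ps unit delay, the function names, and the choice to ignore all common circuit delays are hypothetical and are not taken from the fabricated circuit.

```python
# Behavioral sketch only. T_D_PS and all function names are hypothetical;
# the source does not state the unit delay of the AND gates.
T_D_PS = 60    # assumed AND-gate delay T_D, in picoseconds
N_TAPS = 15    # taps C1..C15, so 1 <= n <= 15 as in the text


def thermometer_code(t_low_ps, t_d_ps=T_D_PS, n_taps=N_TAPS):
    """Tap states latched by "Stop-A": tap i is ONE if "Fall" has reached it."""
    return [1 if i * t_d_ps <= t_low_ps else 0 for i in range(1, n_taps + 1)]


def decode_transition(code):
    """XOR adjacent taps to find the ONE-to-ZERO transition; returns n."""
    padded = code + [0]  # trailing ZERO so a full-scale code still decodes
    for i in range(len(code)):
        if padded[i] ^ padded[i + 1]:
            return i + 1
    raise ValueError("pulsewidth is shorter than one unit delay")


def clock_edges(t_clk_ps, t_low_ps, t_d_ps=T_D_PS):
    """(rise, fall) of clocks B and A, relative to the input rising edge."""
    n = decode_transition(thermometer_code(t_low_ps, t_d_ps))
    delay = n * t_d_ps                # T_L ~ n * T_D, equation (2)
    t_high_ps = t_clk_ps - t_low_ps
    clk_b = (0, t_high_ps)            # set by "Rise", reset by "Fall"
    # "Fall-A" and "Rise-A" are "Fall" and the next "Rise" delayed by n * T_D.
    clk_a = (delay - t_low_ps, delay)
    return n, clk_b, clk_a


n, clk_b, clk_a = clock_edges(t_clk_ps=1000, t_low_ps=400)
print(n, clk_b, clk_a)
```

In this model, a 1-GHz input with a 400-ps low level gives $n = 6$. Because "Fall" and "Rise" are delayed by the same amount, the high level of the clock A stays at 400 ps, and the quantization residue appears only as a small phase offset between the clocks A and B.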

The interpolator is realized by three inverters [12] as shown in Fig. 4. The clocks A and B are interpolated to produce the interpolated clock, "Clk-int," with 50% duty cycle.
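As a numerical check of (1), the interpolation can be approximated by an idealized model in which each edge of "Clk-int" falls midway between the corresponding edges of the clocks A and B. This midpoint rule is an assumption made for illustration; the actual inverter-based interpolator [12] is an analog circuit and is not modeled here. The function names are hypothetical.

```python
def interpolate_edges(clk_a, clk_b):
    """Ideal interpolation: each output edge is midway between the A and B edges.

    Each clock is a (rise, fall) pair of edge times within one period.
    """
    rise = (clk_a[0] + clk_b[0]) / 2
    fall = (clk_a[1] + clk_b[1]) / 2
    return rise, fall


def duty_cycle(rise, fall, t_clk):
    """High-level pulsewidth divided by the period."""
    return (fall - rise) / t_clk


T_CLK_PS = 1000  # 1-GHz input clock, in picoseconds
for t_high_ps in (400, 500, 600):        # 40%, 50%, and 60% input duty cycles
    clk_b = (0, t_high_ps)               # same duty cycle as the input clock
    clk_a = (0, T_CLK_PS - t_high_ps)    # inverse duty cycle, in phase with B
    rise, fall = interpolate_edges(clk_a, clk_b)
    print(t_high_ps, duty_cycle(rise, fall, T_CLK_PS))
```

For all three input duty cycles the model returns a duty cycle of 0.5, which matches (1): the two falling edges lie symmetrically about $0.5T_{\rm clk}$.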

### B. Synchronization Circuit

The synchronization circuit is composed of one-shot circuits, two matched delay lines, DFFs, latches, XOR gates, two dummy delays, and some logic circuits as shown in Fig. 5(a). Since the same circuit is adopted for the clock generator and the synchronization circuit, there are several advantages. First, since the rising/falling times of the clock for both circuits are equal, the 50% duty cycle of the clock through the synchronization circuit is kept. Second, since the same circuit is used to detect the duty cycle and the skew, the operational frequency range of the clock generator and that of the synchronization circuit are equal under process and supply variations. Third, layout time is saved.

Fig. 6. Die photo.

Fig. 7. Measured output clock with input clock of 1 GHz and 40% duty cycle.

Fig. 8. Measured output clock with input clock of 1 GHz and 50% duty cycle.

The rising and falling edges of the interpolated clock, "Clk-int," are regenerated by the one-shot circuits. The rising and falling edges pass through the two matched delay lines to maintain the duty cycle. Therefore, the output clock, "Clk-out," still keeps the 50% duty cycle. The dummy delay A mimics the delay of the DFF that generates the signal, "TDC-A." The dummy delay B mimics the SR-latch and the output clock buffer. That is

$$T_{\text{dummy-delay-}B} = T_{\text{SR-latch}} + T_{\text{output-buffer}} \tag{3}$$

where $T_{\text{SR-latch}}$ is the delay of the SR-latch and $T_{\text{output-buffer}}$ is that of the output clock buffer. Initially, the digit D1 is set to high, the remaining digits D2-D15 are set to low, and the dummy delay B is selected by the multiplexer. When both the signals, "Stop-B" and "RST," go high, the multiplexer will select the path without the dummy delay B. The skew between Clk-in and Clk-out is corrected as follows. The synchronization circuit is turned on by the signal, "RST," after four cycles of the signal, "TDC-A." The timing diagram is shown in Fig. 5(b).

Fig. 9. Measured output clock with input clock of 1 GHz and 60% duty cycle.

Fig. 10. Simulated and measured duty-cycle errors for different input duty cycles at 1 GHz.

Fig. 11. Simulated and measured static phase errors for different input duty cycles.



Fig. 12. Measured jitter of the output clock at 1 GHz.

The skew between the input clock, "Clk-in," and the signal, "Rise-int," is expressed as

$$T_{\text{skew}} = T_1 + T_{\text{dummy-delay-}B} \tag{4}$$

where $T_1$ is the skew between the input clock, "Clk-in," and the interpolated clock, "Clk-int," and $T_{\text{dummy-delay-}B}$ is the delay of the dummy delay B. The rising edge of the signal, "Rise-int," enables the signal, "Start-B." "Start-B" allows the time-to-digital conversion to detect the skew of $T_{\rm clk} - T_{\rm skew}$. The falling edge of the signal, "TDC-A," enables the signal, "Stop-B," to stop the time-to-digital conversion. Similarly, the signal, "Rise-int," passes through the third delay line to generate the multiphase clocks for the DFFs. The signal, "Stop-B," also latches the codes, D1-D15, and stops the DFFs. By using a similar time-to-digital conversion, the skew, $T_{\rm clk} - T_{\rm skew}$, is quantized as

$$T_{\rm clk} - T_{\rm skew} \approx k \cdot T_D \tag{5}$$

where $T_D$ is also the unit delay of an AND gate in the third delay line and $k$ is the measured number of the AND gates. The digital codes D1-D15 will turn on one of the corresponding transmission gates in the synchronization circuit. This is equivalent to inserting a fixed delay between the signals, "Rise-int" and "Rise-delay." Moreover, assume that the third and fourth delay lines are matched. The same delay is then also inserted between the signals, "Fall-int" and "Fall-delay." Two cycles after the signal, "RST," goes high, the delay between the output clock, "Clk-out," and the input clock, "Clk-in," is corrected as

$$T_{\text{skew}} + (T_{\text{clk}} - T_{\text{skew}}) + T_1 + (T_{\text{clk}} - T_{\text{skew}}) + T_{\text{SR-latch}} + T_{\text{output-buffer}} = 2 \cdot T_{\text{clk}}. \tag{6}$$

Thus, the output clock is aligned with the input clock. The total locked time is 10 cycles.
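The skew correction in (3)-(5) can be sketched in the same behavioral style. The unit delay of 60 ps, the floor quantization, the clamping of $k$ to the 15 available codes, and the function names are all assumptions made for illustration; only the relations among $T_{\rm skew}$, $T_1$, $T_{\text{dummy-delay-}B}$, and $k \cdot T_D$ come from the text.

```python
def sync_delay_code(t_clk_ps, t1_ps, t_dummy_b_ps, t_d_ps=60, n_taps=15):
    """Quantize T_clk - T_skew into k unit delays, following (4) and (5)."""
    t_skew = t1_ps + t_dummy_b_ps          # equation (4)
    k = (t_clk_ps - t_skew) // t_d_ps      # equation (5); floor is an assumption
    k = max(1, min(n_taps, k))             # codes D1..D15 (clamping is assumed)
    return t_skew, k


def residual_skew(t_clk_ps, t1_ps, t_dummy_b_ps, t_d_ps=60):
    """Delay from Clk-in to Clk-out, minus one period, after correction."""
    t_skew, k = sync_delay_code(t_clk_ps, t1_ps, t_dummy_b_ps, t_d_ps)
    # Clk-in -> Clk-out path: T_1 + k * T_D + T_SR-latch + T_output-buffer,
    # where T_SR-latch + T_output-buffer = T_dummy-delay-B by equation (3).
    return t_skew + k * t_d_ps - t_clk_ps


print(sync_delay_code(1000, 200, 150))
print(residual_skew(1000, 200, 150))
print(residual_skew(1000, 250, 150))
```

With these hypothetical values, a skew of 350 ps gives $k = 10$ and leaves a residue of $-50$ ps, while a skew of 400 ps is an exact multiple of the unit delay away from $T_{\rm clk}$ and leaves no residue. In this model the residual skew is bounded by one unit delay, as long as $k$ is not clamped.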

## III. EXPERIMENTAL RESULTS

The proposed circuit has been fabricated in a 0.18-$\mu$m CMOS process. Its die photo is shown in Fig. 6 and the active area is $0.86 \times 0.26$ mm<sup>2</sup>. The supply voltage is 1.8 V and the power consumption is 15 mW. Open-drain output buffers are used in the proposed circuit. In the circuit simulations, the pad, the bond wire, and the output load are considered. The capacitance of the pad is 150 fF and the inductance of the bond wire is 2.4 nH. The output load is around 20 pF in parallel with a resistor of 50 $\Omega$. These side effects are added in the HSPICE simulation to mimic the measurement environment. Figs. 7–9 show the measured output clocks with the corrected 50% duty cycle when the input clock is 1 GHz with 40%, 50%, and 60% duty cycles, respectively.

Fig. 10 gives the measured and HSPICE-simulated duty-cycle errors with respect to different input duty cycles at 1 GHz. In Fig. 10, the simulated duty-cycle errors are between -1.5% and 1.3% for input duty cycles of 40% to 60% at 1 GHz, and the measured duty-cycle errors are between -1.4% and 1.5% over the same range.
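As a rough sense of scale, and assuming the duty-cycle error is expressed as a percentage of the clock period, the measured bounds at 1 GHz ($T_{\rm clk} = 1000$ ps) correspond to the following deviations of the high-level pulsewidth from $0.5T_{\rm clk}$:

$$\Delta T_H = 0.015 \times 1000\ \text{ps} = 15\ \text{ps}, \qquad \Delta T_H = -0.014 \times 1000\ \text{ps} = -14\ \text{ps}.$$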

Fig. 11 gives the measured and HSPICE-simulated static phase errors with different input duty cycles at 1 GHz. From Fig. 11, the measured phase error is between -55 ps and 44 ps, and the simulated phase error is less than 53 ps. The circuits are simulated by HSPICE at the typical corner, a temperature of 40 °C, and a supply voltage of 1.8 V. There are several reasons why the measurement results deviate from the simulated ones. First, the delay time among the unit delay cells varies owing to process variations. Second, the unmatched bond wires among the pads and the board may cause disagreement between the measurement and simulation results. Third, part of the measured error comes from the unmatched cables connecting the input/output clocks and the testing equipment.

TABLE I

Comparison With Previous Works

|                        | This work           | [4]             | [5]*                  | [6]*       | [7]          | [8]*         | [9]                 |
|------------------------|---------------------|-----------------|-----------------------|------------|--------------|--------------|---------------------|
| Type                   | Non-feedback        | Feedback        | Feedback              | Feedback   | Non-feedback | Non-feedback | Non-feedback        |
| Frequency range        | 0.8-1.2 GHz         | 25-250 MHz      | 0.8-1.7 GHz           | 1 GHz      | 2.5 GHz      | 400 MHz      | 250-600 MHz         |
| Correction range       | 40%~60%             | 14%~86%         | 20%~80%               | 15%~85%    | 30%~70%      | 2%~98%       | 40%~60%             |
| Align with input clock | Yes                 | Yes             | No                    | No         | No           | Yes          | Yes                 |
| Total locking cycle    | <10 cycles          | NA              | NA                    | >50 cycles | NA           | >30 cycles   | >10 cycles          |
| Duty-cycle error       | -1.5% ~ 1.4%        | NA              | ±0.25%                | ±0.6%      | ±3%          | -1.2% ~ 2.2% | ±0.7%               |
| Jitter                 | 12.9 ps @ 1 GHz     | 44 ps @ 250 MHz | NA                    | NA         | NA           | NA           | 64 ps @ 600 MHz     |
| Power                  | 15 mW               | NA              | 3.2 mW                | 8.2 mW     | NA           | NA           | 10 mW               |
| Area                   | 0.23 mm<sup>2</sup> | NA              | 0.0075 mm<sup>2</sup> | NA         | NA           | NA           | 0.37 mm<sup>2</sup> |
| Process                | 0.18 µm             | 0.11 µm         | 0.18 µm               | 0.18 µm    | 0.25 µm      | 0.25 µm      | 0.35 µm             |

\* Simulation result.



Fig. 13. Measured output clock with input clock of 1.2 GHz and 50% duty cycle.

Fig. 12 shows that the measured peak-to-peak jitter of the output clock at 1 GHz is 12.9 ps. Fig. 13 shows the measured output clock with the desired 50% duty cycle when the input clock is 1.2 GHz with 50% duty cycle. The measured input frequency range is 800 MHz to 1.2 GHz. The comparisons among the proposed circuit and several previous works are listed in Table I. The proposed circuit achieves a fast locked time compared with these works, which allows a fast wakeup once the circuit is enabled after power down. The proposed circuit also synchronizes the input and output clocks while correcting the duty cycle, using only a single-ended input clock. The input duty-cycle range of the proposed circuit is limited by the interpolator [12].

## IV. CONCLUSION

An all-digital synchronous DCC is presented. The duty cycle and skew are corrected in 10 cycles. The proposed all-digital synchronous DCC operates over a frequency range of 0.8 GHz to 1.2 GHz and input duty cycles of 40% to 60%. The proposed circuit has been fabricated in a 0.18-$\mu$m CMOS process.

## ACKNOWLEDGMENT

The authors would like to thank the National Chip Implementation Center (CIC), Taiwan, for fabricating this chip.

## REFERENCES

- [1] T. H. Lee, K. S. Donnelly, J. T. C. Ho, J. Zerbe, M. G. Johnson, and T. Ishikawa, "A 2.5 V CMOS delay-locked loop for 18 Mbit 500 megabytes/s DRAM," *IEEE J. Solid-State Circuits*, vol. 29, no. 12, pp. 1491–1496, Dec. 1994.
- [2] Y. J. Jung, S. W. Lee, D. Shim, W. Kim, C. Kim, and S. I. Cho, "A dual-loop delay-locked loop using multiple voltage-controlled delay lines," *IEEE J. Solid-State Circuits*, vol. 36, no. 5, pp. 784–791, May 2001.
- [3] J. Lee and B. Kim, "A low-noise fast-lock phase-locked loop with adaptive bandwidth control," *IEEE J. Solid-State Circuits*, vol. 35, no. 8, pp. 1137–1145, Aug. 2000.
- [4] K. H. Kim, G. H. Cho, J. B. Lee, and S. I. Cho, "Built-in duty-cycle corrector using coded phase blending scheme for DDR/DDR2 synchronous DRAM application," in *Dig. Tech. Papers Symp. VLSI Circuits*, Jun. 2003, pp. 287–288.
- [5] Y. C. Jang, S. J. Bae, and H. J. Park, "CMOS digital duty-cycle correction circuit for multi-phase clock," *Electron. Lett.*, vol. 39, pp. 1383–1384, Sep. 2003.
- [6] J. J. Nam and H. J. Park, "An all-digital CMOS duty-cycle corrector circuit with a duty-cycle correction range of 15-to-85% for multi-phase applications," *IEICE Trans. Electron.*, vol. 88, pp. 773–777, Apr. 2005.
- [7] K. Nakamura, M. Fukaishi, Y. Hirota, Y. Nakazawa, and M. Yotsuyanagi, "A CMOS 50% duty-cycle repeater using complementary phase blending," in *Dig. Tech. Papers Symp. VLSI Circuits*, Jun. 2000, pp. 48–49.
- [8] Y. M. Wang and J. S. Wang, "An all-digital 50% duty-cycle corrector," in *Proc. IEEE Int. Circuits Syst. Symp.*, May 2004, vol. 2, pp. 925–928.
- [9] C. Jeong, C. Yoo, J. J. Lee, and J. Kih, "Digital delay-locked loop with open-loop digital duty-cycle corrector for 1.2 Gb/s/pin double data rate SDRAM," in *Proc. 30th European Solid-State Circuits Conf.*, Sep. 2004, pp. 379–382.
- [10] P. K. Hanumolu, B. Casper, R. Mooney, G. Y. Wei, and U. K. Moon, "Analysis of PLL clock jitter in high-speed serial links," *IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process.*, vol. 50, no. 11, pp. 879–886, Nov. 2003.
- [11] J. Kim, M. A. Horowitz, and G. Y. Wei, "Design of CMOS adaptive-bandwidth PLL/DLLs: A general approach," *IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process.*, vol. 50, no. 11, pp. 860–869, Nov. 2003.
- [12] B. W. Garlepp, K. S. Donnelly, J. Kim, P. S. Chau, J. L. Zerbe, C. Huang, C. V. Tran, C. L. Portmann, D. Stark, Y.-F. Chan, T. H. Lee, and M. A. Horowitz, "A portable digital DLL for high-speed CMOS interface circuits," *IEEE J. Solid-State Circuits*, vol. 34, no. 5, pp. 632–644, May 1999.