# A Jitter-Tolerance-Enhanced CDR Using a GDCO-Based Phase Detector

Che-Fu Liang, Student Member, IEEE, Sy-Chyuan Hwu, and Shen-Iuan Liu, Senior Member, IEEE

Abstract—A jitter-tolerance-enhanced 10 Gb/s clock and data recovery (CDR) circuit is presented. The proposed architecture cascades two half-rate CDRs with different loop bandwidths to relax the design bottleneck, so that the predicted jitter tolerance can be enhanced without sacrificing the jitter transfer. By using a gated digital-controlled oscillator (GDCO), the proposed GDCO-based phase detector may reduce the cost of this architecture and achieve a wide linear range. This CDR circuit has been fabricated in a 0.13 $\mu$m CMOS technology and consumes 60 mW from a 1.5 V supply. It occupies an active area of 0.36 mm<sup>2</sup>. The measured rms jitter is 0.96 ps and the peak-to-peak jitter is 7.11 ps for a 10 Gb/s 2<sup>7</sup> – 1 PRBS. The measured bit error rate for a 10 Gb/s 2<sup>7</sup> – 1 PRBS is less than 10<sup>-12</sup>.

Index Terms—Clock and data recovery, jitter tolerance, jitter transfer.

## I. INTRODUCTION

Jitter tolerance indicates the maximum sinusoidal jitter that a clock and data recovery (CDR) circuit must tolerate under a specified bit-error rate (BER). Conventional CDR circuits are built on the concept of the phase-locked loop (PLL). A PLL-based CDR circuit has a jitter tolerance that is inversely proportional to the jitter frequency [1]. As shown in Fig. 1(a), for a PLL-based CDR circuit, a wide loop bandwidth is desired to tolerate high-frequency jitter. However, as shown in Fig. 1(b), a reduced loop bandwidth is required to suppress jitter and achieve a good jitter transfer. Hence, enhancing the jitter tolerance only by increasing the loop bandwidth degrades the jitter transfer and may not be acceptable for some applications such as data repeaters [1]. In traditional optical receivers [2], [3] without jitter tolerance enhancement, the jitter tolerance at higher jitter frequencies (tens of MHz) hardly exceeds 0.5 UIpp (UI: unit interval). One remedy is to adopt an analog phase shifter such as a delay-locked loop (DLL) [4], [5]. The wide-bandwidth DLL absorbs the input jitter embedded in the incoming data and allows the main CDR circuit to recover the data correctly. This technique is effective; however, it requires high power consumption and considerable chip area, especially for multi-Gb/s applications. This is because an analog delay line with a range of several UIs is required to delay the input data, and the delay line must also be wideband so as not to distort the input data. For example, in [5] a standalone 10 Gb/s CDR circuit demonstrates a power consumption over 500 mW, which is four times larger than that of a traditional PLL-based counterpart [3] even using the same technology. A digital equivalent, the so-called blind-oversampling CDR circuit, has been presented in [6], but its complex digital blocks still make it unsuitable for multi-Gb/s applications.

In this work, a jitter-tolerance-enhanced CDR circuit is presented. The proposed architecture cascades two half-rate CDRs with different loop bandwidths to relax the design bottleneck, so that the predicted jitter tolerance can be enhanced without sacrificing the jitter transfer. In this CDR circuit, a phase detector (PD) is realized by using a half-rate gated digital-controlled oscillator (GDCO). This half-rate GDCO behaves like a wideband CDR circuit and de-multiplexes the input data into two parallel half-rate data streams. It also extracts the embedded clock from the input data for the subsequent PLL-based CDR with a lower loop bandwidth.

The authors are with the Graduate Institute of Electronics Engineering and Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan 10617, R.O.C. (e-mail: lsi@cc.ee.ntu.edu.tw).

Digital Object Identifier 10.1109/JSSC.2008.920322

This paper is organized as follows. Section II introduces the proposed jitter-tolerance-enhancing CDR architecture. Section III details the individual blocks. The experimental results are given in Section IV and conclusions are drawn in Section V.

## II. PROPOSED JITTER-TOLERANCE-ENHANCING CDR ARCHITECTURE

To meet both the jitter transfer and the jitter tolerance requirements, linear CDR circuits are widely used [2], [3], [7]. Let the phase of the input data modulated by sinusoidal jitter and that of the recovered clock for a linear CDR circuit be  $\phi_D(s)$ and  $\phi_{OUT}(s)$ , respectively. Under the assumptions that the whole system is linear and has periodic data transitions, the jitter transfer, |H(s)|, and the jitter tolerance,  $|J_{TOL}(s)|$ , are expressed as [1]

$$|H(s)| = \left| \frac{\phi_{\text{OUT}}(s)}{\phi_D(s)} \right| \tag{1a}$$

$$\left|J_{\text{TOL}}(s)\right| = \left|\frac{1}{1 - H(s)}\right| (\text{UIpp}) \tag{1b}$$

where UIpp in (1b) means the peak-to-peak jitter amplitude normalized in UI. However, the jitter tolerance in (1b) may be too optimistic. Actually, a conventional CDR circuit may suffer from several non-linear effects, such as finite linear range for a PD, static sampling offset and duty-cycle distortion of the recovered clock. These non-linear effects decrease the jitter tolerance as well. For example, due to the meta-stability in D flip-flops (DFFs), the PD has gain distortion when the phase error is large. As shown in [7], the simulated transfer curve of a linear half-rate PD has a limited linear range less than 0.8 UIpp.

To enhance the jitter tolerance, the basic idea is to cascade two non-full-rate CDR circuits with different loop bandwidths. A half-rate example is shown in Fig. 2. The first CDR circuit $(CDR_{HBW})$ with a higher bandwidth is used to demux the input data into two half-rate error-free data streams. The second CDR circuit $(CDR_{LBW})$ with a lower bandwidth filters out the jitter embedded in the data. Since the second CDR circuit deals with the half-rate data, the tolerable phase error between the input data and the recovered clock is doubled.

Manuscript received September 29, 2007; revised January 17, 2008. This work was supported in part by MediaTek Inc. and the National Science Council, Taiwan.

Fig. 1. Trade-off between jitter tolerance and jitter transfer. (a) Jitter tolerance versus jitter frequency (log scale): increase the CDR's bandwidth for better jitter tolerance. (b) Jitter transfer versus jitter frequency (log scale): reduce the CDR's bandwidth for better jitter suppression.

Fig. 2. Proposed jitter-tolerance-enhancing technique.

Let the jitter transfer of the first and the second CDR circuits be  $|H_{\text{HBW}}(s)|$  and  $|H_{\text{LBW}}(s)|$ , respectively. Assume that  $|H_{\text{HBW}}(s)|$  is flat and its bandwidth is much wider than that of  $|H_{\text{LBW}}(s)|$ . The cascaded jitter transfer,  $|H_{\text{CAS}}(s)|$ , is approximated as

$$|H_{\text{CAS}}(s)| = |H_{\text{HBW}}(s)| \cdot |H_{\text{LBW}}(s)| \approx |H_{\text{LBW}}(s)|. \tag{2}$$

Note that the jitter transfer is dominated by the response of the second CDR circuit.

To derive the system's jitter tolerance  $|J_{\text{TOL}\_\text{SYS}}(s)|$ , we assume that the bandwidth of  $\text{CDR}_{\text{HBW}}$  is high enough that it faithfully transfers the input jitter and delivers two error-free half-rate data streams to  $\text{CDR}_{\text{LBW}}$ . Note that  $\text{CDR}_{\text{LBW}}$  retimes the output data streams of  $\text{CDR}_{\text{HBW}}$  instead of the input data. Let the jitter tolerance of  $\text{CDR}_{\text{LBW}}$  be  $|J_{\text{TOL}\_\text{LBW}}(s)|$ . To obtain  $|J_{\text{TOL}\_\text{SYS}}(s)|$ , we first calculate  $|J_{\text{TOL}\_\text{LBW}}(s)|$  and then refer it to the system's input by dividing it by the transfer function of  $\text{CDR}_{\text{HBW}}$ . Finally, we may write  $|J_{\text{TOL}\_\text{SYS}}(s)|$  as

$$|J_{\text{TOL}\_\text{SYS}}(s)| = \frac{|J_{\text{TOL}\_\text{LBW}}(s)|}{|H_{\text{HBW}}(s)|}. \tag{3}$$

From (3), one might conclude that the bandwidth of  $CDR_{HBW}$  should be kept as small as possible so that the jitter tolerance increases dramatically. However, this contradicts our previous assumption. Remember that  $CDR_{HBW}$  should deliver two error-free half-rate data streams to  $CDR_{LBW}$ . As a result, the bandwidth of  $CDR_{HBW}$  should be high enough to tolerate all the input jitter. Intuitively speaking,  $CDR_{HBW}$  may be regarded as an ideal demultiplexer that hardly disturbs the data jitter.

Now we may continue to derive  $|J_{\text{TOL}\_\text{LBW}}(s)|$  according to (1b). At the input of  $\text{CDR}_{\text{LBW}}$ , the input data have been demuxed into two half-rate data streams. To avoid confusion,  $|J_{\text{TOL}\_\text{LBW}}(s)|$  is represented as a non-normalized form for two 5 Gb/s demuxed data streams as

$$|J_{\text{TOL}\_\text{LBW}}(s)| = \frac{200 \text{ ps}}{|1 - H_{\text{LBW}}(s)|}. \tag{4}$$

Referring to (3) and normalizing the system's jitter tolerance to the input data rate, now the system's jitter tolerance can be defined as

$$|J_{\text{TOL}\_\text{SYS}}(s)| = \frac{200 \text{ ps}}{|H_{\text{HBW}}(s)| \cdot |1 - H_{\text{LBW}}(s)|} = \frac{2}{|H_{\text{HBW}}(s)| \cdot |1 - H_{\text{LBW}}(s)|} \ (\text{UIpp}). \tag{5}$$

Compared with conventional CDRs, the proposed CDR circuit ideally improves the jitter tolerance by a factor of two while its jitter transfer remains nearly unchanged. The jitter tolerance can be improved further by adopting a higher demux ratio, such as quarter rate. For example, the numerator in (5) changes to four if a quarter-rate architecture is adopted. Note that (5) is derived with an approach similar to that in [1]. We did not consider consecutive identical digits (CIDs), which may reduce the jitter tolerance. With long CIDs and a high jitter frequency, no edge information is available, so the best a CDR circuit can do is to sample at the middle of the eye, and the maximum jitter tolerance is limited to 1 UIpp.
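The normalization step between (4) and (5) can be illustrated with a minimal Python sketch. It assumes the ideal case discussed above, $|H_{\text{HBW}}| = 1$ and, at high jitter frequencies, $|1 - H_{\text{LBW}}| \approx 1$; the function name and argument names are hypothetical and chosen only for this illustration.

```python
def system_tolerance_uipp(input_rate_bps: float, demux_factor: int,
                          h_hbw_mag: float = 1.0,
                          one_minus_h_lbw_mag: float = 1.0) -> float:
    """Evaluate (5) for a 1:demux_factor front end, normalized to the input UI."""
    input_ui_s = 1.0 / input_rate_bps            # 100 ps at 10 Gb/s
    demuxed_ui_s = demux_factor * input_ui_s     # 200 ps for the half-rate case in (4)
    tolerance_s = demuxed_ui_s / (h_hbw_mag * one_minus_h_lbw_mag)
    return tolerance_s / input_ui_s              # refer back to the input data rate


half_rate = system_tolerance_uipp(10e9, 2)
quarter_rate = system_tolerance_uipp(10e9, 4)
print(f"half rate: {half_rate:.1f} UIpp, quarter rate: {quarter_rate:.1f} UIpp")
```

Under these idealized assumptions the sketch returns 2 UIpp for the half-rate case and 4 UIpp for the quarter-rate case, matching the numerators quoted in the text.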

In the previous discussion, we assumed that the bandwidth of CDR<sub>HBW</sub> is very high. However, building a conventional CDR with a bandwidth of tens of MHz is not practical. To realize the first CDR circuit efficiently, the open-loop CDR architectures using the gated voltage-controlled oscillator (GVCO) [8]–[11] can be adopted. They track the input data faithfully and achieve a good jitter tolerance. In addition, the magnitude of the jitter transfer for these open-loop CDR circuits is not frequency dependent and has a unity gain. In this work, a 4-bit gated digital-controlled oscillator (GDCO), whose oscillation frequency can be adjusted with a finite frequency step, is adopted to realize the required wideband CDR circuit in Fig. 2. Its frequency is digitally preset in the vicinity of the desired frequency, which is 5 GHz in this work. Then a conventional PLL-based CDR circuit is used to realize the second (lower bandwidth) CDR in Fig. 2 to achieve the required jitter transfer function. Based on a simple linear model, the jitter tolerance of a GDCO,  $|J_{\text{TOL}\_\text{GDCO}}(\omega)|$ , with respect to jitter frequency, frequency offset, and PRBS run length can be derived as [11]:

$$\begin{aligned} |J_{\text{TOL}\_\text{GDCO}}(\omega)| &= \left| \frac{2 \cdot \left( 0.5 - \frac{\Delta f}{f_{\text{nom}}} \cdot K - U(t) \right)}{2 \cdot \sin\left(\frac{\omega \cdot T_b \cdot K}{2}\right)} \right| \\ &\approx \left| \frac{2 \cdot \left( 0.5 - \frac{\Delta f}{f_{\text{nom}}} \cdot K - U(t) \right)}{\omega \cdot T_b \cdot K} \right| \ (\text{UIpp}) \end{aligned} \tag{6}$$

where  $\Delta f$  is the frequency offset of the GDCO,  $f_{\text{nom}}$  is the nominal frequency of the GDCO, K is the PRBS run length, U(t) is the sampling offset,  $\omega$  is the jitter frequency, and  $T_b$  is the bit time. In our 10 Gb/s CDR design, with a frequency offset of 50 MHz, a nominal frequency of 5 GHz, a  $2^7 - 1$  PRBS, a jitter frequency of 80 MHz, and no sampling offset, the GDCO exhibits a jitter tolerance of 2.44 UIpp, which is still higher than (5) over our jitter frequency range of concern (DC to 80 MHz). Hence, (5) may provide a good approximation of the jitter tolerance for our prototype. Note that the simplified result in (6) is valid only for jitter frequencies of tens of MHz. With a jitter frequency of several GHz, the GDCO has a worst-case jitter tolerance of 0.5 UIpp, which is 50% of that of conventional CDR circuits.
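The 2.44 UIpp figure can be checked with a short Python sketch of (6). Two points are assumptions of this sketch rather than statements from the text: the run length of a $2^7 - 1$ PRBS is taken as K = 7, and the bit time is taken as $T_b$ = 100 ps (10 Gb/s). The function name is hypothetical.

```python
import math


def gdco_jitter_tolerance_uipp(delta_f_hz: float, f_nom_hz: float, run_length: int,
                               jitter_freq_hz: float, bit_time_s: float,
                               sampling_offset_ui: float = 0.0,
                               exact: bool = False) -> float:
    """Evaluate (6): exact sine form if exact=True, small-angle form otherwise."""
    margin = 2.0 * (0.5 - (delta_f_hz / f_nom_hz) * run_length - sampling_offset_ui)
    phase = 2.0 * math.pi * jitter_freq_hz * bit_time_s * run_length  # omega * T_b * K
    denominator = 2.0 * math.sin(phase / 2.0) if exact else phase
    return abs(margin / denominator)


# Design values quoted in the text; K = 7 and T_b = 100 ps are assumptions (see above).
approx = gdco_jitter_tolerance_uipp(50e6, 5e9, 7, 80e6, 100e-12)
exact = gdco_jitter_tolerance_uipp(50e6, 5e9, 7, 80e6, 100e-12, exact=True)
print(f"approximate: {approx:.2f} UIpp, exact: {exact:.2f} UIpp")
```

With these inputs the small-angle form gives about 2.44 UIpp, in line with the value quoted above, and the sine form differs only in the second decimal place, which suggests the approximation is reasonable at 80 MHz.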

The proposed half-rate CDR circuit is shown in Fig. 3. It is composed of a GDCO-based phase detector (GPD), a loop filter, DFFs, and a voltage-controlled oscillator (VCO). The GPD consists of a half-rate GDCO, DFFs, two divide-by-4 dividers, a charge pump (CP), and a phase-frequency detector (PFD). Parts of the GPD, including the half-rate GDCO and DFFs, have the same function as the CDR<sub>HBW</sub> in Fig. 2: they extract the embedded 5 GHz half-rate clock ( $CK_{GDCO}$ ) from the input data and demux the 10 Gb/s input data into two parallel 5 Gb/s data streams (GDCO Data). On the other hand, the loop filter, DFFs, dividers, PFD, CP, and VCO as a whole provide the function of  $CDR_{LBW}$  in Fig. 2. Unlike a traditional PLL-based CDR, the low-bandwidth CDR loop locks to a recovered clock rather than to random data, so a PFD can be used. This increases the linear range of the GPD, which will be discussed later. However, the GPD still needs a frequency calibration circuit to set the GDCO's frequency, so we call it a GPD rather than a GDCO-based phase-frequency detector (GPFD).



Fig. 3. Proposed half-rate CDR circuit.



Fig. 4. Predicted jitter transfer of the proposed CDR circuit.

The jitter transfer of the proposed CDR circuit is determined as follows. The jitter transfer of a GDCO is not frequency dependent and has a unity gain, which will be discussed in Section III. Referring to (2), the jitter transfer of the proposed CDR circuit in Fig. 3 is given as

$$|H_{\text{CAS}}(s)| = \left| \frac{I_{\text{CP}} \cdot (1 + s \cdot RC) \cdot \frac{K_{\text{VCO}}}{C \cdot N}}{s^2 + I_{\text{CP}} \cdot (1 + s \cdot RC) \cdot \frac{K_{\text{VCO}}}{C \cdot N}} \right| \tag{7}$$

where N (= 4) is the division ratio,  $K_{\rm VCO}$  is the VCO's gain,  $I_{\rm CP}$  is the CP current, and RC is the time constant of the passive loop filter. The parameters for the proposed CDR circuit are  $K_{\rm VCO} = 260 \text{ MHz/V}$ ,  $I_{\rm CP} = 400 \ \mu\text{A}$ ,  $R = 820 \ \Omega$ , and C = 10 nF. The predicted jitter transfer based on (7) is shown in Fig. 4. The corner frequency of the jitter transfer is around 4 MHz and the jitter peaking is 0.05 dB, both of which meet the SONET OC-192 specifications.
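As a rough numerical cross-check of (7), the following Python sketch sweeps the jitter transfer with the quoted loop parameters and reads off the peaking and the −3 dB corner. It assumes $K_{\rm VCO}$ is entered in Hz/V (so that the $2\pi$ factors of the PFD and VCO gains cancel, as the form of (7) suggests); the sweep range and grid density are arbitrary choices of this sketch.

```python
import math

# Loop parameters quoted in the text for the 4 MHz-bandwidth setting.
K_VCO = 260e6   # VCO gain in Hz/V (unit convention assumed, see lead-in)
I_CP = 400e-6   # charge pump current in A
R = 820.0       # loop filter resistor in ohm
C = 10e-9       # loop filter capacitor in F
N = 4           # division ratio


def jitter_transfer_mag(freq_hz: float) -> float:
    """|H_CAS(s)| from (7), evaluated at s = j*2*pi*f."""
    s = 1j * 2.0 * math.pi * freq_hz
    g = I_CP * (1.0 + s * R * C) * K_VCO / (C * N)
    return abs(g / (s * s + g))


# Log-spaced sweep from 1 kHz to 100 MHz.
freqs = [10.0 ** (3.0 + 5.0 * k / 2000.0) for k in range(2001)]
mags = [jitter_transfer_mag(f) for f in freqs]

peaking_db = 20.0 * math.log10(max(mags))
corner_hz = next(f for f, m in zip(freqs, mags) if m < 1.0 / math.sqrt(2.0))
print(f"peaking: {peaking_db:.3f} dB, -3 dB corner: {corner_hz / 1e6:.2f} MHz")
```

Under these assumptions the sketch yields a peaking of roughly 0.04 to 0.05 dB and a corner of a few MHz (about 3.4 MHz), which is consistent with the "around 4 MHz" and 0.05 dB figures read from Fig. 4.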



Fig. 5. Transfer curves of a traditional PD and the proposed GPD.



Fig. 6. Predicted jitter tolerance for the proposed and the conventional CDR circuits.

Conventional linear PDs in a CDR have a limited linear range of less than  $\pm 0.5$  UI [7]. By using the divide-by-4 dividers and a PFD in the proposed GPD, a larger phase difference is allowed: one period of the 1.25 GHz divided clock is 800 ps, or 8 UI at 10 Gb/s, so the linear range of the GPD is extended to  $\pm 8$  UI, which guarantees the linear operation of the proposed CDR circuit. The transfer curves of a traditional linear PD and the proposed GPD for 10 Gb/s CDRs are plotted in Fig. 5 for comparison. Because the GPD demuxes the input data stream into two parallel error-free 5 Gb/s data streams, the main CDR's tracking range is extended to  $\pm 1$  UI. Referring to (5), the jitter tolerance of the proposed CDR circuit is given as

$$|J_{\text{TOL}\_\text{SYS}}(s)| = \left| \frac{2 \cdot \left(s^2 + I_{\text{CP}} \cdot (1 + s \cdot RC) \cdot \frac{K_{\text{VCO}}}{C \cdot N}\right)}{s^2} \right| \ (\text{UIpp}). \tag{8}$$

The predicted jitter tolerances for the proposed CDR circuit based on (8) and the conventional one with the same loop bandwidth are plotted in Fig. 6. Under the assumptions that the whole system is linear and has periodic data transitions, the proposed CDR circuit achieves a theoretical jitter tolerance of 2 UIpp at 80 MHz, which is twice that of the conventional one. In this prototype, we aim at  $2^7 - 1$  PRBS only to justify (8). With longer PRBS run length, the jitter tolerance of the GDCO reduces and the system's performance may be limited by (6).
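The 2 UIpp figure at 80 MHz follows directly from (8), as the Python sketch below illustrates. It assumes the 4 MHz-bandwidth loop parameters quoted with (7), $K_{\rm VCO}$ in Hz/V, and models the conventional CDR by replacing the numerator 2 with 1, as (1b) suggests; the function name is hypothetical.

```python
import math

# Loop parameters quoted with (7); K_VCO in Hz/V is an assumed unit convention.
K_VCO, I_CP, R, C, N = 260e6, 400e-6, 820.0, 10e-9, 4


def jitter_tolerance_uipp(freq_hz: float, numerator_ui: float = 2.0) -> float:
    """Evaluate (8) at s = j*2*pi*f; numerator_ui = 1 models a conventional CDR."""
    s = 1j * 2.0 * math.pi * freq_hz
    g = I_CP * (1.0 + s * R * C) * K_VCO / (C * N)
    return abs(numerator_ui * (s * s + g) / (s * s))


proposed = jitter_tolerance_uipp(80e6)
conventional = jitter_tolerance_uipp(80e6, numerator_ui=1.0)
print(f"proposed: {proposed:.3f} UIpp, conventional: {conventional:.3f} UIpp")
```

At 80 MHz the loop term is small compared with $s^2$, so the result sits just above 2 UIpp for the proposed circuit and just above 1 UIpp for the conventional one.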



Fig. 7. (a) GDCO. (b) GDCO's waveforms when the clock  $\rm CK_{GDCO}$  lags the data. (c) GDCO's waveforms when the clock  $\rm CK_{GDCO}$  leads the data. (d) GDCO's waveforms when the clock  $\rm CK_{GDCO}$  locks with the data.

## III. BUILDING BLOCKS OF THE PROPOSED CDR CIRCUIT

### A. Gated Digital-Controlled Oscillator (GDCO)

In Fig. 7(a), the GDCO is composed of five current-mode-logic (CML) multiplexers, Mx1-Mx5, and a replica buffer, Mx6. It is modified from the GVCOs in [10], [11] using similar concepts. Compared with [10], a quadrature clock is available in this GDCO to retime the data. Compared with [11], this GDCO has fewer multiplexers to achieve high-speed operation. When the input data is high, the multiplexers Mx1, Mx2, Mx4, and Mx5 form an oscillator. The multiplexer Mx3 outputs the clock B, which is the complement of the clock A. When the input data is low, the multiplexers Mx1, Mx3, Mx4, and Mx5 form another oscillator. The multiplexer Mx2 outputs the clock A. Once the input data changes, the clock A or B tracks the data. Fig. 7(b)-(d) illustrate how the proposed GDCO adjusts its output phase when the clock CK<sub>GDCO</sub> lags, leads, and locks with the data, respectively. In Fig. 7(b), when the clock  $CK_{GDCO}$  lags the data and a data transition arrives, nodes A and B change their polarity and multiplexer Mx1 switches from the lower input (node B) to the upper input (node A). At this moment, node A, whose voltage level is lower than the zero crossing, pulls down the clock  $CK_{GDCO}$  more rapidly so as to compensate for the phase lag. Similarly, in Fig. 7(c), when the clock  $CK_{GDCO}$  leads the data, the clocks A and B change after reaching the zero-crossing point to correct the phase. When the clock  $CK_{GDCO}$  locks with the input data, the timing diagram is as shown in Fig. 7(d). The phase of the sampling clock  $CK_{GDCO}$  is determined by the phase difference  $\theta_D$  between the data edges and the clock  $CK_{GDCO}$ , which is approximately given as

Fig. 8. CML multiplexers for the GDCO.

$$\theta_D = \frac{180^\circ}{K} \cdot 2 \tag{9}$$

where K is the number of stages in the GDCO. For example, if we connect the data input in Fig. 7(a) to a logic one, the multiplexers Mx1, Mx2, Mx4, and Mx5 form a 4-stage oscillator and K is 4 in this case. As a result, the clock CK<sub>GDCO</sub> is delayed by 90° and samples at the middle of the data eye.
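To see why a 90° delay lands in the middle of the data eye, the short Python sketch below converts the phase from (9) into time. It assumes the half-rate clock runs at 5 GHz (200 ps period) and that one UI of the 10 Gb/s data is 100 ps; the function name is hypothetical.

```python
def sampling_phase_deg(num_stages: int) -> float:
    """Phase difference theta_D from (9), in degrees."""
    return 180.0 / num_stages * 2.0


CLOCK_PERIOD_PS = 200.0   # 5 GHz half-rate clock (assumed from the text)
BIT_TIME_PS = 100.0       # one UI at 10 Gb/s

theta_d = sampling_phase_deg(4)                  # 4-stage oscillator, K = 4
delay_ps = theta_d / 360.0 * CLOCK_PERIOD_PS     # clock delay after a data edge
delay_ui = delay_ps / BIT_TIME_PS
print(f"theta_D = {theta_d:.0f} deg, delay = {delay_ps:.0f} ps = {delay_ui:.1f} UI")
```

For K = 4 this gives 90°, or 50 ps, which is half of the 100 ps bit time and therefore the center of the eye.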

A conventional GVCO [8], [9] starts to oscillate when the input data is high, and stops and is latched when the input data is low. Serious amplitude variation occurs if the output is latched to the supply voltage or ground, which also slows down the oscillator. For the proposed GDCO in Fig. 7(a), the oscillating waveforms are never latched. Thus, the amplitude variation is reduced and the bandwidth requirement of the gated multiplexers is also relaxed. To enhance the switching speed at nodes A and B in the GDCO of Fig. 7(a), the digital tuning is not applied to the multiplexers Mx2, Mx3, and Mx4. To preserve the quadrature phase in the GDCO, the digital tuning is applied to the multiplexers Mx1 and Mx5. Based on simulation results, this GDCO has a tuning range of 600 MHz around 5 GHz with a monotonic frequency step of no more than 50 MHz.

The CML multiplexers, Mx1-Mx6, in the proposed GDCO are shown in Fig. 8. Referring to Fig. 7(b) and (c), nodes A and B in the GDCO experience a large non-sinusoidal signal swing when there are data transitions. This causes the output waveforms of the multiplexers to be non-differential. Hence, cross-coupled pairs are added in the multiplexers Mx2, Mx3, Mx4, and Mx6 to ensure differential outputs. The size of the transistors in the cross-coupled pair is only one-sixth of that in the input differential pairs so as to avoid latch-up. The inputs, Data$\pm$ , in Mx1-Mx3 are used to select one of the two differential inputs,  $in1\pm$  and  $in2\pm$ . The tail current source is used to enhance the power supply noise rejection. To overcome process variations, four digital control bits are used in Mx1 and Mx5 to tune the oscillation frequency of the GDCO. The four bits control eight PMOS transistors in the triode region to change the load resistance of Mx1 and Mx5.

The GDCO's jitter transfer is also simulated. In this simulation, a 10 Gb/s  $2^7 - 1$  PRBS with modulated jitter is applied at the GDCO's input. The jitter amplitude is set to 0.2 UIpp (20 ps pk-pk) and the jitter frequency ranges from 1 MHz to 100 MHz. The result is shown in Table I, where no obvious frequency dependency is observed. Note that the GDCO itself generates high-frequency inter-symbol interference (ISI) jitter of 4 ps even when jitter-free input data are applied. The high-frequency ISI is generated because the phase of the GDCO drifts slightly during the CID due to the frequency offset and is corrected at the end of the CID. However, most of the high-frequency jitter will be filtered out by the following low-bandwidth CDR circuit, so it causes negligible extra jitter transfer gain, which can still be tolerated by the jitter transfer mask.
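The gain column of Table I can be reproduced from the output jitter column with the Python sketch below. One caveat is an inference of this sketch and not stated in the text: the listed gains appear to match $10 \log_{10}$ of the output-to-input peak-to-peak jitter ratio, with the 20 ps input amplitude as reference. Under a $20 \log_{10}$ amplitude convention the same ratios would read about twice as large in dB.

```python
import math

INPUT_JITTER_PS = 20.0  # 0.2 UIpp at 10 Gb/s, as stated in the text

# (jitter frequency, simulated output jitter in ps pk-pk, listed gain in dB) from Table I.
TABLE_I = [
    ("1 MHz", 23.6, 0.72),
    ("10 MHz", 23.4, 0.68),
    ("50 MHz", 23.6, 0.72),
    ("80 MHz", 23.5, 0.70),
    ("100 MHz", 23.3, 0.66),
]


def gain_db(output_jitter_ps: float, input_jitter_ps: float = INPUT_JITTER_PS) -> float:
    """Gain as 10*log10(output/input); the convention is inferred from Table I."""
    return 10.0 * math.log10(output_jitter_ps / input_jitter_ps)


computed = [(freq, gain_db(out_ps), listed) for freq, out_ps, listed in TABLE_I]
for freq, gain, listed in computed:
    print(f"{freq:>8}: computed {gain:.2f} dB, listed {listed:.2f} dB")
```

Either way, the spread across 1 MHz to 100 MHz stays within a few hundredths of a dB, which supports the observation that the GDCO's jitter transfer shows no obvious frequency dependency.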

TABLE I SIMULATED JITTER TRANSFER OF THE GDCO

| Jitter frequency | GDCO's output jitter | Jitter transfer gain |
|------------------|----------------------|----------------------|
| 1 MHz            | 23.6 ps (pk-pk)      | 0.72 dB              |
| 10 MHz           | 23.4 ps (pk-pk)      | 0.68 dB              |
| 50 MHz           | 23.6 ps (pk-pk)      | 0.72 dB              |
| 80 MHz           | 23.5 ps (pk-pk)      | 0.70 dB              |
| 100 MHz          | 23.3 ps (pk-pk)      | 0.66 dB              |

### B. Voltage-Controlled Oscillator (VCO)

Fig. 9 shows the complementary *LC* VCO without a tail current source. This architecture is more sensitive to power supply variations, so an independent power pad and an off-chip regulator are used. To have a wide tunable frequency range and overcome process variations, a VCO with coarse and fine tuning is usually preferred. In this work, a set of switched capacitors and an accumulation-mode varactor are used to tune the frequency of this *LC* VCO. This VCO covers a frequency range from 4.62 to 5.14 GHz and has a VCO gain of 260 MHz/V. A symmetric inductor is also used to enhance the phase noise performance [12]. This inductor has a value of 2.4 nH and exhibits a single-ended quality factor of 13 at 5 GHz. The current consumption of this VCO is less than 2 mA.

Fig. 9. Voltage-controlled oscillator.

### C. PFD and CP

The output clocks of both the GDCO and the VCO are divided by 4 to 1.25 GHz. This reduces the speed requirement of the PFD/CP and extends the linear range of the phase detection. The PFD and CP are shown in Fig. 10(a) and (b), respectively. To overcome the speed limitation, a dynamic PFD is chosen [13]. The delay time  $T_{\rm PFD}$  to reset the PFD in Fig. 10(a) has to be designed appropriately to reduce the dead zone and the resulting jitter. The CP is controlled by differential control signals (UP, UPB, DN, and DNB) to achieve high switching speed. Note that the jitter tolerance is sensitive to the charge pump current mismatch in a linear CDR circuit. Assume that the nominal value of the charge pump current is I and the current mismatch is  $\Delta I$ . Also let the bit time of the 10 Gb/s input data be  $T_b$ . The introduced static phase error,  $E_r$ , normalized in UI can be written as

$$E_r = \frac{T_{\rm PFD} \cdot \Delta I}{T_b \cdot I} (\rm UI). \tag{10}$$
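A minimal Python sketch of (10) is given here for illustration. It uses the design values quoted in this section, a reset delay $T_{\rm PFD}$ of about 200 ps and a worst-case current mismatch of 10%, together with a 100 ps bit time for 10 Gb/s data; the function name is hypothetical.

```python
def static_phase_error_ui(t_pfd_s: float, mismatch_ratio: float, bit_time_s: float) -> float:
    """Static phase error E_r from (10), where mismatch_ratio = delta_I / I."""
    return t_pfd_s * mismatch_ratio / bit_time_s


e_r = static_phase_error_ui(200e-12, 0.10, 100e-12)
remaining_margin = 1.0 - e_r   # fraction of the ideal tolerance left after the offset
print(f"E_r = {e_r:.2f} UI, remaining margin factor = {remaining_margin:.2f}")
```

With these values the static phase error is 0.2 UI, so roughly 80% of the ideal sampling margin remains, which indicates how strongly the reset delay and the current mismatch weigh on the achievable jitter tolerance.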

Hence, the sampling margin decreases and the jitter tolerance in (8) should be modified slightly as

$$\left|J_{\text{TOL}\_\text{SYS}}(s)\right| = \left| \left(1 - \frac{T_{\text{PFD}} \cdot \Delta I}{T_b \cdot I}\right) \cdot \frac{2 \cdot \left(s^2 + I_{\text{CP}} \cdot \left(1 + s \cdot RC\right) \cdot \frac{K_{\text{VCO}}}{C \cdot N}\right)}{s^2} \right| \ (\text{UIpp}). \tag{11}$$

According to (11), both  $T_{\rm PFD}$  and  $\Delta I$  should be as small as possible. The delay time  $T_{\rm PFD}$  in a PFD is chosen only large enough to avoid the dead-zone problem; in this design,  $T_{\rm PFD}$  is around 200 ps. The charge pump calibration technique [14], [15] can also be used to reduce the current mismatch in a charge pump. According to the simulation results and (10), a worst-case current mismatch of 10% is expected and this may cause a static phase error of 0.2 UI.

Fig. 10. (a) Phase-frequency detector. (b) Charge pump.

### D. Flip-Flops and Dividers

For the sake of high-speed operation, CML DFFs and dividers are used in this work. The output waveforms of the DFFs are made as sharp as possible, at the cost of more power than in a usual design, so as not to reduce the sampling margin. The clock-to-output delay of the DFFs also introduces extra time delay, and replica buffers made of the latches in the DFFs are used to compensate for this delay [1]. However, device mismatch still degrades the sampling margin, so the measured jitter tolerance will be worse than predicted by (11). According to the simulation results, the degradation of the sampling margin may be several picoseconds over different process corners, and this effect can be included in the first term of (11).

## IV. EXPERIMENTAL RESULTS

The proposed CDR circuit has been fabricated in a 0.13 $\mu$m CMOS technology. Fig. 11 shows the die photo. The core area is 0.36 mm<sup>2</sup>, of which the GDCO occupies 0.01 mm<sup>2</sup>, the VCO occupies 0.18 mm<sup>2</sup>, and the rest is for the digital circuits and the dummy region. All the measurements are taken on a probe station and the peak-to-peak swing of the input data is 400 mV. The measured half-rate recovered clock and data for a 10 Gb/s 2<sup>7</sup> – 1 PRBS are shown in Fig. 12(a) and (b), respectively. This CDR demonstrates a 7.11 ps peak-to-peak jitter and the measured BER is less than  $10^{-12}$ . This CDR is also measured with a 10 Gb/s  $2^{31} - 1$  PRBS and the measured half-rate recovered clock and data are shown in Fig. 13(a) and (b), respectively. Because no frequency calibration circuit is implemented in this prototype, the frequency of the GDCO is set manually and it has a frequency offset of less than 40 MHz in our measurements. Hence, the GDCO generates considerable jitter when it encounters longer CIDs. Under this condition, this CDR demonstrates a peak-to-peak jitter of 28 ps and a BER less than  $10^{-10}$ . To improve the frequency accuracy, an additional PLL with a replica GVCO should be used.

Fig. 11. Die photo.

Fig. 12. Measured half-rate recovered clock and data with a  $2^7 - 1$  PRBS. (a) Recovered clock. (b) Recovered data.

Fig. 13. Measured half-rate recovered clock and data with a  $2^{31} - 1$  PRBS. (a) Recovered clock. (b) Recovered data.

In this work, both the jitter transfer and the jitter tolerance are measured under the specifications of OC-192 with a  $2^7 - 1$  PRBS. The sinusoidal input jitter (316 Hz to 80 MHz) is generated by the instrument and the channel jitter is not included. For the proposed CDR circuit with 4 MHz bandwidth, the measured jitter transfer and jitter tolerance are shown in Figs. 14 and 15, respectively. The proposed CDR circuit passes the masks at all the test points and its parameters are shown in Table II. However, the high-frequency jitter tolerance is hard to verify because of the limited modulation capability of our jitter measurement instrument [16]. The modulation profile that can be produced by this instrument is plotted in Fig. 16 [16]. In order to test the jitter tolerance of the proposed CDR circuit, it should be re-measured with a reduced bandwidth to utilize the higher modulation capability in the lower frequency range. Thus, the CDR's bandwidth is changed to 250 kHz by replacing the off-chip loop filter in the re-measurement, and the components' values are shown in Table II as well. The re-measurement result is shown in Fig. 17.

Fig. 14. Measured jitter transfer (@ 4 MHz BW).

Fig. 15. Measured jitter tolerance (@ 4 MHz BW).

Fig. 16. Reducing the CDR's bandwidth to fit the modulation profile.

For this CDR with a bandwidth of 250 kHz, a jitter tolerance of 1.46 UIpp is obtained at the corner frequency of 250 kHz, which means a jitter tolerance limit of  $1.46 \times 0.707 \approx 1$  UIpp, about twice that of the conventional ones [2], [3], [5]. To compare the jitter tolerances, the measured and the calculated jitter tolerance of the proposed CDR circuit and the calculated one for a conventional CDR circuit with the same loop parameters ( $R = 70 \ \Omega$ ,  $C = 1 \ \mu$F, N = 4,  $I_{CP} = 400 \ \mu$A, and  $K_{\rm VCO} = 260 \text{ MHz/V}$ ) are plotted together in Fig. 18. Compared with the result calculated by (8), the measured jitter tolerance is 50% of the theoretical value. This is because of non-linear effects such as the current mismatch in the CP and the sampling offset in the DFFs. Even with a jitter tolerance of only 50% of the theoretical value, this architecture matches the ideal performance of the conventional CDRs. The power consumption of the proposed CDR circuit is 60 mW from a 1.5 V supply, of which 50% is dissipated by the GDCO, 5% by the VCO, and the rest by the digital circuits. Table III summarizes the measured performance of the proposed CDR circuit and compares it with previous works.
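The comparison between the measured and the calculated tolerance can be sketched numerically in Python. The sketch evaluates (8) at the 250 kHz corner with the low-bandwidth loop parameters and compares the result with the measured 1.46 UIpp. It assumes $K_{\rm VCO}$ in Hz/V, and it checks a single frequency point only, so it is an illustration rather than a reproduction of Fig. 18.

```python
import math

# Low-bandwidth loop parameters quoted for the re-measurement.
K_VCO, I_CP, R, C, N = 260e6, 400e-6, 70.0, 1e-6, 4   # K_VCO in Hz/V (assumed)
MEASURED_UIPP = 1.46                                   # measured at 250 kHz


def theoretical_tolerance_uipp(freq_hz: float) -> float:
    """Evaluate (8) at s = j*2*pi*f for the low-bandwidth setting."""
    s = 1j * 2.0 * math.pi * freq_hz
    g = I_CP * (1.0 + s * R * C) * K_VCO / (C * N)
    return abs(2.0 * (s * s + g) / (s * s))


theory = theoretical_tolerance_uipp(250e3)
ratio = MEASURED_UIPP / theory
limit = MEASURED_UIPP * 0.707   # high-frequency limit estimated in the text
print(f"theory: {theory:.2f} UIpp, measured/theory: {ratio:.2f}, limit: {limit:.2f} UIpp")
```

Under these assumptions (8) predicts roughly 3 UIpp at 250 kHz, so the measured 1.46 UIpp is a little under half of it, in line with the 50% figure stated above.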



Fig. 17. Measured low-bandwidth jitter tolerance (@ 250 kHz BW).



Fig. 18. Measured low-bandwidth jitter tolerance versus the theoretical values.

TABLE II CDR'S LOOP PARAMETERS

| Parameter        | Measurement with 4 MHz bandwidth | Measurement with 250 kHz bandwidth |
|------------------|----------------------------------|------------------------------------|
| R                | 820 Ω                            | 70 Ω                               |
| C                | 10 nF                            | 1 µF                               |
| N                | 4                                | 4                                  |
| I<sub>CP</sub>   | 400 µA                           | 400 µA                             |
| K<sub>VCO</sub>  | 260 MHz/V                        | 260 MHz/V                          |

TABLE III PERFORMANCE SUMMARY AND COMPARISON

| Parameter                    | [2]                   | [3]                 | [5]                  | This Work            |
|------------------------------|-----------------------|---------------------|----------------------|----------------------|
| Type                         | PLL                   | PLL                 | DLL + PLL            | GDCO + PLL           |
| Process                      | 0.13 µm CMOS          | 0.13 µm CMOS        | 0.12 µm CMOS         | 0.13 µm CMOS         |
| Jitter transfer              | Pass                  | Pass                | Fail                 | Pass                 |
| Jitter tolerance             | Pass                  | Pass                | Pass                 | Pass                 |
| Jitter tolerance (@ 80 MHz)  | 0.5 UIpp              | 0.5 UIpp            | < 0.5 UIpp           | 1 UIpp               |
| Output jitter (Pk-Pk)        | 10 ps                 | 15.6 ps             | 18 ps                | 28 ps                |
| Core area (w/o loop filter)  | < 1.1 mm<sup>2</sup>  | < 3 mm<sup>2</sup>  | 1.94 mm<sup>2</sup>  | 0.36 mm<sup>2</sup>  |
| Power Diss.                  | < 1.2 W\*             | < 120 mW\*\*        | 550 mW               | 60 mW                |

\* Includes pulse pattern generator (PPG) and bit error rate tester (BERT).

\*\* Includes preamplifiers and drivers.

#### V. CONCLUSION

A 10 Gb/s CDR circuit using the proposed jitter-tolerance-enhancing technique is presented. The proposed CDR cascades one high-bandwidth CDR and one low-bandwidth CDR to improve the jitter tolerance without sacrificing the jitter transfer. A GDCO is also used to implement the high-bandwidth CDR to alleviate the extra cost. This circuit passes the SONET OC-192 specifications with a  $2^7 - 1$  PRBS and demonstrates a jitter tolerance of 1 UIpp at 80 MHz, which is about twice that commonly achieved with conventional CDR circuits.

#### ACKNOWLEDGMENT

The authors would like to thank the National Chip Implementation Center for chip implementation.

#### REFERENCES

- [1] B. Razavi, Design of Integrated Circuits For Optical Communications. New York: McGraw-Hill, 2003.
- [2] Y. Ohtomo, T. Kawamura, K. Nishimura, M. Nogawa, H. Koizumi, and M. Togashi, "A 12.5 Gb/s CMOS BER test using a jitter-tolerant parallel CDR," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2004, pp. 174–175.
- [3] S. Byun, J. C. Lee, J. H. Shim, K. Kim, and H. K. Yu, "A 10-Gb/s CMOS CDR and DEMUX IC with a quarter-rate linear phase detector," *IEEE J. Solid-State Circuits*, vol. 41, no. 11, pp. 2566–2576, Nov. 2006.
- [4] S. H. Lee and B. S. Song, "Digital-domain calibration of multistep analog-to-digital converters," *IEEE J. Solid-State Circuits*, vol. 27, no. 12, pp. 1679–1688, Dec. 1992.
- [5] W. Rhee, H. Ainspan, S. Rylov, A. Rylyakov, M. Beakes, D. Friedman, S. Gowda, and M. Soyuer, "A 10-Gb/s CMOS clock and data recovery circuit using a secondary delay-locked loop," in *Proc. 2003 IEEE Custom Integrated Circuits Conf. (CICC)*, San Jose, CA, Sep. 2003, pp. 81–84.
- [6] M. van Ierssel, A. Sheikholeslami, H. Tamura, and W. W. Walker, "A 3.2 Gb/s semi-blind-oversampling CDR," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2006, pp. 334–335.
- [7] J. Savoj and B. Razavi, "A 10-Gb/s CMOS clock and data recovery circuit with a half-rate linear phase detector," *IEEE J. Solid-State Circuits*, vol. 36, no. 5, pp. 761–768, May 2001.
- [8] A. E. Dunlop, W. C. Fischer, M. Banu, and T. Gabara, "150/30 Mb/s CMOS non-oversampled clock and data recovery circuits with instantaneous locking and jitter rejection," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, San Francisco, CA, Feb. 1995, pp. 44–45.
- [9] M. Nogawa, K. Nishimura, S. Kimura, T. Yoshida, T. Kawamura, M. Togashi, K. Kumozaki, and Y. Ohtomo, "A 10 Gb/s burst-mode CDR IC in 0.13 μm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, San Francisco, CA, Feb. 2005, vol. 1, pp. 228–595.
- [10] C. F. Liang, S. C. Hwu, and S. I. Liu, "A 10Gbps burst-mode CDR circuit in 0.18-μm CMOS," in *Proc. 2006 IEEE Custom Integrated Circuits Conf. (CICC)*, San Jose, CA, Sep. 2006, pp. 599–602.
- [11] C. F. Liang, S. C. Hwu, and S. I. Liu, "A multi-band burst-mode clock and data recovery circuit," *IEICE Trans. Electron.*, vol. E90-C, pp. 802–810, Apr. 2007.
- [12] M. Danesh and J. R. Long, "Differentially driven symmetric microstrip inductors," *IEEE Trans. Microw. Theory Tech.*, vol. 50, no. 1, pp. 332–341, Jan. 2002.
- [13] S. Kim, K. Lee, Y. Moon, D. K. Jeong, Y. Choi, and H. Y. Lim, "A 960-Mb/s/pin interface for skew-tolerant bus using low jitter PLL," *IEEE J. Solid-State Circuits*, vol. 32, no. 5, pp. 691–700, May 1997.
- [14] H. Huh, Y. Koo, K. Lee, Y. Ok, S. Lee, D. Kwon, J. Lee, J. Park, K. Lee, D. Jeong, and W. Kim, "A CMOS dual-band fractional-N synthesizer with reference doubler and compensated charge pump," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2004, pp. 100, 516.
- [15] S. Cheng, H. Tong, J. Silva-Martinez, and A. I. Karsilayan, "Design and analysis of an ultrahigh-speed glitch-free fully differential charge pump with minimum output current variation and accurate matching," *IEEE Trans. Circuits Syst. II*, vol. 53, no. 9, pp. 843–847, Sep. 2006.
- [16] Agilent 71501D jitter analysis user's guide. Agilent Technologies [Online]. Available: http://cp.literature.agilent.com/litweb/pdf/71501-90011.pdf



**Che-Fu Liang** (S'04) was born in Taipei, Taiwan, R.O.C., in 1981. He received both the B.S. and Ph.D. degrees in electrical engineering from National Taiwan University, Taipei, Taiwan, R.O.C., in 2003 and 2007, respectively.

His research interests include phase-locked loops and high-speed CMOS data-communication circuits for multi-gigabit applications.



**Shen-Iuan Liu** (S'88–M'93–SM'03) was born in Keelung, Taiwan, R.O.C., in 1965. He received the B.S. and Ph.D. degrees in electrical engineering from National Taiwan University (NTU), Taipei, Taiwan, in 1987 and 1991, respectively.

During 1991–1993, he served as a second lieutenant in the Chinese Air Force. During 1991–1994, he was an Associate Professor in the Department of Electronic Engineering of National Taiwan Institute of Technology. He joined the Department of Electrical Engineering, NTU, Taipei, in 1994, and has been a Professor since 1998. His research interests are in analog and digital integrated circuits and systems.

Dr. Liu served as chair of the IEEE SSCS Taipei Chapter in 2004–2008. He has served as general chair of the 15th VLSI Design/CAD Symposium, Taiwan, 2004, and as Program Co-chair of the Fourth IEEE Asia-Pacific Conference on Advanced System Integrated Circuits, Fukuoka, Japan, 2004. He was the recipient of the Engineering Paper Award from the Chinese Institute of Engineers in 2003, the Young Professor Teaching Award from MXIC Inc., the Research Achievement Award from NTU, and the Outstanding Research Award from National Science Council in 2004. He served as a technical program committee member for ISSCC in 2006–2008 and has served in the same role for A-SSCC since 2005. He was an Associate Editor for IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS–II: EXPRESS BRIEFS in 2006–2007. He has been an Associate Editor for IEEE JOURNAL OF SOLID-STATE CIRCUITS since 2006 and an Associate Editor for IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS–II: REGULAR PAPERS since 2008. He has been on the Editorial Board of *Research Letters in Electronics* since 2008. He is a senior member of IEEE and a member of the IEICE.



**Sy-Chyuan Hwu** was born in Taipei, Taiwan, R.O.C., on March 16, 1981. He received the B.S. and M.S. degrees in electronics engineering from National Taiwan University, Taipei, Taiwan, in 2003 and 2005, respectively.

His research interests include PLL and high-speed serial links for optical fibers. He is currently with MediaTek Inc., Hsinchu, Taiwan, working on mixed-mode circuit design.