# A Power-Efficient Clock and Data Recovery Circuit in 0.18 $\mu$m CMOS Technology for Multi-Channel Short-Haul Optical Data Communication

Armin Tajalli, Student Member, IEEE, Paul Muller, Member, IEEE, and Yusuf Leblebici, Senior Member, IEEE

Abstract—This paper studies the specifications of gated-oscillator-based clock and data recovery circuits (GO CDRs) designed for short-haul optical data communication systems. Jitter tolerance (JTOL) and frequency tolerance (FTOL) are analyzed and modeled as the two main design parameters of the proposed topology to explore the main tradeoffs in the design of low-power GO CDRs. Based on this approach, a top-down design methodology is presented to implement a low-power CDR unit while simultaneously satisfying the JTOL and FTOL requirements of the system. Using a standard digital 0.18 $\mu$m CMOS technology, an 8-channel CDR system has been realized that consumes 4.2 mW/Gb/s/channel and occupies a silicon area of 0.045 mm<sup>2</sup>/channel, with a total aggregate data rate of 20 Gb/s. The measured FTOL is $\pm 3.5\%$, and no error was detected for a $2^{31}-1$ pseudo-random bit stream (PRBS) input over 30 minutes, indicating a bit error rate (BER) below $10^{-12}$. In addition, a shared phase-locked loop (PLL) with a wide tuning range and compensated loop gain has been introduced to tune the center frequency of all CDR channels to the desired value.

Index Terms—Chip-to-chip interconnection, clock and data recovery circuit, CMOS integrated circuits, frequency tolerance, gated oscillator, jitter tolerance, optical data communication, short-haul.

## I. INTRODUCTION

The demand for higher data communication speed in short-distance applications, such as computer networks and high-speed processing systems, has raised the importance of low-power and low-cost multi-channel optical transceivers [1]–[4].
In addition to very high bandwidth, optical links provide a medium that is robust against electromagnetic coupling in short-haul applications [5]. Therefore, implementing high-performance, low-power, and low-cost multi-channel optical transceivers in conventional silicon CMOS technology is a key challenge for future data processing and communication systems.

Manuscript received January 22, 2007; revised June 12, 2007. This work was supported by Grant 200021-100625 of the Swiss National Science Foundation. A. Tajalli was with the Electrical Engineering Department, Sharif University of Technology, Tehran, Iran. He is now with the Microelectronic Systems Laboratory (LSM), Swiss Federal Institute of Technology (EPFL), Lausanne CH-1015, Switzerland (e-mail: armin.tajalli@epfl.ch). P. Muller was with the Microelectronic Systems Laboratory (LSM), Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland. He is now with Marvell Semiconductor, Etoy CH-1163, Switzerland (e-mail: pmuller@marvell.ch). Y. Leblebici is with the Microelectronic Systems Laboratory (LSM), Swiss Federal Institute of Technology (EPFL), Lausanne CH-1015, Switzerland (e-mail: yusuf.leblebici@epfl.ch).

Digital Object Identifier 10.1109/JSSC.2007.905234

Fig. 1. Conceptual block diagram of an integrated multi-channel optical receiver.

This paper presents a power-efficient clock and data recovery (CDR) solution to be combined with the previously demonstrated pure-silicon photodetection and amplification front-end [5]–[7] to realize a completely integrated multi-channel receiver. Fig. 1 shows the conceptual block diagram of the proposed multi-channel receiver system. In the proposed topology, an integrated photodetector (PD) converts the optical signal to an electrical current [5]. This electrical signal is amplified by transimpedance and limiting amplifiers (TIA and LA) [6], [7] and then retimed by the CDR circuit.
Due to their instantaneous locking properties, GO CDRs have been widely used in burst-mode applications [8], and their simple topology makes them well suited to low-power and small-area applications [9]–[11]. In this paper, the performance of the proposed CDR, which is based on the gated-oscillator (GO) topology, is studied and analyzed extensively in order to implement a high-performance, low-power system. To implement a low-power CDR that satisfies the specifications required for short-haul applications, a careful system design and modeling technique is presented. It will be shown that this topology is sensitive to any frequency error between the received data and the sampling clock, while it shows relatively good jitter tolerance (JTOL) performance. The performance of the GO CDR in the presence of frequency error is also studied to obtain a good estimate of its frequency tolerance (FTOL). This study reveals the tradeoff between power dissipation and FTOL, and hence the power penalty for achieving the desired FTOL.

Section II describes the topology of the proposed multi-channel CDR system and explains the techniques for modeling and estimating the JTOL and FTOL of a GO CDR. Implementation techniques and measurement results are presented in Sections III and IV, respectively.

## II. GATED-OSCILLATOR (GO)-BASED CDR

In a GO CDR, the sampling clock is produced by a simple ring oscillator. As depicted in Fig. 2(a), a synchronizer block keeps this clock synchronous with the received data. This synchronizer can be an edge detector that produces a retiming signal at each data transition [11]. Fig. 2(b) shows the topology of the proposed GO CDR, which consists of a current-controlled ring oscillator (CCO) and an edge detector (EDET) block that controls the phase of the CCO clock. As shown in Fig. 2(c), at each received data edge, the EDET generates a synchronization signal (ED) that is applied directly to the CCO.
This signal stops the CCO from oscillating and freezes the output clock $(ck_{out})$ HIGH via the first stage of the ring oscillator. At the rising edge of ED, the oscillator is released and returns to free oscillation at a frequency determined by the control current $(I_C)$, in phase with the last received data edge. In the proposed topology, sampling the delayed data $(DD_{\rm in})$ instead of the input data $(D_{\rm in})$ cancels the effect of the delay introduced by the delay line. In addition, parasitic delays due to the XNOR gate, or the delay mismatch between the two inputs of the NAND gate in the oscillator, should be compensated by proper dummy gates (as shown in Fig. 2(b)).

In a GO CDR, the total delay of the delay line should not be less than half a clock period (i.e., $T_0/2$) to ensure that the ring oscillator is retimed at each data transition. If this delay is less than $T_0/2$, the rising edge of ED may arrive before the rising edge of $CK_4$ (see Fig. 2(c)); in this case, the output clock phase is determined by $CK_4$ rather than by ED.

With this gating mechanism, synchronization between clock and data can take place within only a few transitions of the input data. However, this fast synchronization comes at the expense of poor jitter transfer (JTRAN) characteristics: any jitter on the received data or on the delay line (DL) is transferred to the output without any filtering. To reduce the effect of delay-line jitter, the proposed topology produces the output data by sampling $DD_{\rm in}$ (which contains almost the same jitter as ED) instead of $D_{\rm in}$ (Fig. 2(b)). In addition, using an *injection signal* instead of a *gating signal* can reduce the influence of input jitter at the output. With this technique, a finite amount of charge is injected into the oscillator at each data transition, so that the oscillator synchronizes with the input data more gradually.
Assuming that the input is a periodic signal, the transfer function between the injected phase and the output phase is a first-order low-pass function [12], which can be described by

$$\frac{\phi_{\text{out}}}{\phi_{\text{in}}} \approx \frac{a \cdot z^{-1}}{1 - b \cdot z^{-1}} \tag{1}$$

in which $a$ and $b$ depend on the strength of the injection signal [12]. In a real CDR, the input data is a random bit stream; hence, with average values for $a$ and $b$, (1) only approximates the jitter transfer characteristics of the system. In this work, since JTRAN is not a critical parameter for short-haul applications, the injection-locking technique has not been applied.

Fig. 2. (a) General and (b) proposed GO CDR topology. (c) Timing of operation. (d) Simplified circuit schematic of an SCL-based AND gate $(Z=A\bullet B)$ and the corresponding replica bias circuit.

Fig. 3. Proposed 8-channel CDR topology, which uses a shared PLL for frequency tuning.

Fig. 2(d) shows the circuit schematic of a delay cell and the replica bias circuit used to implement the delay line and the CCO. A source-coupled logic (SCL)-based topology has been used to achieve the desired speed [13]. The SCL gate shown in Fig. 2(d) is configured as an AND gate to implement the proposed delay cells.

Fig. 3 shows the proposed multi-channel gated-current-controlled-oscillator (GCCO)-based CDR. In this architecture, a shared phase-locked loop (PLL) generates a local high-frequency clock ($f_{\rm out}$) from a reference input clock ($f_{\rm in}$), and $f_{\rm out}$ is intended to be exactly equal to the baud rate of the received data, which is specified with 100 ppm accuracy [18]. The proposed PLL uses an oscillator matched to the oscillators used in each channel. To achieve better matching between each channel and the PLL, current-controlled oscillators (CCOs) are used instead of voltage-controlled oscillators (VCOs) [10].
A copy of the control current $(I_C)$ produced by the PLL is delivered to the matched oscillators of all channels $(I_C[1:8])$, and a local replica bias circuit generates the proper bias voltage for the pMOS load devices (shown in Fig. 2(d)). Distributing a control voltage $(V_C)$ instead of $I_C$ could lead to a larger frequency error: in a VCO-based topology, $V_C = V_{BN}$ (see Fig. 2(d)) would be copied to all channels, and the mismatch between nMOS devices placed far apart from each other can produce a large frequency offset. In the CCO approach, this problem does not exist, because all the bias voltages $(V_{BN}$ and $V_{BP})$ are generated locally from $I_C$. A precise current mirror can be used to ensure that the currents $I_C[1:8]$ are well matched. Provided that the CCOs are also well matched, the clock frequencies of all channels ($ck_{out}[1:8]$) are identical and equal to $f_{\text{out}}$. Nevertheless, it is desirable to design the CDR with a high frequency tolerance to avoid incorrect sampling due to the inherent mismatch between the channels. In the following, the performance of a GO CDR in the presence of frequency offset and input jitter is analyzed to investigate the capabilities and limitations of this topology for short-distance data communication applications.

### A. Frequency Tolerance

Unlike in conventional PLL-based CDRs, a frequency difference can exist between the GO CDR and the incoming data stream. In practical applications, the data rate is specified within $\pm 100$ ppm accuracy. The frequency tolerance (FTOL) is defined as the maximum frequency difference at which the BER remains lower than a specified value (usually, BER $< 10^{-12}$).

Fig. 4. (a) Incorrect sampling in the presence of frequency offset and jitter on the sampling clock and received data. (b) Simulated BER for different values of frequency error and sampling-clock jitter (input data specifications: CID $= 5$, RJ $= 0.015~\mathrm{UI_{rms}}$, DJ $= 0.2~\mathrm{UI_{pp}}$).

For correct sampling in ideal conditions, when there is no jitter on the data or the clock, the frequency error must satisfy

$$|f_{ck} - f_0| < \frac{f_0}{2n} \tag{2}$$

in which $f_0 = \omega_0/2\pi$ is the nominal data frequency, $f_{ck} = 1/T_{ck}$ is the oscillator (sampling clock) frequency, and $n$ is the number of consecutive identical digits (CID) [13]. With 8B/10B coding, the CID is limited to five, i.e., $n \leq 5$; hence, based on (2), FTOL $< 10\%$. In practice, however, FTOL is smaller than this value, mainly because of the jitter on the sampling clock and on the input data. As shown in Fig. 4(a), jitter on the received data or on the sampling clock can cause a data edge to arrive before the corresponding sampling clock edge, resulting in incorrect sampling. In this case, the probability of incorrect sampling is

$$\text{BER} = \int_{-\infty}^{+\infty} P_{ck}(\tau) \cdot \left( \int_{-\infty}^{\tau} P_d(\eta) \, d\eta \right) d\tau \tag{3}$$

in which $P_{ck}(\cdot)$ and $P_d(\cdot)$ are the probability densities of a transition on the sampling clock (around $t=(2n-1)T_{ck}/2$) and on the data (around $t=nT_{\rm data}$), respectively. According to (2), the worst case occurs for a long run of identical bits. Fig. 4(b) shows the achievable FTOL based on (3), assuming CID $= 5$. Here, the BER is calculated for different values of frequency error and different *rms* (root mean square) jitter values on the sampling clock. The input data is assumed to already contain both random (RJ) and deterministic (DJ) jitter [14]. As can be seen, any increase in clock jitter degrades the FTOL.
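Equations (2) and (3) can be checked numerically. The Python sketch below is illustrative only: it assumes that the sampling-clock edge and the data edge are independent Gaussians (DJ is ignored), expresses time in data UIs, and models only the slow-clock case of Fig. 4(a), in which the data edge that ends a run of $n$ identical bits can overtake the $n$-th sampling edge. All function names and numeric values are hypothetical choices for the illustration, not the setup used for Fig. 4(b).

```python
import math

def ftol_bound(n_cid: int) -> float:
    """Jitter-free bound of (2): |f_ck - f0| / f0 < 1 / (2n)."""
    return 1.0 / (2 * n_cid)

def clock_edge_ui(n_cid: int, freq_err: float) -> float:
    """Nominal time of the n-th sampling edge after retiming, (2n - 1) * T_ck / 2,
    in data UIs, with T_ck = T0 / (1 + freq_err)."""
    return (2 * n_cid - 1) / (2.0 * (1.0 + freq_err))

def gauss_pdf(x: float, mu: float, sigma: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def gauss_cdf(x: float, mu: float, sigma: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ber_eq3(mu_ck, sigma_ck, mu_d, sigma_d, steps=20000):
    """Midpoint-rule evaluation of (3): the outer integral runs over the
    clock-edge density; the inner integral is the data-edge CDF."""
    lo, hi = mu_ck - 12 * sigma_ck, mu_ck + 12 * sigma_ck
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        tau = lo + (i + 0.5) * h
        total += gauss_pdf(tau, mu_ck, sigma_ck) * gauss_cdf(tau, mu_d, sigma_d)
    return total * h

def ber_closed_form(mu_ck, sigma_ck, mu_d, sigma_d):
    """For two independent Gaussians, (3) equals P(t_d - t_ck < 0)."""
    return gauss_cdf(0.0, mu_d - mu_ck, math.hypot(sigma_ck, sigma_d))

# Illustrative numbers (assumptions): CID = 5, clock 4.5% slow, 0.1 UI rms on each edge.
n = 5
mu_ck = clock_edge_ui(n, -0.045)
mu_d = float(n)                      # data edge after n identical bits
print(ftol_bound(n))                 # 0.1, i.e., 10% for 8B/10B coding
print(ber_eq3(mu_ck, 0.1, mu_d, 0.1), ber_closed_form(mu_ck, 0.1, mu_d, 0.1))
```

Because both densities are Gaussian in this sketch, the double integral in (3) collapses to a single normal CDF, which gives a convenient check on the numerical integration. With the dual-Dirac DJ model used later in (11) and (12), $P_d$ would instead be a two-Gaussian mixture and only the numerical route would apply directly.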
Based on this analysis, the jitter generated in the oscillator must be very small to achieve an acceptable frequency tolerance. The main source of sampling-clock jitter in this configuration is the jitter accumulated while the gated oscillator runs freely. The accumulated jitter increases with the free-running time interval of the oscillator and can be expressed as [15]

$$\sigma_{ck} = \kappa \sqrt{\Delta T} \tag{4}$$

in which $\sigma_{ck}$ is the rms clock jitter accumulated during the time interval $\Delta T$, and $\kappa$ is a proportionality factor that depends on the topology and power consumption of the delay stages in the ring oscillator as well as on technology parameters [15], [16]. In a GO CDR, $\Delta T$ depends on the number of CIDs. Therefore, according to Fig. 4(b) and using (4), an FTOL of about 4.5% in the presence of five consecutive identical digits requires $\kappa \leq 9 \times 10^{-8}\,\sqrt{\mathrm{s}}$. This criterion can be used to determine the biasing, and hence the transistor sizes, of each delay cell, which yields a straightforward top-down methodology for designing GO CDRs.

Fig. 5 shows the proposed design methodology. The high-level system requirements and the structure of the CDR topology lead to a detailed behavioral model, which in turn is used to identify the jitter specifications. The limits on oscillator jitter dictated by FTOL and JTOL then determine general circuit specifications such as power consumption. In the next step, these specifications are translated into detailed circuit parameters such as biasing conditions and transistor sizes.

### B. Jitter Tolerance (JTOL)

Jitter tolerance is one of the most important test parameters for serial link transceivers.
JTOL is a measure of the CDR's ability to tolerate input jitter. It is usually tested by adding sinusoidal jitter (SJ) over a given frequency range to a data stream that already includes the deterministic (DJ) and random (RJ) jitter components added by the channel [14]. The maximum SJ amplitude at which the CDR still operates at a given BER, expressed as a function of jitter frequency, is called the jitter tolerance [13]. Simulating or analyzing the JTOL of a nonlinear system such as a GO CDR is very complex; this section presents some techniques for analyzing and modeling this parameter in GO CDRs.

Fig. 5. Proposed GO CDR top-down design methodology.

1) Pure Sinusoidal Input Jitter: In the presence of sinusoidal jitter on the received data, the instantaneous data rate varies as

$$\omega(t) = \omega_0 + \Delta\omega \cdot \cos\omega_j t \tag{5}$$

in which $\omega(t)$ is the instantaneous data frequency in rad/s, $\omega_j$ is the frequency of the sinusoidal jitter, $\omega_0$ is the nominal data frequency ($\omega_0 = 2\pi \cdot R$, where $R$ is the data baud rate), and

$$\Delta \omega = \pi \cdot \mathrm{UI_{pp}} \cdot \omega_j. \tag{6}$$

Here, $\mathrm{UI_{pp}}$ is the peak-to-peak sinusoidal jitter amplitude [17]. Accordingly, the period of the input data is $T_{\mathrm{data}} = 2\pi/\omega(t)$, and the JTOL of a GO CDR can be calculated from the variations of the data period. In the presence of sinusoidal jitter, and according to (5) and (6), the data period varies such that

$$\frac{2\pi}{\omega_0 + \Delta\omega} < T_{\text{data}} < \frac{2\pi}{\omega_0 - \Delta\omega}. \tag{7}$$

For correct sampling, the data edge must arrive neither after $3T_0/2$ nor before $T_0/2$; hence, $T_0/2 < T_{\rm data} < 3T_0/2$. Combining this criterion with (7) gives $2\pi/(\omega_0-\Delta\omega) < 3T_0/2$ and $T_0/2 < 2\pi/(\omega_0+\Delta\omega)$. Since $T_0 = 2\pi/\omega_0$, the first condition requires $\Delta\omega/\omega_0 < 1/3$, whereas the second only requires $\Delta\omega/\omega_0 < 1$. Therefore, correct sampling requires $\Delta\omega/\omega_0 < 1/3$.
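The CID = 1 correct-sampling criterion just derived can be checked directly from (6) and (7). The sketch below uses hypothetical helper names, assumes a 2.5 Gb/s channel (the 20 Gb/s aggregate rate divided over 8 channels), and ignores both channel and clock jitter, exactly as the derivation above does.

```python
import math

def delta_omega_from_sj(ui_pp: float, omega_j: float) -> float:
    """Eq. (6): peak deviation of the instantaneous data rate caused by SJ."""
    return math.pi * ui_pp * omega_j

def samples_correctly(omega0: float, delta_omega: float) -> bool:
    """CID = 1 check: both extremes of (7) must satisfy T0/2 < T_data < 3*T0/2."""
    if delta_omega >= omega0:          # (5) would imply a non-positive data rate
        return False
    t0 = 2 * math.pi / omega0
    t_min = 2 * math.pi / (omega0 + delta_omega)
    t_max = 2 * math.pi / (omega0 - delta_omega)
    return t0 / 2 < t_min and t_max < 1.5 * t0

omega0 = 2 * math.pi * 2.5e9           # assumed 2.5 Gb/s per channel
print(samples_correctly(omega0, 0.33 * omega0))   # True
print(samples_correctly(omega0, 0.34 * omega0))   # False

# SJ at omega_j = 0.01 * omega0: 10 UIpp gives dw/w0 = 0.314, 11 UIpp gives 0.346.
omega_j = 0.01 * omega0
print(samples_correctly(omega0, delta_omega_from_sj(10.0, omega_j)))  # True
print(samples_correctly(omega0, delta_omega_from_sj(11.0, omega_j)))  # False
```

Only the upper bound of (7) ever fails first, which is why the $1/3$ limit, rather than the $\Delta\omega < \omega_0$ limit, sets the tolerable SJ amplitude.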
Applying (6), this criterion translates into

$$\mathrm{UI_{pp}} = \frac{\omega_0}{3\pi\omega_j} \tag{8}$$

in which $\mathrm{UI_{pp}}$ is the maximum tolerable peak-to-peak sinusoidal jitter amplitude. Ignoring the channel jitter, this expression is a worst-case approximation of the JTOL of a GO topology, since it assumes that the data period always takes its lowest (or highest) possible value as given by (7). In the more general case of $n$ consecutive identical digits (Fig. 6(a)), it can be shown that the data edge must fall within the interval $(2n-1)\cdot T_0/2 < T_{\mathrm{data}} < (2n+1)\cdot T_0/2$, and the JTOL can be approximated by

$$\mathrm{UI_{pp}} \approx \frac{1}{(2n+1) \cdot \pi} \cdot \frac{\omega_0}{\omega_j}. \tag{9}$$

Here, the input data is assumed to be periodic with a nominal period of $T=n\times T_0$, which is the worst case, since JTOL decreases as $n$ increases. Fig. 6(b) compares the JTOL calculated by (9) (based on data-period variation) with the JTOL mask for different CID values [18]. This plot also shows the results of behavioral modeling. As can be seen, as long as the channel jitter is negligible, (9) agrees well with the behavioral modeling results. The JTOL predicted by (9) is slightly higher than the modeling result because it does not take the effect of RJ into account.

2) Including the Channel Jitter: For a more practical JTOL estimate, the channel jitter must also be included in the calculations [14]. Channel jitter generally includes both random (RJ) and deterministic (DJ) jitter, with Gaussian and uniform distributions, respectively [14], [19]. If there is no jitter on the sampling clock, incorrect sampling occurs if the data arrives earlier than the corresponding clock edge or later than the next sampling clock edge.
In this case, the BER can be calculated by

$$\text{BER} = \int_{-\infty}^{(2n-1) \cdot T_0/2} P_d(\tau) \, d\tau + \int_{(2n+1) \cdot T_0/2}^{+\infty} P_d(\tau) \, d\tau \tag{10}$$

in which $P_d(\cdot)$ is the probability density of a data transition at a given time. Assuming the impulse (dual-Dirac) model for DJ with a peak-to-peak value of $T_{\rm pp}$ instead of a uniform distribution [20], i.e.,

$$P_{d,\mathrm{DJ}}(t) = 0.5 \times \left[ \delta \left( t - \frac{T_{\mathrm{pp}}}{2} \right) + \delta \left( t + \frac{T_{\mathrm{pp}}}{2} \right) \right] \tag{11}$$

and including RJ with a Gaussian distribution $N(0, \sigma_{\rm RJ})$ (zero mean and *rms* value $\sigma_{\rm RJ}$), the probability density of a data transition under DJ and RJ is

$$P_{d,\mathrm{DJ}\oplus\mathrm{RJ}}(t) = 0.5 \times \left[ N \left( T_{\mathrm{data}} - \frac{T_{\mathrm{pp}}}{2}, \sigma_{\mathrm{RJ}} \right) + N \left( T_{\mathrm{data}} + \frac{T_{\mathrm{pp}}}{2}, \sigma_{\mathrm{RJ}} \right) \right]. \tag{12}$$

Note that in the ideal case without SJ, $T_{\rm data} = n \times T_0 = n \times T_{ck}$. As shown before, it is more convenient to calculate the variations of $T_{\rm data}$ for JTOL estimation ($T_{\rm data}$ is shown in Fig. 6(a)). The next step is to add the SJ to (12) to obtain a complete expression for $P_d(\cdot)$. A simple way to do this is to use the upper or lower limits of $T_{\rm data}$ estimated in (7). Moreover, as shown in Fig. 6(a), since $T_{\rm data}$ is the time difference between two consecutive data transitions, the jitter distribution of $T_{\rm data}$ is

$$P_T = P_d \oplus P_d \tag{13}$$

in which $\oplus$ denotes convolution. This data transition occurs between two consecutive clock transitions at $(n-1/2) \cdot T_{ck}$ and $(n+1/2) \cdot T_{ck}$.

Fig. 6. (a) GO CDR operation in the presence of jitter. (b) JTOL based on (9), (15), and behavioral modeling in comparison to the JTOL mask [18] when the channel jitter is negligible (RJ $= 0.01~{\rm UI_{rms}}$). (c) JTOL for CID $= 5$ while the input is periodic.

Ignoring clock RJ, $P_d(\cdot)$ would have two peaks, at $T_{\rm data,min}-T_{\rm pp}/2$ and $T_{\rm data,max}+T_{\rm pp}/2$. The distance between each of these peaks and the nearest clock edge is the key parameter in estimating the JTOL: increasing the temporal distance between each clock edge and the closest jitter peak leads to a larger achievable JTOL. For the worst case, define $\Delta t = \min\{\Delta t_L, \Delta t_H\}$, in which $\Delta t_L$ and $\Delta t_H$ are these temporal distances:

$$\Delta t_L = \left(T_{\rm data,min} - \frac{2 \times T_{\rm pp}}{2}\right) - \left(n - \frac{1}{2}\right) \cdot T_{ck} \tag{14\text{-}1}$$

$$\Delta t_H = \left(n + \frac{1}{2}\right) \cdot T_{ck} - \left(T_{\text{data,max}} + \frac{2 \times T_{\text{pp}}}{2}\right). \tag{14\text{-}2}$$

The factor of $2\times$ in front of $T_{\rm pp}$ in (14) accounts for the convolution in (13). Setting $\mathrm{RJ}_{\rm pp}=\Delta t$ (where $\mathrm{RJ}_{\rm pp}$ is the peak-to-peak value of RJ), sampling errors occur at the rate given by the BER when $\Delta t < \mathrm{RJ}_{\rm pp}$. Defining $\lambda = \mathrm{RJ}_{\rm pp}/\sigma_{\rm RJ} = \Delta t/\sigma_{\rm RJ}$, $\lambda$ must exceed a specified value to reach the desired BER [19]. In other words, the peak-to-peak RJ must be at least $\lambda$ times the *rms* RJ value to achieve a BER below the desired value, i.e., $\Delta t > \lambda \cdot \sigma_{\rm RJ}$.
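The margins in (14) can also be solved numerically for the largest tolerable relative deviation $x = \Delta\omega/\omega_0$. The sketch below is an illustration under stated assumptions: it works in units of $T_0 = T_{ck}$ (no frequency offset), assumes that the $n$-bit interval stretches between $n/(1+x)$ and $n/(1-x)$ following (7), and requires a margin of $\sqrt{2}\,\lambda\sigma_{\rm RJ}$, the factor that the convolution in (13) introduces and that appears in (16) below. The function names and the value $\lambda = 7$ are hypothetical. A bisection on the two margins reproduces the closed-form $\eta$ of (16), and with no channel jitter it returns $1/(2n+1)$, consistent with (9).

```python
import math

def margins(x, n, t_pp):
    """Timing margins of (14) in units of T0 = T_ck; x = delta_omega / omega0."""
    t_data_min = n / (1.0 + x)
    t_data_max = n / (1.0 - x)
    dt_low = (t_data_min - t_pp) - (n - 0.5)     # (14-1)
    dt_high = (n + 0.5) - (t_data_max + t_pp)    # (14-2)
    return dt_low, dt_high

def max_rel_deviation(n, t_pp, sigma_rj, lam, iters=100):
    """Largest x with min(dt_low, dt_high) >= sqrt(2) * lam * sigma_rj.
    Both margins shrink monotonically as x grows, so bisection applies."""
    required = math.sqrt(2.0) * lam * sigma_rj

    def ok(x):
        return min(margins(x, n, t_pp)) >= required

    if not ok(0.0):
        return 0.0                               # no SJ can be tolerated
    lo, hi = 0.0, 1.0 - 1e-12                    # physical limit: dw < w0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if ok(mid):
            lo = mid
        else:
            hi = mid
    return lo

def eta_closed_form(n, t_pp, sigma_rj, lam):
    """Closed form matching (16); meaningful only while c > 0."""
    c = 0.5 - t_pp - math.sqrt(2.0) * lam * sigma_rj
    return min(abs(1 - n / (n - c)), abs(1 - n / (n + c)))

def jtol_ui_pp(eta, omega0_over_omegaj):
    """Invert (6): UI_pp = eta * (omega0 / omega_j) / pi."""
    return eta * omega0_over_omegaj / math.pi

n = 5
print(max_rel_deviation(n, 0.0, 0.0, 7.0), 1 / (2 * n + 1))
print(max_rel_deviation(n, 0.2, 0.01, 7.0), eta_closed_form(n, 0.2, 0.01, 7.0))
```

The closed form follows from solving each margin inequality for $x$: the $\Delta t_L$ condition gives $x \le c/(n-c)$ and the $\Delta t_H$ condition gives $x \le c/(n+c)$, so the high-side margin always binds.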
With these assumptions, the JTOL can be approximated as

$$\mathrm{UI_{pp(min)}} \approx \frac{\eta}{\pi} \cdot \frac{\omega_0}{\omega_j} \tag{15}$$

in which $\eta$ is defined as

$$\eta = \min \left\{ \left| 1 - \frac{n}{n \pm \left( 1/2 - T_{\rm pp}/T_0 - \sqrt{2} \cdot \lambda \sigma_{\rm RJ}/T_0 \right)} \right| \right\} \tag{16}$$

and depends on the specifications of the different types of jitter. Here, the minus sign corresponds to the $\Delta t_L$ case and the plus sign to the $\Delta t_H$ case, and the factor $\sqrt{2}$ arises from the convolution in (13). With $T_{\rm pp} = \sigma_{\rm RJ} = 0$, (16) gives $\eta = 1/(2n+1)$ and (15) reduces to (9). This expression gives a good estimate of JTOL for GO CDR topologies. In Fig. 6(b), the JTOL estimated by (15) is compared to the behavioral modeling results. The simulation results for CID = 1, CID = 3, and CID = 5 show how JTOL depends on the CID value. At low jitter frequencies, (15) agrees very well with the behavioral modeling results.

3) High-Frequency Jitter: As predicted by (15) and (16), at low jitter frequencies JTOL decreases as the CID increases. However, as the behavioral modeling in Fig. 6(b) shows, at higher jitter frequencies JTOL does not decrease as predicted by (9) and (15). To explain this behavior, assume a sequence of $n$ consecutive identical bits, so that the instantaneous data frequency is $\omega_0/n$. In this case, if $\omega_j = k \cdot \omega_0/n$ (in which $k$ is an integer), the instantaneous phase of the input data is [17]

$$\phi(t) = \frac{\omega_0 \cdot t}{n} + A_{\rm SJ} \cdot \sin\left(\frac{k \cdot \omega_0 \cdot t}{n}\right) \tag{17}$$

in which $A_{\rm SJ}$ is the amplitude of the SJ. Therefore, regardless of $A_{\rm SJ}$, the data period remains unchanged, since according to (17) the period of $\sin[\phi(t)]$ is $n\cdot T_0$. As long as $\omega_j=k\cdot\omega_0/n$, the SJ has no effect on the input data transition points.
In this case, the JTOL is limited only by the other sources of jitter. For example, for $n=5$ the JTOL increases near the frequencies $\omega_j=\omega_0/5$ and $\omega_j=2\omega_0/5$, as confirmed for periodic input data with CID = 5 in Fig. 6(c). In practice, since the input data is not periodic, the JTOL can be calculated by weighted averaging of the BER over the possible values of $n$. The final result is shown in Fig. 6(b) for RJ = 0.01 UI<sub>rms</sub>; in this plot, the roll-off in JTOL at frequencies above $0.06\times\omega_0$ is due to this effect. According to Fig. 6(b), a GO CDR shows very good JTOL performance at low jitter frequencies, while very careful design is needed to pass the JTOL requirement at very high jitter frequencies. Fig. 6(b) also shows that the JTOL degrades in the presence of a frequency error between the received data and the sampling clock. Consequently, JTOL can impose additional restrictions on the tolerable frequency error beyond the FTOL calculated under the correct-sampling assumption. The physical limit shown in Fig. 6(b) and (c) is imposed by (5), since $\Delta\omega$ must remain less than $\omega_0$.

## III. CIRCUIT IMPLEMENTATION

As shown in Section II, GO CDRs are sensitive to frequency offset, and careful design is required to achieve acceptable FTOL and JTOL. The main goal of this work is to implement a low-power, small-area CDR, which is highly desirable for short-haul applications. The following subsections explain the techniques used to implement a reliable CDR circuit.

### A. Phase Noise Requirement

Frequency stability and timing jitter are the two most important specifications of the oscillator in a GO topology. The timing jitter of ring oscillators, or its frequency-domain counterpart, *phase noise*, has been studied extensively in [15], [16], and [21]. As indicated in Section II, the sampling clock jitter can be described by (4).
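As a numerical illustration of (4), the sketch below evaluates the accumulated clock jitter for the bound $\kappa \leq 9 \times 10^{-8}\,\sqrt{\mathrm{s}}$ quoted in Section II-A. It assumes a 2.5 Gb/s channel (the 20 Gb/s aggregate rate over 8 channels) and a free-running interval of five UIs, corresponding to five consecutive identical digits; the helper names are hypothetical.

```python
import math

def accumulated_jitter(kappa: float, delta_t: float) -> float:
    """Eq. (4): rms jitter accumulated over a free-running interval (SI units)."""
    return kappa * math.sqrt(delta_t)

def max_kappa(sigma_target: float, delta_t: float) -> float:
    """Inverse of (4): largest kappa that keeps sigma_ck <= sigma_target."""
    return sigma_target / math.sqrt(delta_t)

bit_rate = 2.5e9              # assumed per-channel rate, bit/s
ui = 1.0 / bit_rate           # 400 ps
n_cid = 5                     # assumed free-running interval of 5 UI
kappa = 9e-8                  # sqrt(s), bound quoted in Section II-A

sigma = accumulated_jitter(kappa, n_cid * ui)
print(f"sigma_ck = {sigma * 1e12:.2f} ps = {sigma / ui:.4f} UI")
# sigma_ck = 4.02 ps = 0.0101 UI
```

Inverting (4) with `max_kappa` gives the largest $\kappa$ compatible with a given jitter budget, which is the quantity the bias-current choice discussed below has to satisfy.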
Equation (4) also provides a good estimate of the jitter-power tradeoff in a differential ring oscillator. As shown in [21], the minimum achievable value of $\kappa$ can be calculated by

$$\kappa_{\min} = \sqrt{\frac{8}{3\eta}} \cdot \sqrt{N \cdot \frac{kT}{P} \cdot \left(\frac{V_{\text{dd}}}{V_{\text{char}}} + \frac{V_{\text{dd}}}{R_L I_{\text{SS}}}\right)}. \tag{18}$$

Here, $\eta$ relates the rise time to the delay of each delay cell (it is unrelated to the $\eta$ of (16)), $P$ is the oscillator power dissipation, $N$ is the number of delay stages in the ring oscillator, $R_L$ is the load resistance, $I_{SS}$ is the tail current of the delay cell, $V_{dd}$ is the supply voltage, $kT$ is the thermal energy, and $V_{\rm char} = V_{\rm dsat}$ (drain-source saturation voltage) for long-channel devices or $V_{\rm char} = E_C L/\gamma$ for short-channel devices [21].

Fig. 7 shows the two main design parameters, $\kappa$ and the oscillation frequency, versus the bias current; both are normalized to their desired values. As can be seen, $\kappa$ decreases and the oscillation frequency increases with increasing bias current. Based on this figure, $I_{SS} = 100~\mu$A is a good choice to achieve both the desired speed and the desired jitter performance; to be conservative, a larger bias current $(I_{\rm SS}=200~\mu{\rm A})$ has been chosen here. To scale the oscillation frequency, the devices in the ring oscillator must be resized, as shown in Fig. 7 [22]. Therefore, (18) can be used to determine the minimum achievable power dissipation that satisfies the system jitter requirements. Fig. 7 also compares the $\kappa$ values estimated by [15] and [21] for the proposed differential ring oscillator.

Fig. 7. Jitter-power and frequency-power tradeoffs in a ring oscillator.

### B. GO CDR Design

Based on the topology shown in Figs. 2(b) and 3, an 8-channel CDR has been implemented in a conventional 0.18 $\mu$m digital CMOS technology. The proposed shared PLL uses a high-order loop filter to suppress the ripple on the control signal and therefore generates very little jitter. To achieve good matching and balance between the delay line and the ring oscillator, all delay cells in the delay line and the ring oscillator are built with identical SCL-based two-input multiplexer (MUX) gates optimized for this application [23]. This matching yields almost the same delay per stage in the delay line and the oscillator; hence, no extra tuning loop is needed to adjust the delay of the delay line. Fig. 2(d) shows the proposed AND gate and the simplified schematic of the replica bias circuit [13]. The minimum acceptable bias current for the delay cells has been chosen based on (4), (18), and Fig. 7. This approach results in a low-power circuit that satisfies the system requirements.

### C. Shared Phase-Locked Loop

Fig. 8(a) shows the block diagram of the proposed PLL. A third-order loop filter is used to attenuate the ripple on the control signal. A transconductor $(G_m)$ cell converts the control voltage to a current, and copies of this current are delivered to all CDRs to tune their oscillators to the desired frequency. To achieve good matching, a precise current mirror circuit is needed to generate $I_C[1:8]$. In the PLL shown in Fig. 8(a), the parasitic pole introduced by the $G_m$ cell and the parasitic capacitance at the transconductor output $(C_w)$ can push the loop toward instability. Referring to Fig. 8(c) and assuming $g_{m7}=g_{m6}$, the transfer function of the $G_m$ cell can be expressed as

$$G_m(s) = \frac{I_c(s)}{V_{\text{in}}(s)} \approx \frac{g_{m1}}{1 + s \dfrac{C_w}{g_{m6}}} \tag{19}$$

where $g_{m1}$ and $g_{m6}$ are the transconductances of M1 and M6. To avoid this problem, this parasitic pole, i.e., $g_{m6}/C_w$, can be used for filtering in place of the $1/(R_3C_3)$ pole.
Otherwise, this pole must be pushed to very high frequencies.

Fig. 8. (a) Block diagram of the proposed PLL. (b) Transfer characteristics of the transconductor used in the PLL loop. (c) Proposed nonlinear transconductor and simplified current mirrors used to copy the current for the PLL as well as the CDR channels.

Fig. 8(b) shows the transfer characteristic of the proposed transconductor. Note that the $G_m$ of the transconductor is low at low output currents and high at high output currents. This nonlinear characteristic provides both a large current swing, for a wide tuning range, and a relatively constant CCO gain $(K_{\rm CCO})$ over process corners. In the slow corner, where $K_{\rm VCO}$ is low ($K_{\rm VCO} = \partial f_{\rm osc}/\partial V_{BN}$, with $V_{BN}$ as defined in Fig. 2(d)) and a higher control current is required to reach the desired oscillation frequency, the transconductance is high. Conversely, the transconductance must be low when the control current is low. Thus, considering

$$K_{\text{CCO}} = \frac{\partial f_{\text{osc}}}{\partial I_C} = \frac{\partial f_{\text{osc}}}{\partial V_C} \times \frac{\partial V_C}{\partial I_C} = G_m \cdot K_{\text{VCO}} \tag{20}$$

the CCO gain remains almost insensitive to process variation.

Fig. 9. Proposed test chip mask layout.

Fig. 8(c) shows the circuit schematic of the proposed transconductor. In this circuit, the input voltage $(V_{\rm in})$ is converted to a current by M1. When $V_{\rm in}$ is close to $V_{\rm SS}$, M1 is in the triode region and the transconductance is low. However, as $V_{\rm in}$ approaches $V_{\rm DD}$, M1 moves toward saturation and the transconductance increases rapidly. This explains the $I\!-\!V$ characteristic in Fig. 8(b). The replica circuit, consisting of M2, M4, and $I_B$, sets $V_A$ (the voltage at which M1 moves from the triode to the saturation region) as well as the output current swing.
$I_{\rm OFF}$ provides a small current under start-up conditions to ensure that the oscillator starts up properly.

### IV. SILICON REALIZATION AND MEASUREMENT RESULTS

The proposed multi-channel CDR has been implemented in a digital 0.18 $\mu$m CMOS technology. Fig. 9 shows the mask layout of the test chip, which includes 8 CDR channels as well as the biasing circuit and the shared PLL. To avoid package parasitics, the fabricated chip has been mounted and bonded directly on the printed circuit board (PCB).

As can be seen in Fig. 10, the measured free-running oscillation frequency of the CCO agrees well with post-layout simulation results. This plot also shows the low sensitivity of the oscillation frequency to supply voltage variation, thanks to the internal bias control circuit. Fig. 11 shows the measured recovered clock and recovered data at a clock rate of 2.5 GHz. The eye diagram and bathtub curve in Fig. 12 show a good horizontal eye opening; the eye closure in the vertical direction is mainly due to the bandwidth limitation of the 50 $\Omega$ output buffers. Using a LeCroy SDA 6000 oscilloscope, the effective rms jitter on the recovered data was measured as 4.1 ps$_{\rm rms}$.

Fig. 10. Measured free-running oscillation frequency of a CCO versus tuning current in comparison to the simulation results.

Fig. 11. Recovered data and clock of the implemented CDR at $f_{\rm clk}=2.5~{\rm GHz}$.

Fig. 12. Eye diagram of the output recovered data and the bathtub curve at $f_{\rm clk}=2.5$ GHz.

To estimate the frequency tolerance of the proposed CDR, the nominal frequency of the reference clock was changed until incorrect sampling occurred. Using a Tektronix AWG520, the frequency tolerance of the proposed CDR was measured in the presence of RJ and DJ on the received data. The measured FTOL is $\pm 3.5\%$, slightly smaller than the expected value of $\pm 4.5\%$. The measured oscillation frequencies of the PLL and the different channels show that the matching between the channels and the PLL is better than 1%, with closer channels matching better than the farthest ones. Depending on the technology used and the matching that can be achieved, additional PLLs (such as one dedicated PLL for every 2 or 4 channels) may be considered. Meanwhile, no error was detected for a $2^{31}-1$ PRBS (pseudo-random bit stream) input for 30 minutes, meaning that the BER is smaller than $10^{-12}$.

Fig. 13. Measured and simulated JTOL ($RJ_{\rm data}=0.01~{\rm UI_{rms}}$, $DJ=0.2~{\rm UI_{pp}}$).

Fig. 13 shows the measured JTOL in comparison to the simulation results; the measurements are in good agreement with the behavioral modeling results. In this figure, the upper limits on both the measured jitter amplitude and the jitter frequency are set by test-equipment limitations. In this measurement, RJ and DJ have been added to the input data, which explains why the SJ amplitude tolerance at high frequencies is less than 0.5 $\rm UI_{pp}$.

The measured power consumption was 4.2 mW/Gb/s/channel. The power consumption could be further reduced by removing the test blocks, extra buffers, and biasing circuits that have been used in each channel for characterization purposes.

# V. CONCLUSION

In this paper, the implementation of an 8-channel clock and data recovery circuit operating at an aggregate data rate of 20 Gb/s has been presented. A structured design methodology, confirmed by measurements, has been introduced to implement the proposed CDR system with low power dissipation while satisfying the short-haul system jitter requirements. As shown in this paper, gated-oscillator-based CDRs show an acceptable jitter tolerance for short-distance applications while occupying a very small silicon area and consuming very low power.
It has also been shown that, because this topology is sensitive to the frequency offset between the received data and the recovered clock, careful design is required to keep the CDR specifications, especially the JTOL, at the desired level. These features indicate that the proposed GO CDR topology is well suited to modern short-haul applications, where hundreds of transceivers may eventually have to be integrated on a single chip. Implemented in a digital 0.18 $\mu$m CMOS technology, the proposed gated-oscillator-based CDR dissipates 4.2 mW/Gb/s/channel and occupies 0.045 mm² per channel (excluding the 50 $\Omega$ output buffers).

#### ACKNOWLEDGMENT

The authors would like to acknowledge S. Badel, Z. D. Toprak, A. Schmidt, S. Hauser, and G. Ding from EPFL for their valuable help in this work.

#### REFERENCES

- [1] H. Takauchi *et al.*, "A CMOS multichannel 10-Gb/s transceiver," *IEEE J. Solid-State Circuits*, vol. 38, no. 12, pp. 2094–2100, Dec. 2003.
- [2] Y. Moon *et al.*, "A quad 0.6–3.2 Gb/s/channel interference-free CMOS transceiver for backplane serial link," *IEEE J. Solid-State Circuits*, vol. 39, no. 5, pp. 795–803, May 2004.
- [3] J. Kim *et al.*, "A four-channel 3.125-Gb/s/ch CMOS serial-link transceiver with a mixed-mode adaptive equalizer," *IEEE J. Solid-State Circuits*, vol. 40, no. 2, pp. 462–471, Feb. 2005.
- [4] Y. Miki *et al.*, "A 50-mW/ch 2.5-Gb/s/ch data recovery circuit for the SFI-5 interface with digital eye-tracking," *IEEE J. Solid-State Circuits*, vol. 39, no. 4, pp. 613–621, Apr. 2004.
- [5] M. K. Emsley, O. Dosunmu, M. S. Ünlü, P. Muller, and Y. Leblebici, "Realization of high-efficiency 10 GHz bandwidth silicon photodetector arrays for fully integrated optical data communication interfaces," in *Proc. Eur. Solid-State Device Research Conf. (ESSDERC)*, Estoril, Portugal, Sep. 2003, pp. 47–50.
- [6] P. Muller, Y. Leblebici, M. K. Emsley, and M. S.
Ünlü, "A 4-channel 2.5 Gb/s/channel 66 dB$\Omega$ inductorless transimpedance amplifier," in *Proc. Eur. Solid-State Circuits Conf. (ESSCIRC)*, Leuven, Belgium, Sep. 2004, pp. 491–494.
- [7] P. Muller and Y. Leblebici, "Limiting amplifiers for next-generation multi-channel optical I/O interfaces in SoCs," in *Proc. SoC Conf.*, Sep. 2005, pp. 193–196.
- [8] M. Nogawa *et al.*, "A 10 Gb/s burst-mode CDR IC in 0.13 μm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2005, pp. 228–229.
- [9] S. Kobayashi and M. Hashimoto, "A multirate burst-mode CDR circuit with bit-rate discrimination function from 52 to 1244 Mb/s," *IEEE Photon. Technol. Lett.*, vol. 13, pp. 1221–1223, Nov. 2001.
- [10] S. Kaeriyama and M. Mizuno, "A 10 Gb/s/ch 50 mW 120 × 120 μm² clock and data recovery circuit," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2003, pp. 70–71.
- [11] M. Nakamura *et al.*, "A 156 Mb/s CMOS clock recovery circuit for burst-mode transmission," in *IEEE Symp. VLSI Circuits Dig. Tech. Papers*, 1996, pp. 122–123.
- [12] H.-T. Ng *et al.*, "A second-order semidigital clock recovery circuit based on injection locking," *IEEE J. Solid-State Circuits*, vol. 38, no. 12, pp. 2101–2110, Dec. 2003.
- [13] B. Razavi, *Design of Integrated Circuits for Optical Communications*. New York: McGraw-Hill, 2003.
- [14] J. Kim and D.-K. Jeong, "Multi-gigabit-rate clock and data recovery based on blind oversampling," *IEEE Commun. Mag.*, pp. 68–74, Dec. 2003.
- [15] J. A. McNeill, "Jitter in ring oscillators," *IEEE J. Solid-State Circuits*, vol. 32, no. 6, pp. 870–879, Jun. 1997.
- [16] A. A. Abidi, "Phase noise and jitter in CMOS ring oscillators," *IEEE J. Solid-State Circuits*, vol. 41, no. 8, pp. 1803–1816, Aug. 2006.
- [17] L. M. De Vito, "A versatile clock recovery architecture and monolithic implementation," in *Monolithic Phase-Locked Loops and Clock Recovery Circuits: Theory and Design*, B. Razavi, Ed. New York: IEEE Press, 1996.
- [18] *InfiniBand Architecture Specification, Revision 1.0.a*, InfiniBand Trade Assoc., 2001.
- [19] "Converting between rms and peak-to-peak jitter at a specified BER," Maxim Integrated Products, Application Note HFAN-4.0.2, 2000.
- [20] N. Ou, T. Farahmand, A. Kuo, S. Tabatabaei, and A. Ivanov, "Jitter models for the design and test of Gb/s-speed serial interconnects," *IEEE Des. Test Comput.*, vol. 21, pp. 302–313, Jul.-Aug. 2004.
- [21] A. Hajimiri, S. Limotyrakis, and T. H. Lee, "Jitter and phase noise in ring oscillators," *IEEE J. Solid-State Circuits*, vol. 34, no. 6, pp. 790–804, Jun. 1999.
- [22] C. H. Doan, "Design and implementation of a highly-integrated low-power CMOS frequency synthesizer for an indoor wireless wideband-CDMA direct-conversion receiver," Master's thesis, Electr. Eng. Comput. Sci. Dept., Univ. California, Berkeley, 2000.
- [23] A. Tajalli, P. Muller, M. Atarodi, and Y. Leblebici, "A multichannel 3.5 mW/Gb/s/channel gated oscillator based CDR in a 0.18 μm digital CMOS technology," in *Proc. Eur. Solid-State Circuits Conf. (ESSCIRC)*, Grenoble, France, Sep. 2005, pp. 193–196.

**Armin Tajalli** (S'04) received the B.S. and M.S. degrees (Hons.) in electrical engineering from Sharif University of Technology, Tehran, Iran, and Tehran Polytechnic University in 1997 and 1999, respectively, and the Ph.D. degree (Hons.) from Sharif University of Technology in 2006. From 1998 to 2004, he was with Emad Semicon as a Senior Analog Design Engineer. Since 2006, he has been with the Microelectronic Systems Laboratory (LSM), Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland, working on ultra-low-power circuit design techniques. Dr. Tajalli received the Kharazmi Award on Research and Development in 2000 and the Presidential Award for the best Iranian researchers in 2003.

**Paul Muller** (S'02–M'07) received the engineering degree in electrical engineering from the Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland, in 1999 and the Dr. Sc.
degree in 2006. From 1999 to 2002, he worked as an Analog and Mixed-Signal Design Engineer at XEMICS SA, where he contributed to several sensing and data acquisition circuit designs. In 2002, he joined the Microelectronic Systems Laboratory at EPFL as a Research Assistant, working on the modeling and design of multi-channel gigabit receivers for short-distance optical communication interfaces. Since 2006, he has been with the wireless division of Marvell Semiconductor in Etoy, Switzerland, and Santa Clara, CA.

**Yusuf Leblebici** (M'90–SM'98) received the B.S. and M.S. degrees in electrical engineering from Istanbul Technical University in 1984 and 1986, respectively, and the Ph.D. degree in electrical and computer engineering from the University of Illinois at Urbana-Champaign (UIUC) in 1990. Between 1991 and 2001, he worked as a faculty member at UIUC, at Istanbul Technical University, and at Worcester Polytechnic Institute (WPI), where he established and directed the VLSI Design Laboratory. Since 2002, he has been a Chair Professor at the Swiss Federal Institute of Technology in Lausanne (EPFL) and Director of the Microelectronic Systems Laboratory. His research interests include design of high-speed CMOS digital and mixed-signal integrated circuits, computer-aided design of VLSI systems, intelligent sensor interfaces, modeling and simulation of semiconductor devices, and VLSI reliability issues. He is the co-author of two textbooks, *Hot-Carrier Reliability of MOS VLSI Circuits* (Kluwer Academic, 1993) and *CMOS Digital Integrated Circuits: Analysis and Design* (McGraw-Hill, 1996, 1998, and 2002), as well as more than 150 scientific articles published in international journals and conferences. Dr. Leblebici has served on the organizing and steering committees of several international conferences in microelectronics.
He served as an Associate Editor of IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II between 1998 and 2000, and as an Associate Editor of IEEE TRANSACTIONS ON VLSI between 2001 and 2003. He received the Young Scientist Award of the Turkish Scientific and Technological Research Council in 1995, and the Joseph Samuel Satin Distinguished Fellow Award of the Worcester Polytechnic Institute in 1999.