# A 16-Gb/s –11.6-dBm OMA Sensitivity 0.7-pJ/bit Optical Receiver in 65-nm CMOS Enabled by Duobinary Sampling

Mostafa G. Ahmed, *Student Member, IEEE*, Dongwook Kim, *Student Member, IEEE*, Romesh Kumar Nandwana, *Member, IEEE*, Ahmed Elkholy, *Member, IEEE*, Kadaba R. Lakshmikumar, *Life Fellow, IEEE*, and Pavan Kumar Hanumolu, *Fellow, IEEE*

Abstract-High-speed, low-power optical interconnects, such as intensity modulation direct detection (IMDD) optical links, are increasingly deployed in data centers to keep pace with the growing bandwidth requirements. High-sensitivity low-power optical receivers (RXs) are the key components that enable energy-efficient IMDD optical interconnects. This article presents a low-power nonreturn-to-zero (NRZ) optical RX using a combination of a limited-bandwidth trans-impedance amplifier (TIA) and duobinary sampling to improve RX sensitivity at high data rates. Duobinary sampling leverages the well-controlled TIA inter-symbol interference (ISI) to recover the transmitted data, making it much more hardware efficient than canceling the ISI using a decision feedback equalizer (DFE). The proposed optical RX employs a CMOS-based analog front-end (AFE) to achieve high linearity and excellent power efficiency. Fabricated in a 65-nm CMOS process, the prototype RX achieves an optical modulation amplitude (OMA) sensitivity of -11.6 dBm at 16 Gb/s with 0.7-pJ/bit efficiency.

*Index Terms*—CMOS, duobinary sampling, high-sensitivity, intensity modulation direct detection (IMDD) optical links, optical receiver (RX), photodiode (PD), sense amplifiers (SAs), trans-impedance amplifier (TIA), variable-gain amplifier (VGA).

### I. INTRODUCTION

LOW-POWER, high-density, low-cost, and high-speed short-reach interconnects are needed in modern hyperscale data centers. According to [1] and [2], the interconnection network consumes around 23% of the total data-center IT power consumption. Intensity modulation direct detection (IMDD) optical links are used to meet the bandwidth requirements in an energy-efficient manner [3]. Fig. 1 shows a typical high-throughput optical interconnect in which  $N_{\rm CH}$ 

Manuscript received October 4, 2020; revised January 14, 2021; accepted March 2, 2021. Date of publication March 17, 2021; date of current version August 26, 2021. This article was approved by Associate Editor Payam Heydari. (*Corresponding author: Mostafa G. Ahmed.*)

Mostafa G. Ahmed and Pavan Kumar Hanumolu are with the Coordinated Science Laboratory, University of Illinois at Urbana–Champaign, Urbana, IL 61801 USA (e-mail: gamal2@illinois.edu).

Dongwook Kim is with Apple Inc., West Lake Hills, TX 78746 USA.

Romesh Kumar Nandwana and Kadaba R. Lakshmikumar are with the Optical Systems and Interconnects Group, Cisco Systems, Inc., Allentown, PA 18195 USA.

Ahmed Elkholy is with Broadcom, Inc., Irvine, CA 92618 USA.

Color versions of one or more figures in this article are available at https://doi.org/10.1109/JSSC.2021.3064248.

Digital Object Identifier 10.1109/JSSC.2021.3064248

Fig. 1. High-throughput optical interconnect.

parallel IMDD optical links share a single laser diode (LD) source. Each IMDD optical link consists of an optical modulator that converts data from the electrical domain into the optical domain, an optical fiber, and an optical receiver (RX) that converts the received optical signal back into the electrical domain. Because of the low insertion loss of an optical cable, IMDD optical links can communicate over relatively long distances without increasing power consumption. While the optical fiber itself is almost lossless, imperfect optical coupling between various link components and the inefficiency associated with transmitter modulation cause signal attenuation. The minimum required LD optical power ( $P_{LD}$ ) is dictated by such losses (L) and the optical RX sensitivity ( $P_{RX,S}$ ) and can be calculated using the following equation:

$$P_{\rm LD}|_{\rm dB} = 10\log(N_{\rm CH}) + P_{\rm RX,S}|_{\rm dB} - L|_{\rm dB}. \tag{1}$$

From (1), we note that high-sensitivity optical RXs (small  $P_{RX,S}$ ) help to: 1) reduce LD optical power, which is a significant source of power dissipation in an optical link, or 2) increase the number of parallel channels ( $N_{CH}$ ) when the LD is operated at its maximum output power. Thus, a high-sensitivity low-power optical RX greatly enhances overall link energy efficiency. The design of such optical RXs is the focus of this article.
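As a rough numerical illustration of (1), the sketch below evaluates the required LD power for a few channel counts. The function name `required_laser_power_dbm`, the channel counts, and the -6-dB link term are assumptions chosen for illustration only; the -11.6-dBm sensitivity is the value reported in this article. Following the minus sign in (1), the link term is entered as a gain, so a lossy link is a negative number (this sign convention is also an assumption).

```python
import math

def required_laser_power_dbm(n_ch, rx_sensitivity_dbm, link_gain_db):
    """Evaluate (1): P_LD = 10*log10(N_CH) + P_RX,S - L, all in dB units.

    link_gain_db is L expressed as a gain, so a lossy link has a negative
    value (assumption made to match the minus sign in (1)).
    """
    return 10.0 * math.log10(n_ch) + rx_sensitivity_dbm - link_gain_db

# Hypothetical link: 6 dB of coupling/modulation loss, -11.6-dBm RX sensitivity.
for n_ch in (1, 4, 8):
    p_ld = required_laser_power_dbm(n_ch, -11.6, -6.0)
    print(f"N_CH = {n_ch}: P_LD >= {p_ld:.2f} dBm")
```

Under these assumptions, every 3 dB of sensitivity improvement either lowers the required LD power by 3 dB or roughly doubles the number of channels that a given LD can feed.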

An optical RX consists of a photodiode (PD) followed by a trans-impedance amplifier (TIA) and decision circuitry, as shown in Fig. 1. The PD generates a current proportional to the received optical power, which is converted into a voltage by the TIA. The following decision circuitry recovers the

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/



Fig. 2. (a) Proposed duobinary sampling-based optical RX. (b) AFE output eye diagram ( $V_{AFE}$ ).

received data from the TIA output. The TIA dictates the overall RX bandwidth and sensitivity. The shunt-feedback TIA (SF-TIA) topology, compared to other architectures such as the common gate, offers better noise performance and is, therefore, most commonly used [4], [5]. However, its performance, especially at high data rates, degrades due to a debilitating tradeoff between bandwidth and noise from the feedback resistor ( $R_F$ ) [4]–[7]. Because the TIA bandwidth must be at least 60% of the data rate (9.6 GHz for 16 Gb/s) to avoid introducing substantial inter-symbol interference (ISI) [5],  $R_F$  must be reduced at higher data rates at the expense of increased TIA noise.

Ideally, this tradeoff can be alleviated by using an equalizer at the output of a low-bandwidth TIA [4]–[7], in which noise is reduced by increasing  $R_F$ , and an equalizer suppresses the ISI resulting from reduced TIA bandwidth. Following this line of argument, Ahmed et al. [5] achieved a 3-dB sensitivity improvement by using a TIA with a bandwidth of only 25% of the data rate and a decision feedback equalizer (DFE) that incurs a minimal noise penalty compared to a continuous-time linear equalizer (CTLE) [8]. However, implementing a DFE at high data rates is challenging due to the difficulty in closing the first-tap feedback loop [9]. Speculation can alleviate the timing constraint [4], [9], but it requires additional high-speed samplers and a data multiplexer that increase power/area and hardware complexity. In addition, a DFE is susceptible to error propagation, especially when the first post-cursor ISI is large, which is the case with a low-bandwidth TIA [4].

Given these drawbacks, this article presents a high-sensitivity low-power optical RX using duobinary sampling in conjunction with a low-bandwidth TIA. Instead of canceling the ISI arising from the low-bandwidth TIA, duobinary sampling, which leverages the well-controlled ISI [10] resulting from a well-defined frequency response [11], is used to improve RX sensitivity. Low-latency feedback loops are automatically eliminated due to the duobinary data recovery; error propagation is mitigated by using simple pre-coding. Both these features allow optimization of the RX for low power. Because duobinary sampling mandates higher linearity (similar to DFE) compared to nonreturn-to-zero (NRZ) sampling, a CMOS-based analog front-end (AFE) is employed. Fabricated in a 65-nm CMOS process, the prototype 16-Gb/s optical RX, implemented using a 4-GHz (25% data rate) bandwidth TIA, achieves a sensitivity of -11.6-dBm optical modulation amplitude (OMA) (bit error rate (BER) <  $10^{-12}$ ) while consuming only 11.2 mW (0.7 pJ/bit), which represents more than  $2.7\times$  energy efficiency improvement compared to [5].

The rest of this article is organized as follows. The proposed optical RX architecture is presented in Section II and its circuit implementation details are described in Section III. Experimental results obtained from the prototype optical RX test chip are provided in Section IV. The key contributions of this work are summarized in Section V.

### II. PROPOSED ARCHITECTURE

A simplified block diagram of the proposed optical RX is shown in Fig. 2(a). It consists of a low-bandwidth TIA and a variable-gain amplifier (VGA), which form the AFE, followed by a duobinary sampling/decoding back end. The low-bandwidth TIA introduces a significant amount of ISI, resulting in a minimal voltage margin when sampled in the middle of the output eye  $(E_{\text{NRZ}})$  at  $T_{\text{NRZ}}$ , as shown in Fig. 2(b). Another way to recover the data is by sampling the TIA output 0.5 UI away from the middle (at  $T_{DB}$ ) using two samplers that produce a three-level signal from which the data can be recovered using a classical duobinary decoder implemented using an XOR gate, as shown in Fig. 2. In other words, the low-bandwidth TIA exhibits a duobinary response,  $H_{\text{TIA}}(z) = 1 + z^{-1}$ . The two samplers' thresholds are placed in the middle of the top and bottom duobinary eyes ( $E_{\text{DBH}}$  and  $E_{\text{DBL}}$ ). While the NRZ eye ( $E_{\text{NRZ}}$ ) at  $T_{\text{NRZ}}$  has very little vertical eye opening, the duobinary eyes ( $E_{\text{DBH}}$  and  $E_{\text{DBL}}$ ) at  $T_{\text{DB}}$  have a much larger vertical opening. Note that duobinary sampling leverages the well-controlled ISI at  $T_{\rm DB}$ , instead of canceling it at  $T_{\rm NRZ}$  as is the case in a DFE-equalized RX [4]-[7]. An AFE bandwidth of about one-quarter of the input data rate (25%) results in a near-optimal duobinary response [12], at which point both the horizontal and the vertical duobinary eye openings are maximized [see Fig. 2(b)]. The transmitted data can be duobinary pre-coded with minimal area/power overhead [12], [13].
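The data path described above can be sketched behaviorally as follows. This is a minimal idealized model, not the implemented circuit: the channel is taken as an exact  $1 + z^{-1}$  response with noiseless levels 0, 1, and 2, the thresholds 0.5 and 1.5 stand in for  $V_{\text{THL}}$  and  $V_{\text{THH}}$, and all function names are hypothetical.

```python
import random

def precode(data_bits, initial_state=0):
    """Duobinary pre-coder: p[n] = d[n] XOR p[n-1]."""
    coded, prev = [], initial_state
    for bit in data_bits:
        prev ^= bit
        coded.append(prev)
    return coded

def duobinary_channel(symbols, initial_symbol=0):
    """Ideal 1 + z^-1 response: y[n] = x[n] + x[n-1], giving levels 0, 1, 2."""
    levels, prev = [], initial_symbol
    for sym in symbols:
        levels.append(sym + prev)
        prev = sym
    return levels

def sample_and_decode(levels, v_thl=0.5, v_thh=1.5):
    """Two slicers plus an XOR: the output is 1 only for the middle level."""
    return [int(y > v_thh) ^ int(y > v_thl) for y in levels]

random.seed(0)
data = [random.randint(0, 1) for _ in range(32)]
recovered = sample_and_decode(duobinary_channel(precode(data)))
print("data recovered without feedback:", recovered == data)
```

Because each decoded bit depends only on the current three-level sample, a wrong decision does not influence later bits, which is why pre-coding avoids the error propagation seen in a DFE.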



Fig. 3. Simulated RX sensitivity versus TIA bandwidth to data rate ratio.

Fig. 3 shows the simulated RX sensitivity versus the TIA bandwidth to data rate ratio for the proposed duobinary sampling, conventional NRZ sampling, and one-tap DFE-equalized optical RXs. In these simulations, an SF-TIA with a second-order Butterworth frequency response is used, and its two main noise contributors, namely, the feedback resistor and the feed-forward amplifier, are included. To perform a fair comparison, SF-TIA design parameters are optimized for each TIA bandwidth condition such that TIA noise  $(\overline{i_n^2})$  is minimized  $(\overline{i_n^2} \propto BW_{TIA}^3)$  using design equations from [5] and [14]. The simulation results illustrate that the proposed duobinary sampling technique achieves nearly the same sensitivity improvement as a one-tap DFE when the TIA bandwidth is 25% of the data rate (4 GHz for 16 Gb/s). As for the AFE, the limited bandwidth of the post-amplifiers has minimal effect on the duobinary sampling sensitivity performance as long as the overall AFE bandwidth is 25% of the data rate. It is also important to note that most of the post-cursor ISI appears in the form of the first post-cursor when the AFE bandwidth to data rate ratio is 25% [4], which makes duobinary sampling the most hardware-efficient way of mitigating ISI.
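To give a feel for the noise scaling quoted above, the sketch below converts  $\overline{i_n^2} \propto BW_{TIA}^3$  into a sensitivity change when the bandwidth ratio drops from 60% to 25%. It is a noise-only estimate under the assumption that optical sensitivity tracks the rms input-referred noise current; it ignores the eye-opening penalty caused by ISI, so it overestimates the net gain that the simulations in Fig. 3 capture. The function name is hypothetical.

```python
import math

def noise_only_sensitivity_gain_db(bw_ratio_low, bw_ratio_high):
    """Sensitivity gain from noise reduction alone, assuming i_n^2 ~ BW^3.

    Optical power maps linearly to photocurrent, so the gain in optical dB is
    10*log10 of the rms-noise ratio (assumption: ISI penalty is ignored).
    """
    noise_power_ratio = (bw_ratio_low / bw_ratio_high) ** 3
    rms_noise_ratio = math.sqrt(noise_power_ratio)
    return -10.0 * math.log10(rms_noise_ratio)

gain_db = noise_only_sensitivity_gain_db(0.25, 0.60)
print(f"noise-only gain going from 60% to 25% bandwidth: {gain_db:.1f} dB")
```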

### III. IMPLEMENTATION DETAILS

A detailed block diagram of the proposed optical RX is shown in Fig. 4. It consists of a multi-stage SF-TIA (MS-SF-TIA) [5], a single-ended-to-differential (S2D) conversion stage, and a differential VGA, which form the AFE, followed by a quarter-rate duobinary sampling back end. The RX was intended to be wire-bonded to a p-i-n PD that has 120-fF capacitance. The PD input dc current is canceled with a resolution of 0.5  $\mu$A using a 7-bit current digital-to-analog converter (DAC) located at the TIA input. A dc offset-compensation (DCOC) loop is employed in the signal path to compensate for device mismatches. The AFE is implemented using CMOS-based amplification stages to achieve almost rail-to-rail output swing, high linearity, and low power while operating with a low supply voltage.

The duobinary sampling back end uses a quarter-rate architecture. High-sensitivity samplers (HSSs) are implemented using a cascade of sense amplifiers (SAs) followed by a symmetric SR latch [15]. High- and low-threshold voltages for the duobinary samplers ( $V_{\text{THH}}$  and  $V_{\text{THL}}$ ) are generated using two 7-bit voltage DACs. Four quarter-rate clock phases are obtained from an external differential half-rate clock using a quadrature divide-by-two circuit. Phase correction circuitry is implemented to ensure the quadrature-phase spacing between the four clock phases. Circuit implementation details of the RX building blocks are discussed next.
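The clocking numbers implied by this quarter-rate arrangement can be worked out with a short sketch. The function name and the returned dictionary keys are hypothetical; the only input taken from the text is the 16-Gb/s data rate.

```python
def quarter_rate_clocking(data_rate_gbps):
    """Derive UI, clock frequencies, and ideal phase offsets for quarter-rate sampling."""
    ui_ps = 1e3 / data_rate_gbps                  # one unit interval in ps
    half_rate_clk_ghz = data_rate_gbps / 2.0      # external differential clock
    quarter_rate_clk_ghz = half_rate_clk_ghz / 2.0  # after the divide-by-two
    period_ps = 1e3 / quarter_rate_clk_ghz
    # Four quadrature phases, ideally spaced by a quarter period (= 1 UI).
    phase_offsets_ps = [k * period_ps / 4.0 for k in range(4)]
    return {
        "ui_ps": ui_ps,
        "half_rate_clk_ghz": half_rate_clk_ghz,
        "quarter_rate_clk_ghz": quarter_rate_clk_ghz,
        "phase_offsets_ps": phase_offsets_ps,
    }

print(quarter_rate_clocking(16.0))
```

At 16 Gb/s, each of the four channels therefore samples every fourth bit, and any deviation from the ideal one-UI phase spacing directly eats into the timing margin, which is the motivation for the phase correction circuitry.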

#### A. Multi-Stage SF-TIA and S2D

The MS-SF-TIA schematic is shown in Fig. 5. It is implemented using a three-stage CMOS inverter-based feedforward amplifier with a gain and a bandwidth of 30 dB and 7 GHz, respectively, and consumes 2 mW of power. Assuming a total input capacitance of 180 fF, a TIA bandwidth of 4.3 GHz is achieved using an 8.8-k$\Omega$  feedback resistor  $(R_F)$ . Out of 180 fF, 120 fF is assumed to come from the PD junction capacitance, 40 fF from the pad and routing parasitic capacitance, and 20 fF from the TIA input capacitance. The three-stage feed-forward amplifier reduces the  $R_F$  noise contribution to only 3% and helps achieve superior noise performance compared to a TIA implemented using a single-stage feed-forward amplifier [5]. The overall noise performance is dominated by the noise contribution from the feed-forward amplifier, which contributes about 65% of the total noise.  $R_F$  is implemented using a 3-bit resistor bank to tune the overall AFE bandwidth to 4 GHz (TIA + VGA) and guarantee an optimum duobinary response across process and temperature variations. Fig. 6 shows the AFE pulse response at 16 Gb/s obtained from post-layout simulations, including 250-pH PD-TIA bond-wire inductance, across different process corners with and without bandwidth trimming using  $R_F$ . The RX sensitivity simulation results shown in Fig. 3 assume an SF-TIA with a second-order transfer function; the implemented three-stage MS-SF-TIA, whose phase margin is about 65°, exhibits a similar Butterworth response.

The S2D stage converts the single-ended TIA output into a differential signal using an inverter loaded by a matched diode-connected inverter [11], as shown in Fig. 5. Unlike the dummy TIA-based S2D conversion used in [5], this implementation avoids the additional noise penalty. Furthermore, due to the limited common-mode rejection ratio of the following VGA at high frequencies, driving it with a fully differential signal results in a constant VGA output common mode, which greatly improves the samplers' performance, as discussed in Section III-E. S2D simulation results show that the unity-gain complementary path introduces 1.5% total harmonic distortion (THD) for a 400-mV<sub>pp</sub> 1-GHz input tone and less than 5-ps group delay up to its 3-dB bandwidth (17.6 GHz) while consuming only 0.4 mW. Also, AFE simulation results show that the maximum output common-mode ripple is only 25 mV when the differential-mode output swing is 570 mV.

The PD cathode terminal (PD<sub>C</sub>) is biased using an on-chip resistor,  $R_{B, PD}$, and capacitor,  $C_{C, PD}$  (see Fig. 5) to couple the TIA ac ground to the PD cathode terminal. This internal PD biasing technique eliminates ground loops at the TIA input that can cause high-frequency ringing and significant degradation



Fig. 4. Block diagram of the proposed optical RX.



Fig. 5. Schematic of the front-end MS-SF-TIA, S2D, and PD biasing.

of RX sensitivity performance [6]. Because the PD bias voltage exceeds 2 V,  $C_{C, PD}$  is implemented using a metal-oxide-metal (MOM) capacitor with a high breakdown voltage. The biasing *RC* section has a corner frequency of 1 MHz ( $R_{B, PD} = 6 k\Omega$  and  $C_{C, PD} = 26.5 \text{pF}$ ). The voltage drop across  $R_{B, PD}$  does not exceed 360 mV for a maximum average PD current of 60  $\mu$A (-9.1-dBm OMA) allowed by the 7-bit current DAC located at the TIA input (see Fig. 4).
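The bias-network figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below assumes the 0.8-A/W responsivity and 10-dB extinction ratio reported for the measurement setup in Section IV, and uses the standard relation between average power, extinction ratio, and OMA; the function names are hypothetical.

```python
import math

R_B_PD = 6e3            # ohm, bias resistor
C_C_PD = 26.5e-12       # farad, coupling capacitor
I_PD_AVG_MAX = 60e-6    # ampere, maximum average PD current
RESPONSIVITY = 0.8      # A/W, assumed from the test setup
EXTINCTION_RATIO_DB = 10.0  # assumed from the test setup

def rc_corner_hz(r_ohm, c_farad):
    """First-order RC corner frequency."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)

def oma_dbm_from_avg_current(i_avg, responsivity, er_db):
    """OMA = 2 * P_avg * (ER - 1) / (ER + 1), with ER as a linear power ratio."""
    er = 10.0 ** (er_db / 10.0)
    p_avg_w = i_avg / responsivity
    oma_w = 2.0 * p_avg_w * (er - 1.0) / (er + 1.0)
    return 10.0 * math.log10(oma_w / 1e-3)

print(f"corner frequency: {rc_corner_hz(R_B_PD, C_C_PD) / 1e6:.2f} MHz")
print(f"max drop on R_B,PD: {R_B_PD * I_PD_AVG_MAX * 1e3:.0f} mV")
print(f"max OMA: {oma_dbm_from_avg_current(I_PD_AVG_MAX, RESPONSIVITY, EXTINCTION_RATIO_DB):.1f} dBm")
```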

#### B. VGA

The differential VGA schematic is shown in Fig. 7. It consists of a VGA transconductance stage (VGA-GM) followed by VGA trans-impedance amplifiers (VGA-TIAs). VGA-GM



Fig. 6. AFE pulse response at 16 Gb/s obtained from post-layout simulations across different process corners. (a) Without and (b) with bandwidth trimming.

is implemented using a CMOS differential pair with top and bottom tail current sources. Because the TIA output has a common-mode voltage of  $V_{DD}/2$ , implementing the VGA-GM using a CMOS input differential pair achieves better linearity performance with a large input signal swing compared to PMOS- or NMOS-only differential-pair-based VGAs as in [5] and [16]. The tail current sources reduce the voltage headroom compared to the pseudo-differential GM stage in [11], but they help improve the common-mode rejection and increase immunity to supply variations, especially at low frequencies. Source degeneration resistors ( $R_{D1}$  and  $R_{D2}$ ) are used to enhance the VGA linearity performance. Simulation results show that the VGA achieves a worst-case linearity of 1.2% THD with 600-mV<sub>pp</sub> differential input swing at 0-dB gain.

VGA-TIAs are implemented using the SF-TIA topology in which feedback resistors ( $R_{F, VGA}$ ) are implemented using 3-bit resistor banks to tune the VGA gain. A common-mode feedback (CMFB) loop sets the VGA output common-mode



Fig. 7. Schematic of the differential VGA.



Fig. 8. Schematic of the DCOC circuitry.

voltage equal to a reference SF-TIA input common-mode voltage  $(V_{REF})$  by controlling the VGA-GM's tail current source. The VGA has a programmable gain in the range from -6 to 9.5 dB, which is achieved using a transconductance of 20 mS and a variable  $R_{F, VGA}$  (250–1500  $\Omega$ ). It is important to note that: 1) the dynamic range of the prototype RX is limited by the maximum input dc current cancellation DAC rather than by the VGA and 2) the VGA noise contribution is minimal because of the high gain of the preceding MS-SF-TIA  $(R_F = 8.8 \text{ k}\Omega)$ . For best power efficiency, the VGA was designed to have an output-pole dominant second-order transfer function. The VGA consumes 1.6 mW and has a bandwidth of 10 GHz. Out of 1.6 mW, the VGA-GM and VGA-TIAs consume 0.3 and 1.3 mW, respectively. Accounting for the 4.3-GHz front-end TIA bandwidth, the VGA causes the overall AFE bandwidth to be about 4 GHz.
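Two of the figures above can be sanity-checked with a short sketch. The cascaded bandwidth uses the root-sum-of-squares rule, which is a common first-order approximation and not necessarily how the reported 4-GHz value was obtained; the gain span assumes the VGA gain is proportional to  $R_{F, VGA}$. Function names are hypothetical.

```python
import math

def cascaded_bandwidth_ghz(*stage_bw_ghz):
    """Root-sum-of-squares estimate of the -3-dB bandwidth of cascaded stages."""
    return 1.0 / math.sqrt(sum(1.0 / bw ** 2 for bw in stage_bw_ghz))

def gain_span_db(r_min_ohm, r_max_ohm):
    """Gain tuning range if the gain scales linearly with the feedback resistor."""
    return 20.0 * math.log10(r_max_ohm / r_min_ohm)

afe_bw = cascaded_bandwidth_ghz(4.3, 10.0)   # front-end TIA and VGA
span = gain_span_db(250.0, 1500.0)           # R_F,VGA tuning range
print(f"estimated AFE bandwidth: {afe_bw:.2f} GHz")
print(f"gain span from resistor ratio: {span:.1f} dB (reported range: -6 to 9.5 dB)")
```

Both estimates land close to the reported values: about 4 GHz for the AFE bandwidth and about 15.5 dB for the gain tuning range.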

#### C. DCOC

The DCOC circuitry is shown in Fig. 8. The AFE output offset voltage is sensed using a differential RC integrator and corrected by drawing unbalanced currents from the VGA-TIA inputs ( $V_{P, dc}$ and  $V_{N, dc}$ ). Two transconductance amplifiers (DCOC- $G_{Ms}$ ), implemented using current-mirror operational transconductance amplifiers (OTAs), provide the offset-compensation currents. The DCOC- $G_{Ms}$  capacitive loading has minimal impact on the VGA bandwidth because of the low VGA-TIA input impedance and because the output capacitance of the DCOC- $G_{Ms}$  is small compared to the VGA-TIAs' input capacitance.

Fig. 9. Schematic of the samplers threshold generation voltage DACs.

The gain–bandwidth product (GBW) of the DCOC loop is given by

$$\text{GBW}_{\text{DCOC}} \approx \frac{G_M \times R_{F, \text{VGA}}}{2\pi \times C_{\text{dc}} \times R_{\text{dc}}} \tag{2}$$

where  $G_M$  is the DCOC- $G_{Ms}$  transconductance and  $R_{dc}$  and  $C_{dc}$  are the integrator resistance and capacitance, respectively. The DCOC loop introduces a high-pass cutoff frequency ( $F_C$ ) in the AFE transfer function, which equals its open-loop GBW.  $F_C$  must be much smaller than the input data rate to avoid baseline wander or data-dependent jitter. The implemented DCOC loop has  $F_C$  of 80 kHz. It is important to note that the DCOC loop does not impact the AFE noise performance because it corrects the AFE offset at the VGA-TIAs inputs where the received signal is already amplified by the front-end TIA and the VGA-GM.
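A minimal sketch of (2) follows. The article reports only the resulting 80-kHz cutoff, so every component value below is hypothetical and was picked solely so that the result lands near that figure; the function name is also an assumption.

```python
import math

def dcoc_gbw_hz(g_m, r_f_vga, r_dc, c_dc):
    """Evaluate (2): GBW = G_M * R_F,VGA / (2*pi*C_dc*R_dc)."""
    return g_m * r_f_vga / (2.0 * math.pi * c_dc * r_dc)

# Hypothetical values (not from the article), chosen to land near 80 kHz.
G_M = 100e-6       # siemens
R_F_VGA = 1e3      # ohm
R_DC = 2e6         # ohm
C_DC = 100e-15     # farad

f_c = dcoc_gbw_hz(G_M, R_F_VGA, R_DC, C_DC)
data_rate = 16e9
print(f"F_C = {f_c / 1e3:.1f} kHz, data rate / F_C = {data_rate / f_c:.2e}")
```

With a cutoff roughly five orders of magnitude below the data rate, the low-frequency content removed by the loop is small for the pattern lengths used here.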

#### D. Samplers Threshold Generation

Threshold voltages for the duobinary slicers ( $V_{\text{THH}}$  and  $V_{\text{THL}}$ ) are generated using the TIA-based differential DAC scheme shown in Fig. 9. Equal and opposite currents ( $I_B$  and  $-I_B$ ) are drawn from two SF-TIAs (DAC-TIAs) to generate a differential output voltage ( $V_{D, \text{TH}}$ ) with a common mode ( $V_{C, \text{TH}}$ ) of  $V_{\text{DD}}/2$ .  $V_{D, \text{TH}}$  can be expressed as

$$V_{D, \text{TH}} \approx 2 \times I_B \times R_{F, \text{DAC}}. \tag{3}$$

 $I_B$  is generated using a 7-bit current DAC and mirrored to the two DAC-TIAs. Fig. 10 shows the simulated TIA-based DAC output voltages ( $V_{P, \text{TH}}$  and  $V_{N, \text{TH}}$ ) along with their differential-mode ( $V_{D, \text{TH}}$ ) and common-mode ( $V_{C, \text{TH}}$ ) components. The threshold voltage resolution is about 2.5 mV, achieved using 5-k $\Omega$  feedback resistor and a 7-bit current DAC, which has a unit-cell current of 0.5  $\mu$ A. DAC-TIAs are downsized replicas of the VGA-TIAs to better track AFE common-mode PVT variations, which helps sampler operation as discussed later. The threshold generation DACs' codes are set manually during measurement.

#### E. Duobinary Sampling and Decoding

A quarter-rate architecture is employed in the duobinary sampling back end in which each channel consists of two HSSs and an XOR gate. Each HSS is implemented using a



Fig. 10. Simulated TIA-based DAC's output (a) positive and negative voltages  $(V_{P, \text{TH}} \text{ and } V_{N, \text{TH}})$  and (b) differential-mode  $(V_{D, \text{TH}})$  and (c) common-mode  $(V_{C, \text{TH}})$  output voltages.



Fig. 11. Schematic of the HSS and the SA.

cascade of offset-compensated SAs (SA1 and SA2) followed by a symmetric SR latch [15], as shown in Fig. 11. Such a series combination of SAs significantly improves sampler sensitivity at the expense of an additional half-clock-cycle latency. The first SA (SA1) acts as a pre-amplifying stage for the second SA (SA2) when the AFE output is small. Note that there are no stringent timing constraints on the samplers as in the DFE case because of the feed-forward duobinary decoding. The SA schematic is also presented in Fig. 11. SA offset compensation is performed using two capacitor banks at its output. SA1 uses a differential-difference input stage to control the sampler trip point using the voltage provided by the threshold generation DACs. The SA1 output trips when the positive and negative currents ( $I_P$ ,  $I_N$ ) drawn from its cross-coupled



Fig. 12. Schematic of the high-speed differential XOR gate.

latch are equal.  $I_P$  and  $I_N$  are expressed as

$$I_{P|N} = I_{P|N, AFE} + I_{N|P, TH} = \frac{k}{2} \Big( (V_{P|N, AFE} - V_T)^2 + (V_{N|P, TH} - V_T)^2 \Big) \tag{4}$$

where  $V_{P|N, AFE}$  and  $V_{P|N, TH}$  are the AFE and threshold DAC positive and negative outputs, respectively. By equating  $I_P$  and  $I_N$  from (4), the sampler differential trip point ( $V_{D, AFE}$ ) can be expressed in terms of the AFE common-mode voltage ( $V_{C, AFE}$ ) and the DAC's differential and common-mode output voltages ( $V_{D, TH}$ ,  $V_{C, TH}$ ) as

$$V_{D, AFE} = V_{D, TH} \times \frac{V_{C, TH} - V_T}{V_{C, AFE} - V_T}. \tag{5}$$

Equation (5) shows that the sampler trip point  $(V_{D,AFE})$  equals  $V_{D, \text{TH}}$  only when  $V_{C, \text{AFE}} = V_{C, \text{TH}}$ . Because the SA-based sampler cannot reject the common-mode input signal, any mismatch between the common-mode voltages of the AFE and the DAC causes the sampler trip point to deviate from  $V_{D, \text{TH}}$ , as shown by (5). Due to the proposed TIA-based threshold generation DACs, the common-mode voltages of the DAC ( $V_{C, TH}$ ) and the AFE ( $V_{C, AFE}$ ) are matched across PVT variations, thus resulting in a trip point condition of  $V_{D,AFE} = V_{D,TH}$ , as desired. More importantly, the constant AFE output common-mode voltage ( $V_{C,AFE}$ ), resulting from driving the VGA with a fully differential signal generated by the S2D, guarantees a fixed sampler trip point  $(V_{D, AFE})$  independent of the received data, thus significantly improving the RX sensitivity performance. The high-speed differential XOR gate is implemented using differential cascode voltage switch logic (DCVS), as shown in Fig. 12 [17].
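The step from (4) to (5) can be checked numerically. The sketch below evaluates the square-law branch currents of (4) at the trip point predicted by (5) and confirms that they balance, including when the two common-mode voltages differ. All voltages and the threshold  $V_T$  are hypothetical illustration values, and the function names are assumptions.

```python
import math

def branch_currents(v_d_afe, v_c_afe, v_d_th, v_c_th, v_t, k=1.0):
    """Square-law currents I_P and I_N from (4)."""
    v_p_afe, v_n_afe = v_c_afe + v_d_afe / 2.0, v_c_afe - v_d_afe / 2.0
    v_p_th, v_n_th = v_c_th + v_d_th / 2.0, v_c_th - v_d_th / 2.0
    i_p = k / 2.0 * ((v_p_afe - v_t) ** 2 + (v_n_th - v_t) ** 2)
    i_n = k / 2.0 * ((v_n_afe - v_t) ** 2 + (v_p_th - v_t) ** 2)
    return i_p, i_n

def trip_point(v_d_th, v_c_th, v_c_afe, v_t):
    """Differential trip point from (5)."""
    return v_d_th * (v_c_th - v_t) / (v_c_afe - v_t)

V_T, V_D_TH, V_C_TH = 0.35, 0.10, 0.50   # hypothetical values in volts
for v_c_afe in (0.50, 0.55):             # matched and mismatched common modes
    v_trip = trip_point(V_D_TH, V_C_TH, v_c_afe, V_T)
    i_p, i_n = branch_currents(v_trip, v_c_afe, V_D_TH, V_C_TH, V_T)
    print(f"V_C,AFE = {v_c_afe:.2f} V: trip = {v_trip * 1e3:.1f} mV, "
          f"balanced = {math.isclose(i_p, i_n)}")
```

In this example, a 50-mV common-mode mismatch moves the trip point from 100 to 75 mV, which illustrates why matching  $V_{C, TH}$  to  $V_{C, AFE}$  matters for the sampler.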

### IV. EXPERIMENTAL RESULTS

Fig. 13 shows the complete block diagram of the implemented optical RX. The prototype RX was fabricated in a 65-nm CMOS technology and wire-bonded to an external p-i-n PD with a responsivity of 0.8 A/W and a capacitance of 120 fF. The RX occupies an active area of 0.088 mm<sup>2</sup>, and its micrograph is shown in Fig. 14. The optical test setup is shown in Fig. 15. The laser output is modulated by duobinary pre-coded data using a LiNbO<sub>3</sub> Mach–Zehnder modulator and coupled to the PD via a single-mode fiber. The pre-coded data was loaded into the pattern generator (Keysight M8040A). A variable attenuator is used to set the optical signal to the desired level at the PD input for sensitivity measurements.



Fig. 13. Block diagram of the prototype optical RX.



Fig. 14. Die micrograph of the prototype optical RX.



Fig. 15. Optical test setup and micrograph of the RX bonded to the external p-i-n PD.

The optical signal received by the PD is ISI free and has an extinction ratio of 10 dB (see Fig. 15).

Fig. 16 shows the measured BER performance for the proposed duobinary and conventional NRZ sampling cases.



Fig. 16. Measured BER versus OMA at 12, 14, and 16 Gb/s for PRBS15 input pattern for NRZ (left) and duobinary sampling (right).

TABLE I

RX POWER BREAKDOWN AT 16 Gb/s

| Building Block | Power [mW] |
|----------------|------------|
| TIA+S2D        | 2.4        |
| VGA            | 1.6        |
| Samplers       | 6.8        |
| Others         | 0.4        |
| Total          | 11.2       |

With PRBS15 data, the RX achieves OMA sensitivities of -14.1, -13.7, and -11.6 dBm at 12, 14, and 16 Gb/s, respectively, at  $10^{-12}$  BER using the proposed duobinary sampling. However, using conventional NRZ sampling, error-free operation is achieved only at 12 Gb/s with an OMA sensitivity of -13 dBm. The measurement results demonstrate more than 33% data rate enhancement using duobinary sampling compared to NRZ sampling for the same RX AFE. Post-layout simulation results show that the maximum RX sensitivity degradation for duobinary sampling due to PVT variations is 1.6 dB and occurs in the SS process corner.

TABLE II

PERFORMANCE SUMMARY AND COMPARISON WITH STATE-OF-THE-ART OPTICAL RXs

| Reference             | A. Cevrero [4]    | A. Sharif [6] | Proesel [7]    | S. Palermo [18]            | M. Ahmed [5]    | This work                | This work                |
|-----------------------|-------------------|---------------|----------------|----------------------------|-----------------|--------------------------|--------------------------|
| Technology            | 14-nm FinFET Bulk | 65-nm CMOS    | 90-nm CMOS     | 90-nm CMOS                 | 65-nm CMOS      | 65-nm CMOS               | 65-nm CMOS               |
| Architecture          | TIA + 1-tap DFE   | TIA + IIR DFE | Res. + IIR DFE | Integrating Double Sampler | TIA + 4-tap DFE | TIA + Duobinary Sampling | TIA + Duobinary Sampling |
| Data rate [Gb/s]      | 64                | 20            | 9              | 16                         | 12              | 12                       | 16                       |
| RX Cin [fF]           | 69                | 200           | 140            | 440                        | 100             | 180                      | 180                      |
| PD Responsivity [A/W] | 0.52              | 0.5           | 0.55           | 0.5                        | 0.75            | 0.8                      | 0.8                      |
| Power [mW]            | 91                | 14.1          | 8.4            | 23                         | 23              | 9.5                      | 11.2                     |
| Efficiency [pJ/bit]   | 1.4               | 0.705         | 0.93           | 1.43                       | 1.9             | 0.79                     | 0.7                      |
| Sensitivity OMA [dBm] | -5.5              | -5.8          | -5             | -2.4                       | -16.8           | -14.1                    | -11.6                    |

Fig. 17. Measured bathtub curves for NRZ and duobinary sampling at 16 Gb/s.

Bathtub curves measured at 16 Gb/s with an input OMA of -9.6 dBm are shown in Fig. 17. The measured results show that a timing margin of 25% UI at BER <  $10^{-9}$  is achieved in all quarter-rate RX channels. The total power consumption at 16 Gb/s is 11.2 mW (0.7 pJ/bit) and the power breakdown is shown in Table I. The RX performance summary and a comparison with state-of-the-art clocked optical RXs are shown in Table II. Compared to [5], the proposed RX achieves  $2.7 \times$  energy efficiency improvement and roughly the same sensitivity at 12 Gb/s after accounting for the  $1.8 \times$  smaller PD capacitance in [5].
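The efficiency figures quoted above follow directly from the power and data-rate entries of Tables I and II, since 1 mW divided by 1 Gb/s equals 1 pJ/bit. The sketch below reproduces them; the dictionary layout and function name are assumptions made for illustration.

```python
def energy_per_bit_pj(power_mw, data_rate_gbps):
    """Energy efficiency in pJ/bit (mW divided by Gb/s)."""
    return power_mw / data_rate_gbps

# Table I entries in mW.
power_breakdown_mw = {"TIA+S2D": 2.4, "VGA": 1.6, "Samplers": 6.8, "Others": 0.4}
total_mw = sum(power_breakdown_mw.values())

this_work = energy_per_bit_pj(total_mw, 16.0)   # this work at 16 Gb/s
ref_5 = energy_per_bit_pj(23.0, 12.0)           # [5] at 12 Gb/s, from Table II

print(f"total power: {total_mw:.1f} mW")
print(f"this work: {this_work:.2f} pJ/bit, [5]: {ref_5:.2f} pJ/bit")
print(f"improvement over [5]: {ref_5 / this_work:.2f}x")
print(f"data-rate gain, 16 vs 12 Gb/s: {(16.0 / 12.0 - 1.0) * 100:.0f}%")
```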

### V. CONCLUSION

High-sensitivity optical RXs are the key components that enable low-power operation in modern data-center interconnects. This article demonstrated a high-sensitivity low-power optical RX by using duobinary sampling in conjunction with a low-bandwidth TIA. Duobinary sampling leverages the well-controlled ISI introduced by the low-bandwidth TIA, instead of canceling it as a DFE does, to improve RX sensitivity. The proposed duobinary sampling technique achieves nearly the same sensitivity improvement as a one-tap DFE when the TIA bandwidth is only 25% of the data rate (4 GHz for 16 Gb/s). A prototype of the proposed RX was fabricated in a 65-nm CMOS process and wire-bonded to a p-i-n PD. The measured OMA sensitivity at 12 and 16 Gb/s was -14.1 and -11.6 dBm, respectively, the highest reported sensitivity in 65-nm CMOS technology at 16 Gb/s with 0.7-pJ/bit energy efficiency.

### ACKNOWLEDGMENT

The authors would like to thank Vito Boccuzzi for his assistance in testing the receiver prototype. They would also like to thank Analog Devices for the partial financial support.

### REFERENCES

- C. Kachris and I. Tomkos, "A survey on optical interconnects for data centers," *IEEE Commun. Surveys Tuts.*, vol. 14, no. 4, pp. 1021–1036, 4th Quart., 2012.
- [2] M. Dayarathna, Y. Wen, and R. Fan, "Data center energy consumption modeling: A survey," *IEEE Commun. Surveys Tuts.*, vol. 18, no. 1, pp. 732–794, 1st Quart., 2016.
- [3] C. Kachris, K. Kanonakis, and I. Tomkos, "Optical interconnection networks in data centers: Recent trends and future challenges," *IEEE Commun. Mag.*, vol. 51, no. 9, pp. 39–45, Sep. 2013.
- [4] A. Cevrero et al., "A 64 Gb/s 1.4pJ/b NRZ optical-receiver data-path in 14 nm CMOS FinFET," in *IEEE Int. Solid-State Circuits Conf. (ISSCC)* Dig. Tech. Papers, Feb. 2017, pp. 482–483.
- [5] M. G. Ahmed *et al.*, "A 12-Gb/s -16.8-dBm OMA sensitivity 23-mW optical receiver in 65-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 53, no. 2, pp. 445–457, Feb. 2018.
- [6] A. Sharif-Bakhtiar and A. C. Carusone, "A 20 Gb/s CMOS optical receiver with limited-bandwidth front end and local feedback IIR-DFE," *IEEE J. Solid-State Circuits*, vol. 51, no. 11, pp. 2679–2689, Nov. 2016.
- [7] J. Proesel, A. Rylyakov, and C. Schow, "Optical receivers using DFE-IIR equalization," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2013, pp. 130–131.
- [8] D. Li et al., "A low-noise design technique for high-speed CMOS optical receivers," *IEEE J. Solid-State Circuits*, vol. 49, no. 6, pp. 1437–1447, Jun. 2014.
- [9] V. Stojanovic *et al.*, "Autonomous dual-mode (PAM2/4) serial link transceiver with adaptive equalization and data recovery," *IEEE J. Solid-State Circuits*, vol. 40, no. 4, pp. 1012–1026, Apr. 2005.
- [10] A. Lender, "The duobinary technique for high-speed data transmission," *Trans. Amer. Inst. Elect. Eng., I, Commun. Electron.*, vol. 82, no. 2, pp. 214–218, May 1963.
- [11] K. R. Lakshmikumar *et al.*, "A process and temperature insensitive CMOS linear TIA for 100 Gb/s/λ PAM-4 optical links," *IEEE J. Solid-State Circuits*, vol. 54, no. 11, pp. 3180–3190, Nov. 2019.
- [12] M. Yoneyama, K. Yonenaga, Y. Kisaka, and Y. Miyamoto, "Differential precoder IC modules for 20- and 40-Gbit/s optical duobinary transmission systems," *IEEE Trans. Microw. Theory Techn.*, vol. 47, no. 12, pp. 2263–2270, Dec. 1999.

- [13] J. Lee, M.-S. Chen, and H.-D. Wang, "Design and comparison of three 20-Gb/s backplane transceivers for duobinary, PAM4, and NRZ data," *IEEE J. Solid-State Circuits*, vol. 43, no. 9, pp. 2120–2133, Sep. 2008.
- [14] E. Säckinger, *Broadband Circuits for Optical Fiber Communication*. Hoboken, NJ, USA: Wiley, 2005.
- [15] P. K. Hanumolu, G.-Y. Wei, and U.-K. Moon, "A wide-tracking range clock and data recovery circuit," *IEEE J. Solid-State Circuits*, vol. 43, no. 2, pp. 425–439, Feb. 2008.
- [16] K. C. Chen and A. Emami, "A 25-Gb/s avalanche photodetector-based burst-mode optical receiver with 2.24-ns reconfiguration time in 28-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 54, no. 6, pp. 1682–1693, Jun. 2019.
- [17] K. M. Chu and D. L. Pulfrey, "Design procedures for differential cascode voltage switch circuits," *IEEE J. Solid-State Circuits*, vol. SSC-21, no. 6, pp. 1082–1087, Dec. 1986.
- [18] S. Palermo, A. Emami-Neyestanak, and M. Horowitz, "A 90 nm CMOS 16 Gb/s transceiver for optical interconnects," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2007, pp. 44–586.



**Mostafa G. Ahmed** (Student Member, IEEE) received the B.Sc. (Hons.) and M.Sc. degrees in electrical engineering from Ain Shams University, Cairo, Egypt, in 2011 and 2015, respectively. He is currently pursuing the Ph.D. degree with the University of Illinois at Urbana–Champaign, Champaign, IL, USA.

From 2011 to 2015, he was an Analog/Mixed-Signal Design Engineer with Si-Ware Systems, Cairo, designing high-performance clocking circuits and inductor-capacitor (LC)-based reference oscillators.

From June 2016 to August 2017, he was a Research Intern with Elenion, New York, NY, USA, developing limiting/linear trans-impedance amplifiers (TIAs) and optical modulator drivers for direct detection and coherent optical links. He is currently a Research Assistant with the University of Illinois at Urbana–Champaign, Urbana, IL, USA. His current research interests include high-speed optical communication links, clocking circuits, and RF/millimeter-wave circuits.



**Dongwook Kim** (Student Member, IEEE) received the B.S. and M.S. degrees in electrical engineering from Seoul National University, Seoul, South Korea, in 2010 and 2012, respectively, and the Ph.D. degree in electrical and computer engineering from the University of Illinois at Urbana–Champaign, Champaign, IL, USA, in 2019.

From 2012 to 2015, he was an Engineer with SK Hynix, Icheon, South Korea, where he worked on high-speed interface circuits in mobile devices. He was an Intern with Intel PSG, San Jose, CA, USA, and Marvell, Santa Clara, CA, USA, during the summers of 2016 and 2018, respectively. He is currently an Analog IC Designer with Apple Inc., Austin, TX, USA. His research interests include high-speed serial links and optical links.

Dr. Kim was a recipient of the Analog Devices Outstanding Student Designer Award in 2018. He serves as a Reviewer for the IEEE JOURNAL OF SOLID-STATE CIRCUITS, IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, and IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS.



**Romesh Kumar Nandwana** (Member, IEEE) received the B.Tech. degree in electronics and communication engineering from the Motilal Nehru National Institute of Technology, Allahabad, India, in 2009, the M.Eng. degree in electrical engineering from Oregon State University, Corvallis, Hillsboro, OR, USA, in 2013, and the Ph.D. degree in analog and mixed-signal design from the University of Illinois at Urbana–Champaign, Champaign, IL, USA, in 2017.

From 2009 to 2010, he was a Scientist at the Indian Space Research Organization, Ahmedabad, India, working on the design of RF power amplifiers and dc–dc converters for communication satellites. As a Graduate Intern, he has also worked on high-speed link circuits design at Intel Labs, Hillsboro, OR, USA, in 2016, and Xilinx Inc., San Jose, CA, USA, in 2015. He is currently working as the Technical Leader at Cisco Systems, Allentown, PA, USA, developing high-speed/mixed-signal circuits and architectures for next-generation silicon photonic I/Os. His research interests include frequency synthesizers, digital phase-locked loops, clock and data recovery circuits, high-speed serial links, and low-voltage mixed-signal circuits.

Dr. Nandwana was a recipient of the 2016–2017 IEEE Solid-State Circuits Society Predoctoral Achievement Award for his thesis work on clock generation circuits for high-speed I/Os.



**Ahmed Elkholy** (Member, IEEE) received the B.Sc. (Hons.) and M.Sc. degrees in electrical engineering from Ain Shams University, Cairo, Egypt, in 2008 and 2012, respectively, and the Ph.D. degree from the University of Illinois at Urbana–Champaign, Urbana, IL, USA, in 2016.

From 2008 to 2012, he was with Si-Ware Systems, Cairo, designing high-performance clocking circuits. In 2014, he was with Xilinx Inc., San Jose, CA, USA. In 2016, he was with Silicon Labs, Austin, TX, USA. In 2017, he was a Post-Doctoral Research Associate with the University of Illinois at Urbana–Champaign. From 2017 to 2019, he was the Co-Founder of PhaseBits, Inc., Champaign, IL, USA, developing high-performance timing solutions. He is currently a Senior Staff Scientist with Broadcom, Irvine, CA, USA, developing high-speed transceivers for optical and backplane/cable applications. His current research interests include frequency synthesizers, high-speed serial links, and data converters.



**Kadaba R. (Kumar) Lakshmikumar** (Life Fellow, IEEE) received the B.E. and M.E. degrees in electrical communication engineering from the Indian Institute of Science, Bengaluru, India, in 1974 and 1976, respectively, and the Ph.D. degree in electrical engineering from Carleton University, Ottawa, ON, Canada, in 1985.

He has held senior engineering positions at Bell Labs, Murray Hill, NJ, USA, Multilink, Somerset, NJ, USA, and Conexant Systems, Red Bank, NJ, USA. He is currently the Principal Engineering Manager of the Silicon Photonics Division, Cisco Systems, Allentown, PA, USA. He did pioneering work in the area of modeling mismatch in MOS devices for his doctoral work. The standard deviation of mismatch was shown to be inversely proportional to the square root of the channel area. His paper in the December 1986 issue of the IEEE JOURNAL OF SOLID-STATE CIRCUITS is among the top 20 cited publications of the journal between 1968 and 1992. As of now, the article has more than 800 citations.

Dr. Lakshmikumar has served on the Technical Program Committee of the IEEE Custom Integrated Circuits Conference, the IEEE International Solid-State Circuits Conference, and the Compound Semiconductor Integrated Circuits Symposium, and as an Associate Editor for the IEEE JOURNAL OF SOLID-STATE CIRCUITS. He presented a tutorial titled "PLL Design in Nanometer CMOS" at ISSCC 2010. In 2015, he presented a short course, "Clock and Data Recovery Techniques for Optical Communication Systems," at CSICS.



**Pavan Kumar Hanumolu** (Fellow, IEEE) received the Ph.D. degree from the School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA, in 2006.

He was with Oregon State University until 2013. He is currently a Professor with the Department of Electrical and Computer Engineering and a Research Professor with the Coordinated Science Laboratory, University of Illinois at Urbana–Champaign, Urbana, IL, USA. His current research interests include energy-efficient integrated circuit implementation of analog and digital signal processing, sensor interfaces, wireline communication systems, and power conversion.

Dr. Hanumolu is the Editor-in-Chief of the IEEE JOURNAL OF SOLID-STATE CIRCUITS.