# Co-design of CML IO and Interposer Channel for Low Area and Power Signaling

Muhammad Waqas Chaudhary, Andy Heinig Fraunhofer Institute for Integrated Circuits IIS Design Automation Division EAS 01069 Dresden, Germany Email: {muhammad.chaudhary, andy.heinig}@eas.iis.fraunhofer.de

Abstract: In recent years, 2.5D integration of ICs on an interposer has become popular for highly integrated, miniaturized systems. Combining two or more chips requires a large amount of inter-chip communication, which needs either a very high number of slow channels or numerous high-speed channels. Finding the optimum number and speed of interposer channels is therefore an important task. Conventional PCB data communication systems use very high-speed serial data transmission circuits, which take a lot of area and power. In 2.5D systems, however, area and power are strict constraints, and the interposer channel differs drastically from a PCB channel in its electrical properties. To enable high-bandwidth chip-to-chip interposer communication with low area and power requirements, the interposer channel and the IO circuit must be co-designed. To address this, the paper discusses the electrical properties of the 2.5D channel segments along with a co-design methodology that targets the optimum area-power cost for a maximum-bandwidth current mode logic differential driver.

## I. INTRODUCTION

In 2.5D integration, multiple bare dies are assembled using microbumps onto an interposer, which can have multiple metal layers for inter-die routing as well as through-silicon vias (TSVs) to get signals in and out of the complete 2.5D integrated system. A comparison of 3D and 2.5D integration is shown in figure 1. 3D integration uses TSVs for inter-die communication, while 2.5D integration uses interposer metal layer interconnects.

One of the most important applications of 2.5D systems is memory-SOC integration on an interposer. As shown in figure 2, two dies of different sizes, developed in different technology nodes, are assembled on an interposer. Such a system allows the large memory channel interconnects that sit on the PCB in the standard industry design to be moved onto the metal layers of the interposer with very high interconnect density. But this high interconnect density comes at the cost of two main problems: high resistive loss and dielectric loss. These losses have to be reduced



Fig. 1. 3D and 2.5D system comparison



Fig. 2. Memory SOC on interposer integration

to achieve signaling in the Gb/s range with low power. Also, at high data rates, when interconnects behave as transmission lines, impedance discontinuities become a problem, so all the segments of the 2.5D channel have to be studied in terms of their characteristic impedance. In addition, low-power signaling circuits are essential for achieving the maximum benefit of 2.5D integration in terms of high bandwidth density.

For inter-die signaling, depending upon the data rate, the complete 2.5D interconnect channel has to be studied in terms of the effect of width and spacing on signaling power and bandwidth, in order to achieve the optimum Energy/bit\*Pitch for the whole system. The research community has targeted this problem in recent years. In [2], an interposer embedded in the PCB is investigated, and only CMOS signaling bandwidth for RC interconnects is shown, for a minimum pitch of 4 µm. In [1], interposer CML IO circuits are presented that use inductive peaking, which takes a lot of area. In our work, no peaking techniques are used, and the simplest forms of current mode logic (CML) driver and receiver are considered to explain the CML-interposer co-design methodology. In [4], different transmission line topologies are investigated, showing that a coplanar topology with floating metal lines underneath, instead of grounded ones, gives the minimum attenuation and highest bandwidth. In [3], silicon, glass and organic interposers are compared, showing that a silicon-substrate interposer has a higher attenuation factor than glass and organic ones. In [5], through-silicon via (TSV) options are described along with routing problems for processor-memory interfaces based on 2.5D integration. In our work, the 2.5D channel is studied along with the co-design of the signaling circuit to achieve the



Fig. 4. Complete 2.5D Channel signal path

maximum bandwidth with minimum power-area usage on the interposer. The main contributions of this paper are:

- 2.5D channel characterization
- Power-area product optimized channel and driver co-design methodology

In the rest of the paper, Section II discusses the design of complete 2.5D channel and shows the necessary equations to characterize the whole signal path. Section III describes the design of Current mode logic (CML) driver and receiver. Section IV discusses the co-design methodology and algorithm used to find the optimum power area cost. Section V presents the results and simulations of the CML driver under the full path of 2.5D channel. Finally, Section VI summarizes the work and concludes the paper.

## II. INTERPOSER CHANNEL DESIGN

The 2.5D interposer channel consists of copper pillar pads, copper pillars, copper pillar pads on the interposer, the metal interconnect, and the pillar and pad on the receiver end, as shown in figure 4.

All the components of the path must be characterized to ensure good signal integrity at the receiver end. Copper pillars, despite their very small length, can still affect signal integrity at high frequencies. The interconnect can have a very high DC loss at low frequencies and high resistance due to the skin effect at high frequencies, along with capacitive and inductive effects. A general RLGC model of the whole path is shown in figure 3. The electrical modeling of all these segments, with their corresponding RLGC(f) or other models, is presented in the following subsections.

#### A. Copper Pillars

In [6], the electrical and mechanical properties of copper pillars are studied and equations are presented for modeling. Here, only the electrical characteristics of copper pillars relevant to signal integrity are described. Since the copper pillars are arranged in a grid, it makes sense to model a copper pillar surrounded by four other pillars, as shown in figure 5. H is the height of the pillar,



Fig. 5. Copper pillar grid

D is the diameter and d is the spacing between the pillars, as shown in the figure.

The characteristic impedance of the pillars can be written as

$$Z_o(\omega) = \sqrt{\frac{R(\omega) + j\omega L}{j\omega C}} \tag{1}$$

Choosing a copper pillar height (H) of 100 $\mu$m and a diameter (D) of 50 $\mu$m gives a capacitance of 12 fF and an inductance of 40 pH. For a pitch of 100 $\mu$m and a diameter of 50 $\mu$m, the resulting $Z_o$ is 60 $\Omega$.
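As a quick sanity check on the quoted pillar values, the sketch below evaluates the lossless limit $\sqrt{L/C}$ of the characteristic impedance, that is, it assumes the resistive term is negligible at the frequency of interest. The function name `lossless_z0` is illustrative only and not part of the original work.

```python
import math


def lossless_z0(inductance_h: float, capacitance_f: float) -> float:
    """Characteristic impedance sqrt(L/C), neglecting the resistance term."""
    return math.sqrt(inductance_h / capacitance_f)


# Pillar values quoted in the text: L = 40 pH, C = 12 fF.
z0_pillar = lossless_z0(40e-12, 12e-15)
print(f"Z0 of the copper pillar: {z0_pillar:.1f} ohm")
```

Under this assumption the result is roughly 58 $\Omega$, in line with the approximately 60 $\Omega$ quoted above.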

#### B. Metal interconnect

After the copper pillars, the next channel segment in the 2.5D interposer is the metal interconnect, as shown in figure 4. This segment is the longest part of the channel (generally 5 to 10 mm) and can be considered the most important in terms of its effect on signal integrity and on the eye diagram at the input of the receiver, which needs a minimum eye width and eye height to capture the signal into the core. The metal interconnect can be considered a rectangular structure of length (L), height (h), width (W), differential pair spacing (S) and spacing between ground shield and signal line (d), as shown in figure 7. The figure shows an interconnect pair (+,-) carrying a high-speed differential signal, flanked by ground shields on both sides to avoid unwanted crosstalk from surrounding interconnects. All the lines shown have the same width (W). In this system, the material surrounding the interconnect is silicon dioxide ($SiO_2$) and the substrate is made of silicon. Now, the



Fig. 3. Complete 2.5D channel model from Output driver cell to receiver buffer

most important task is to accurately characterize the electrical properties of the differential pair in terms of W, S and d, while the remaining geometrical and material properties are held constant.

Compared with normal PCB lines, interposer lines of minimum width have a very high resistance per unit length (R), which makes them behave as RC lines. Such lines can be modeled using a simple lumped RC interconnect or, for more accuracy, a distributed RC model. But these minimum-width lines are not suited to high-speed data transmission between the chips on the interposer because of the very low RC bandwidth of the interconnect. In this regime, impedance matching is not necessary: the inductive component is not large, and any ringing that does occur dies down along the interconnect due to the high resistive losses [9]. There are also losses due to field lines entering the substrate, which can be represented by the conductance per unit length (G). The capacitance per unit length (C) also needs to be modeled, along with the inductance per unit length (L). A simplified representation of the complete 2.5D path is shown in figure 6. In PCB systems, microstrip lines and coplanar differential lines have very low resistance and show very low loss at data rates around 5 Gb/s. Interposer interconnect, by contrast, has high losses and must be designed with special care so that it operates in the low-loss region at the desired frequency. For this kind of co-design, RLGC(f) models derived from S-parameter models extracted from full 3D electromagnetic simulation are highly effective for the final design check before fabrication.

The elements of RLGC(f) are three-dimensional matrices of order $n \times m \times m$, defined at *n* frequency points given in the vector f, where *m* is the number of interconnect lines described by the matrices. For a 10 Gb/s link, R and G have to be low enough, and L and C comparatively high enough, for the line to work in the RLC or quasi-TEM region instead of the extremely lossy RC region [4]. These matrices can also be derived as a function of width and spacing rather than frequency; in that case, the RLGC matrices are expressed as RLGC(W, S). Compared with RLGC models, scattering-parameter models are easier to derive from 3D electromagnetic field solvers and easier to use. For a simple two-line interconnect, for maximum accuracy, an S-parameter model can be extracted from a 3D field solver and then used in simulation together with the driver and receiver models to observe the eye diagram at the receiver end.

Fig. 7. Differential pair on Silicon interposer

## III. CML DRIVER AND RECEIVER DESIGN

In current mode signaling, instead of sending voltage pulses to the receiver, current pulses are sent on the transmission lines and then converted into voltage using either simple termination resistors or transimpedance amplifiers. In this work, to keep the analysis simple, plain resistors are used at the receiver front end to convert the current pulses into voltage signals, which can then be interpreted by comparators. A general schematic of the CML driver is shown in figure 8, which shows two input transistors in the saturation region, biased at a common mode voltage $V_{CM}$, steering $I_{bias}$ into ground. As described in [7], if the interconnect impedance and the driver impedance are matched to suppress ringing at the receiver end, then half of $I_{bias}$ flows into the receiver-end impedance, giving a differential voltage swing of $I_{bias} R$, where R is the common value of the driver ($T_x$) output impedance, the receiver ($R_x$) input impedance and the interconnect impedance at the working frequency. Good receivers are generally capable of interpreting signals $\geq 100$ mV, which means the CML driver can be designed for the lowest power that still meets the minimum input voltage swing requirement at the receiver.

Generally, the impedance of these CML drivers is kept at $50\Omega$ to match the coaxial cable impedance and the receiver impedance. But when these circuits are used in 2.5D integrated systems for data transmission between chips, they can use a higher-impedance design to lower the current (I) required for a given voltage swing (V).



Fig. 6. Simplified 2.5D channel model



Fig. 8. CML Driver  $T_x$  through Transmission line to  $R_x$ 

If the system can be designed for a single-ended impedance higher than 50 $\Omega$ while still delivering a 100 mV swing, less bias current is needed, which reduces the power consumption of the whole system and will be a big factor in the industry opting for this technology.
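The saving can be illustrated with a small sketch. It assumes the simplified relation that the bias current equals the required swing divided by the termination impedance, and that static power is the supply voltage times the bias current. The 1.8 V supply is the value used later in the results section; the 100 $\Omega$ case is a hypothetical comparison point, and the function names are illustrative only.

```python
def cml_bias_current(v_swing: float, r_term: float) -> float:
    """Bias current (A) for a given swing (V), assuming I = V_SW / R."""
    return v_swing / r_term


def cml_static_power(vdd: float, v_swing: float, r_term: float) -> float:
    """Static driver power (W), assuming P = V_DD * I_bias."""
    return vdd * cml_bias_current(v_swing, r_term)


VDD = 1.8        # supply voltage in volts
V_SWING = 0.1    # 100 mV minimum receiver swing

p_50 = cml_static_power(VDD, V_SWING, 50.0)    # conventional 50 ohm design
p_100 = cml_static_power(VDD, V_SWING, 100.0)  # hypothetical 100 ohm design

print(f"50 ohm:  {p_50 * 1e3:.2f} mW")
print(f"100 ohm: {p_100 * 1e3:.2f} mW")
```

Under these assumptions, doubling the impedance halves both the bias current and the static power for the same swing.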

Along with the multiple-stage driver, a CMOS-to-CML converter is needed to get the random data stream from the core logic of the chip to the driver. It is designed as a differential amplifier with resistors that set the output to the common mode voltage $V_{CM}$ and voltage swing $V_{SW}$ required by the driver. On the receiver side, on-chip resistors in the poly layer are designed to achieve the required swing at the input of an open-loop voltage comparator, which converts the differential swing into a single-ended voltage, followed by an inverter stage that converts it into a full-swing digital logic value. Finally, a high-to-low level shifter is required to convert the higher $V_{DDIO}$ signal into a lower $V_{DDC}$ full-swing digital signal, which can then be processed in the core of the receiver chip.

## IV. CO-DESIGN METHODOLOGY

Sections II and III presented the channel design and the CML $T_x$/$R_x$ design technique. The co-design methodology for the two is discussed now. Consider the stackup shown in figure 9, with two copper metal layers in $SiO_2$ dielectric over a silicon substrate. A coplanar architecture is considered in which a differential pair is surrounded by ground lines for shielding and has ground lines under it on the lower metal layer, all with the same width (W) and separated by a constant spacing (S).

The goal of the co-design is to investigate the performance of this coplanar architecture for different width and spacing values, and then to find the W and S values that achieve the minimum Energy\*Pitch (Enpitch) cost at the maximum possible effective 3 dB bandwidth ($BW_{eff}$). Once a 2D field solver has extracted the RLGC model for all candidate values of W and S, the first step is to find the odd mode impedance $Z_{odd}$ for each (W, S) combination. $Z_{odd}$ can be calculated using Equation 2, where $L_o$ and $C_o$ are the total inductance and capacitance, and $L_m$ and $C_m$ are the mutual inductance and capacitance.

$$Z_{odd} = \sqrt{\frac{L_o - L_m}{C_o + C_m}} \tag{2}$$
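A minimal sketch of Equation 2 follows. The per-unit-length values are hypothetical placeholders chosen for round numbers; they are not taken from the extracted RLGC models of this work.

```python
import math


def odd_mode_impedance(l_o: float, l_m: float, c_o: float, c_m: float) -> float:
    """Odd mode impedance per Equation 2: sqrt((L_o - L_m) / (C_o + C_m))."""
    return math.sqrt((l_o - l_m) / (c_o + c_m))


# Hypothetical per-unit-length values (H/m and F/m), for illustration only.
z_odd_example = odd_mode_impedance(
    l_o=500e-9, l_m=100e-9, c_o=100e-12, c_m=60e-12
)
print(f"Z_odd = {z_odd_example:.1f} ohm")
```

In the methodology, this function would be evaluated once per (W, S) combination using the corresponding entries of the extracted L and C matrices.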

In this configuration, the dielectric conductance G of the RLGC model is assumed to be insignificant, which is confirmed by the simulation results in Section V. The attenuation along the line is therefore due only to conductor losses from the interconnect resistance R. The attenuation $\alpha$ in decibels for a 10 mm differential pair can be calculated with Equation 3, where $R_o$ and $R_s$ are the DC resistance and the skin effect resistance factor.

$$\alpha_{dB} = 8.686 \left[ \frac{\left( R_o + R_s \sqrt{f} \right) / 100}{2Z_{odd}} \right] \tag{3}$$
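Equation 3 can be evaluated directly, and the frequency at which the attenuation reaches 3 dB can be solved in closed form. The sketch below assumes $R_o$ in $\Omega$/m and $R_s$ in $\Omega$/(m$\sqrt{\text{Hz}}$), so that the division by 100 corresponds to a 10 mm line, and it assumes that the channel bandwidth is the frequency where $\alpha_{dB}$ equals 3 dB. All numeric inputs are hypothetical.

```python
import math


def attenuation_db(r_dc, r_skin, freq_hz, z_odd, length_m=0.01):
    """Conductor-loss attenuation in dB per Equation 3 (length 0.01 m = 1/100)."""
    r_line = (r_dc + r_skin * math.sqrt(freq_hz)) * length_m
    return 8.686 * r_line / (2.0 * z_odd)


def bandwidth_3db(r_dc, r_skin, z_odd, length_m=0.01, limit_db=3.0):
    """Frequency (Hz) where Equation 3 reaches limit_db; 0.0 if DC loss exceeds it."""
    r_needed = limit_db * 2.0 * z_odd / (8.686 * length_m)  # ohm per metre
    if r_needed <= r_dc:
        return 0.0
    return ((r_needed - r_dc) / r_skin) ** 2


# Hypothetical line: R_o = 1000 ohm/m, R_s = 0.01 ohm/(m*sqrt(Hz)), Z_odd = 50 ohm.
alpha_10ghz = attenuation_db(1000.0, 0.01, 1e10, 50.0)
bw_ch = bandwidth_3db(1000.0, 0.01, 50.0)
print(f"attenuation at 10 GHz: {alpha_10ghz:.3f} dB")
print(f"3 dB channel bandwidth: {bw_ch / 1e9:.1f} GHz")
```

The closed-form solve is only valid because the loss model has a single $\sqrt{f}$ term; with a tabulated RLGC(f) model the crossing would have to be found numerically.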

By plotting the attenuation and finding the 3 dB bandwidth $BW_{ch}$ for each (W, S) configuration, the 10%-90% rise time $tr_{ch}$ of the link interconnect can be calculated using Equation 4.

$$tr_{ch} = \frac{0.35}{BW_{ch}} \tag{4}$$

Since this methodology targets zero reflection due to impedance mismatch, R in the $T_x$ and $R_x$ shown in figure 8 must be equal to $Z_{odd}$. The effective (final) rise time $tr_{tot}$ at the receiver input can then be calculated using Equation 5.

$$tr_{tot} = \sqrt{9.68(C_{pad}Z_{odd})^2 + \left(\frac{0.35}{BW_{ch}}\right)^2} \tag{5}$$
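The rise time combination of Equation 5, followed by the conversion back to a bandwidth with the 0.35 rule of Equation 4, can be sketched as follows. The 0.2 pF pad capacitance is the value chosen in the results section; the 50 $\Omega$ impedance and 20 GHz channel bandwidth are hypothetical inputs.

```python
import math


def total_rise_time(c_pad: float, z_odd: float, bw_ch: float) -> float:
    """Total 10%-90% rise time (s) at the receiver input per Equation 5."""
    pad_term = 9.68 * (c_pad * z_odd) ** 2
    channel_term = (0.35 / bw_ch) ** 2
    return math.sqrt(pad_term + channel_term)


def total_bandwidth(tr_tot: float) -> float:
    """Total bandwidth (Hz) by inverting Equation 4: BW = 0.35 / tr."""
    return 0.35 / tr_tot


tr_tot = total_rise_time(c_pad=0.2e-12, z_odd=50.0, bw_ch=20e9)
bw_tot = total_bandwidth(tr_tot)
print(f"tr_tot = {tr_tot * 1e12:.1f} ps, BW_tot = {bw_tot / 1e9:.2f} GHz")
```

Because the pad term scales with $Z_{odd}$, a higher impedance lowers the driver current but also slows the pad RC, which is why bandwidth and power have to be evaluated together.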

The total (final) bandwidth $BW_{tot}$ can then be calculated by inverting Equation 4, that is, $BW_{tot} = 0.35/tr_{tot}$. Once the total bandwidth is known for each (W, S) value, the next step is to find the power and signal pitch cost of each configuration. For the current mode logic driver shown in figure 8, the power consumed is purely static and is simply the product of the supply voltage $V_{DD}$ and $I_{bias}$. For a given voltage swing requirement $V_{SW}$, the current required is $V_{SW}/R$, where R is equal to $Z_{odd}$ in our work. For the area cost, the metric is the signal pitch, which for this coplanar configuration is simply $3(S + W)$. The final metric for our co-design is therefore energy\*pitch/bit, which can be calculated with Equation 6.

$$Energy * Pitch/bit = \left(\frac{V_{DD}V_{SW}3(S+W)}{Z_{odd}BW_{tot}}\right) \tag{6}$$

The objective is to minimize the energy\*pitch cost metric: from the candidate configurations, the one that delivers the required bandwidth at the minimum area and energy-per-bit cost is selected.
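The selection step can be sketched as a filter followed by a minimum search over Equation 6. The candidate table below is invented purely for illustration and does not reproduce the simulated values of this work; the `Config` class and function names are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Config:
    width_um: float     # line width W in micrometres
    spacing_um: float   # line spacing S in micrometres
    z_odd: float        # odd mode impedance in ohms
    bw_tot_hz: float    # total 3 dB bandwidth in Hz


def energy_pitch_per_bit(cfg: Config, vdd: float, v_sw: float) -> float:
    """Equation 6: V_DD * V_SW * 3(S + W) / (Z_odd * BW_tot), in J*m/bit."""
    pitch_m = 3.0 * (cfg.spacing_um + cfg.width_um) * 1e-6
    return vdd * v_sw * pitch_m / (cfg.z_odd * cfg.bw_tot_hz)


def select_config(configs: List[Config], vdd: float, v_sw: float,
                  min_bw_hz: float) -> Optional[Config]:
    """Return the cheapest configuration that meets the bandwidth target."""
    feasible = [c for c in configs if c.bw_tot_hz >= min_bw_hz]
    if not feasible:
        return None
    return min(feasible, key=lambda c: energy_pitch_per_bit(c, vdd, v_sw))


# Invented candidates, for illustration only.
candidates = [
    Config(5.0, 5.0, 70.0, 4e9),
    Config(10.0, 10.0, 60.0, 10e9),
    Config(20.0, 20.0, 45.0, 14e9),
]
best = select_config(candidates, vdd=1.8, v_sw=0.3, min_bw_hz=5e9)
print(best)
```

Filtering on bandwidth before minimizing keeps a narrow, high-impedance line from winning on cost alone when it cannot carry the required data rate.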



Fig. 9. Stackup used for Simulation

## V. RESULTS AND DISCUSSION

The stackup used for simulations and methodology evaluation is shown in figure 9. The height of metal layers and inter layer spacing along with silicon substrate height and dielectric constants are also shown in the stackup. The topology is simple with a coplanar pair of width W separated by spacing S with grounded metal lines below.

This configuration is simulated for different values of W and S using the HSPICE 2D field solver, which produces extracted RLGC models. The G matrix, containing the dielectric conductance, is zero in the extracted RLGC models, which means that the attenuation depends only on the resistance of the interconnect. The attenuation values calculated using Equation 3 are plotted in figure 10.

Also, the odd mode differential impedance $Z_{diff}$ is plotted in figure 11. It shows that increasing the spacing of the metal lines increases the inductance, which raises the impedance, whereas larger increases in width increase the capacitance, which lowers the impedance. As the plot shows, $Z_{diff}$ reaches a peak at 5 µm width and decreases with further increases in width. One simple conclusion from this plot is that at 5 µm width,



Fig. 10. Attenuation vs Width for S=W



Fig. 11. Z<sub>diff</sub> variation with Width(W) and Spacing(S)

the differential impedance is maximum for this kind of stackup topology, which could lead to the lowest-power CML design.

But the bandwidth is still an open question. To answer it, the bandwidth has to be calculated, which first requires a value for the pad capacitance ($C_{PAD}$). It is chosen as 0.2 pF, which accounts for the 500 V Human Body Model (HBM) and 100 V Charged Device Model (CDM) ESD requirements given in the JEDEC/ESDA standard [8]. The effective 3 dB bandwidth for the whole path from $T_x$ to $R_x$ is plotted in figure 12, which shows that bandwidth increases with increasing width and spacing. But this drastically increases the area cost of the design, so the combined energy/bit\*pitch metric plotted in figure 13 is needed to select the optimum configuration. The supply voltage $V_{DD}$ is 1.8 V and the required $V_{SW}$ is 300 mV. The plot shows that the cost metric reaches a minimum at 10 µm width with 10 µm spacing, which achieves a 3 dB bandwidth of 10 GHz without any pre-emphasis or equalizer circuit in the $T_x$ or $R_x$.

The complete $T_x$, $R_x$ and interconnect model is simulated under all corners, and the eye diagram is computed for a 5 Gb/s PRBS pattern, as shown in figure 14.

Even without any equalization technique, this design achieves about 300 mV differential peak-to-peak voltage swing at the receiver input, which is sufficient for this design. Consider, for example, a chip with a side length of 3 mm that has to be designed for maximum bandwidth and minimum power. Using 10 µm width and spacing, 50 differential pairs of current mode logic links running at 10 Gb/s each can be placed on the interposer, resulting in a total bandwidth of 500 Gb/s. This assumes that the bottleneck is the interconnect area and not the CML driver and receiver area, which holds because the ESD requirements are very small for such IO cells and there is no equalizer in either $T_x$ or $R_x$.
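The link count in this example follows from the signal pitch of $3(S + W)$ defined in Section IV. A short sketch of the arithmetic, with an illustrative function name, is:

```python
def aggregate_bandwidth(side_um: int, width_um: int, spacing_um: int,
                        rate_gbps: int):
    """Return (number of differential pairs, total Gb/s) along one chip edge."""
    pitch_um = 3 * (width_um + spacing_um)   # signal pitch per pair
    n_pairs = side_um // pitch_um            # whole pairs that fit on the edge
    return n_pairs, n_pairs * rate_gbps


n_pairs, total_gbps = aggregate_bandwidth(3000, 10, 10, 10)
print(n_pairs, total_gbps)  # 50 pairs, 500 Gb/s
```

With a 60 µm pitch, a 3 mm edge holds 50 pairs, which at 10 Gb/s each gives the 500 Gb/s stated above.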



Fig. 12. 3dB Bandwidth variation with Width(W) and Spacing(S)



Fig. 13. Energy\*pitch variation with Width(W) and Spacing(S)

## VI. CONCLUSION

This paper presented the design of a CML driver and receiver, discussed the electrical modeling of the complete 2.5D channel path, and showed how the circuit design can be optimized together with the interconnect design. The modeling of the interconnect path segments, including the copper pillars and the metal lines on the interposer, was discussed. A co-design methodology was described in detail for a minimum area and power coplanar differential pair topology for current mode logic based differential signaling on a two-metal-layer interposer. Simulation results were shown for a 5 Gb/s PRBS pattern with the circuit designed in a 28 nm CMOS node. The results show that at 10 µm width with 10 µm spacing, a 10 GHz bandwidth is achievable without any equalizer in $T_x$ or $R_x$. This work can be extended in future to other differential pair topologies and to the optimization of other signaling circuit techniques such as low voltage differential signaling (LVDS). Fabrication and measurement of such structures to confirm the simulation results is also very important and will be undertaken in future.

## REFERENCES

- [1] Jiacheng Wang, Shunli Ma, P. D. S. Manoj, Mingbin Yu, R. Weerasekera and Hao Yu, "High-speed and low-power 2.5D I/O circuits for memory-logic-integration by through-silicon interposer," 3D Systems Integration Conference (3DIC), 2013 IEEE International, San Francisco, CA, 2013, pp. 1-4
- [2] A. Martwick and J. Drew, "Silicon interposer and TSV signaling," Electronic Components and Technology Conference (ECTC), 2015 IEEE 65th, San Diego, CA, 2015, pp. 266-275
- [3] Y. Kim, J. Cho, K. Kim, V. Sundaram, R. Tummala and J. Kim, "Signal and power integrity analysis in 2.5D integrated circuits (ICs) with glass, silicon and organic interposer," Electronic Components and Technology Conference (ECTC), 2015 IEEE 65th, San Diego, CA, 2015, pp. 738-743
- [4] Siming Pan and B. Achkir, "Comparative study of transmission lines design for 2.5D silicon interposer," Electromagnetic Compatibility (EMC), 2013 IEEE International Symposium on, Denver, CO, 2013, pp. 312-316
- [5] A. Heinig, M. W. Chaudhary, P. Schneider, P. Ramm and J. Weber, "Current and future 3D activities at Fraunhofer," 3D Systems Integration Conference (3DIC), 2015 International, Sendai, 2015, pp. FS4.1-FS4.3
- [6] Ate He, Tyler Osborn, Sue Ann Bidstrup Allen and Paul A. Kohl, "All-Copper Chip-to-Substrate Interconnects Part II. Modeling and Design," J. Electrochem. Soc., vol. 155, no. 4, 2008, pp. D314-D322, doi:10.1149/1.2839014
- [7] P. Heydari and R. Mohanavelu, "Design of ultra high-speed CMOS CML buffers and latches," Circuits and Systems, 2003. ISCAS '03. Proceedings of the 2003 International Symposium on, vol. 2, 25-28 May 2003, pp. II-208-II-211
- [8] JEP155A.01, Joint JEDEC/ESDA standard, "Recommended ESD Target Levels for HBM/MM Qualification"
- [9] [Online]. Available: http://www.rle.mit.edu/isg/documents/BSKimPhDThesis.pdf



Fig. 14. 5Gb/s simulation