# Design and Optimization of Inductive-Coupling Links for 3-D-ICs

Benjamin J. Fletcher, Student Member, IEEE, Shidhartha Das, Member, IEEE, and Terrence Mak, Senior Member, IEEE

Abstract: Recent research in the field of 3-D system integration has looked to the use of inductive-coupling links (ICLs) to provide vertical connectivity without incurring the inflated fabrication and testing costs associated with through-silicon vias. For power-efficient ICL design, optimization of the utilized physical inductor geometries is essential, but currently must be performed manually in a process that can take several hours. As a result, the generation of optimized inductor designs poses a significant challenge. In this paper, we address this challenge in three main contributions: 1) a novel, nonuniform planar inductor layout that exhibits enhanced performance when compared with conventional uniform inductors; 2) a rapid solver for evaluating inductor layouts; and 3) a high-speed optimization algorithm for determining best performing coil pairs. These three contributions are combined as a CAD tool for optimization of ICLs for 3-D-ICs (COIL-3-D). Results demonstrate that COIL-3-D achieves an average accuracy within 7.8% of finite-element tools while consuming a small fraction of the time $(1.5 \times 10^{-3} \%)$, significantly ameliorating the design of ICL-based 3-D-ICs. We also demonstrate that using COIL-3-D to optimize ICL inductor layouts can yield significant performance (up to 41.5% bandwidth improvement) and power (up to 8.1% power improvement) benefits, when compared with layouts used in prior ICL implementations. For these reasons, this paper unlocks new potential for low-cost, power-efficient 3-D integration using ICLs.

Index Terms: 3-D-IC, antenna design, inductive coupling, optimization, wireless channel.

Manuscript received May 10, 2018; revised September 15, 2018; accepted October 22, 2018. (Corresponding author: Benjamin J. Fletcher.) S. Das is with ARM Ltd., Cambridge CB1 9NJ, U.K. (e-mail: shidhartha.das@arm.com). Digital Object Identifier 10.1109/TVLSI.2018.2881075

## I. INTRODUCTION

Three-dimensional system integration has emerged as a promising "more-than-Moore" technology whereby dies are stacked vertically, increasing device density, shortening interconnect, and hence enhancing the performance of ICs [1]. Typically, research surrounding 3-D integration focuses on through-silicon vias (TSVs) to provide electrical connections between dies; however, incorporating TSVs introduces many additional processing steps, resulting in inflated fabrication and testing costs, in addition to diminished yields [2]. An alternative solution, which overcomes these problems, is the use of inductive-coupling links (ICLs) [3]. Here, planar inductors are fabricated in each stacked die, allowing transmission of ac data via electromagnetic (EM) coupling. These systems can make use of standard CMOS processes (without any additional fabrication steps), reducing costs and enhancing yield.

ICLs are often criticized for their inferior power efficiency (compared with TSVs), and therefore, when designing ICLs, it is essential that the utilized inductor geometries are optimized. Presently, this involves using finite-element method (FEM) tools for EM analysis, and then converting the system's EM characteristics into equivalent circuit models that can be handled by electrical simulators (e.g., SPICE). The layout can then be manually adjusted, and the process repeated until a satisfactory solution is found. Solvers using FEM, however, often take several hours to converge even while analyzing a single geometry [4]. Because of this, determining coil pairs with optimized geometries (which typically necessitates analyzing thousands of geometries) is extremely computationally expensive, if not impossible.
To partially reduce this complexity, all previous work surrounding 3-D system integration using ICLs utilizes uniform spiral inductors (where the trace width and spacing remain constant between turns of the inductor) [5]–[8]. While this reduces the design complexity of the system, nonuniform inductor layouts are often more efficient [9].

This paper addresses the challenge of ICL design and optimization, performing detailed analysis of ICL layout requirements, styles, and topologies, and proposes a method of rapidly determining optimal coil layouts for ICLs. We bring the work together as a CAD tool for optimization of ICLs for 3-D-ICs (COIL-3-D), which is a publicly available software tool for integration with inductive link 3-D-IC design flows. The main novel contributions of this paper, therefore, include the following.

- 1) Detailed modeling and analysis of ICL requirements considering typical ICL transceivers.
- 2) Proposition of a graduated width–spacing inductor layout to improve ICL performance.
- 3) A comprehensive scalable inductor model for simulating the performance of ICL layouts, in addition to mathematical expressions for determining the scalable model parameters, which together achieve an average accuracy within 7.8% of FEM tools while reducing computational overhead by 67000×.
- 4) A refined optimization flow for determining optimized ICL geometries in 3-D-ICs that reduces the number of trial iterations by three orders of magnitude.

This paper extends the work in [10] in the following three ways.

- 1) Consideration of nonuniform inductor layouts (where track width and spacing vary from turn to turn).
- 2) Replacing the simple semiempirical solver with a more comprehensive scalable circuit model.
- 3) Providing a more in-depth analysis of the tool's functionality including a use-case example.

B. J. Fletcher and T. Mak are with the Department of Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ, U.K. (e-mail: bjf1g13@ecs.soton.ac.uk; tmak@ecs.soton.ac.uk).

We also provide a more detailed discussion of inductor topologies and their performance when used in ICLs. The optimization flow and analysis presented in this paper can be applied to ICLs for data or power transmission between tiers in 3-D-ICs; however, this paper will focus on data communication ICLs as this is their predominant use in 3-D-ICs. Power delivery using ICLs will be addressed in our future work.

The remainder of this paper is organized as follows. The background and work related to ICL implementation and optimization are presented in Section II, and the modeling and analysis of ICL requirements are presented in Section III, which concludes with a formulation of the optimization problem that this paper sets out to solve. Following this, Section IV outlines our proposed analysis and optimization approaches, before rigorous evaluation of our proposed approach (Section V) and a usage example (Section VI). Finally, this paper is concluded in Section VII.

## II. BACKGROUND AND RELATED WORK

To provide connectivity between stacked dies, research commonly looks to the use of TSVs: vertical electrical connections (passing entirely through each die) for high-bandwidth low-power communication between tiers. TSVs exist in many flavors; however, each one fundamentally relies upon the substrate of each die being etched, filled with metal, and aggressively polished [11]. These additional fabrication stages introduce mechanical strains within the IC, causing major concerns surrounding yield and reliability [12]. To address these concerns, more recent research [13]–[15] has looked to contactless 3-D-ICs, which circumvent the additional fabrication costs and challenges associated with forming these vias.
Contactless 3-D-ICs typically rely on the principle of EM coupling to communicate data vertically between planar inductors fabricated in the upper back-end-of-line layers of each die [6]. The range of this coupling is approximately one-third of the inductor diameter [7], and as such, when considering coils with outer diameters on the order of hundreds of micrometers, face-to-back stacking is possible [16].

Due to the large size of inductors required for 3-D integration using contactless ICLs, they are often criticized for their poor area efficiency. ICL transceivers can be implemented with a handful of transistors, and therefore, the majority of the area overhead is the inductor layout. For this reason, achieving optimality in the utilized inductor layouts is of paramount importance and forms the motivation of this paper. Commonly, the objective when designing such a system is to maximize power efficiency given a specific area constraint.

### A. Electrical Modeling

A range of electrical models exist for simulating on-chip spiral inductors, typically using a $\pi$ topology [17]–[19]. Lumped $\pi$ models have been demonstrated to exhibit high accuracy when modeling on-chip RF inductors; however, it has been suggested that when considering inductively coupled 3-D-ICs, a simpler resistance, inductance, mutual inductance, and capacitance model can be used [5]. This simpler model is shown in Fig. 1. This model is widely reported to exhibit sufficient accuracy for evaluating the performance of an ICL, and hence will also be adopted in this paper. The error introduced by adopting this model is assessed in Section V-C.

Fig. 1. Equivalent circuit model of an ICL channel [5] which assumes that each coil can be accurately modeled by its resistance $(R_i)$, capacitance $(C_i)$, and inductance $(L_i)$.

Fig. 2. Geometric parameters of a square planar coil (outer and inner dimensions $d_o$ and $d_i$, number of turns n, thickness $t_{cu}$, trace width w, and spacing s).

### B. ICL Simulation

Existing works in this domain use a manual simulation flow for evaluating inductor layouts. This involves defining an initial inductor layout, such as that presented in Fig. 2, with arbitrary parameters; importing this layout into a full-wave field solver for extraction of the system's S-parameters; manually extracting (using curve fitting) a SPICE model of the link; and analyzing the overall performance using SPICE. The process can be repeated, adjusting the layout parameters slightly, until two adequate inductors are found.

Full-wave simulation is typically performed using comprehensive FEM software packages such as CST Studio or ANSYS HFSS. These solvers provide high accuracy but often take hours to converge on a solution. In addition, the search for best performing inductors requires analysis of thousands of layouts, making the existing flow extremely time consuming.

Other more rapid solvers include application-specific tools such as SPIRAL and ASITIC [20], developed for on-chip inductor analysis.
These use electrostatic and magnetostatic approximations to provide much faster modeling; however, they lack the ability to analyze mutual inductance between vertically stacked inductors, as required for contactless 3-D integration. Similarly, simplified mathematical models are used in [21], where a set of semiempirical expressions for deriving the power efficiency of an inductive wireless power transfer (WPT) link are presented. Jow et al. [21], however, focus on WPT for biomedical implants where larger inductors are required, and as a result, many of the approximations used do not hold true in 3-D-ICs.

TABLE I. PARAMETER NOTATION REFERENCE

| Layout parameter | Description |
|---|---|
| $D$ | Coil outer side length |
| $d$ | Coil inner length |
| $w$ | Coil track width |
| $s$ | Coil track spacing |
| $g$ | Minimum technology grid unit |
| $X$ | Communication distance |
| $t$ | Coil metal thickness |
| $\phi$ | Coil fill-factor |
| $\ell$ | Length of single coil segment |
| $\chi$ ($\chi_w$ or $\chi_s$) | Graduation coefficient (turn-to-turn graduation in track width or spacing) |

| Electrical parameter | Description |
|---|---|
| $L$ | Coil self-inductance (between terminals) |
| $R$ | Coil resistance (between terminals) |
| $C$ | Coil capacitance (between terminals) |
| $M$ | Mutual inductance (between coils) |
| $\eta$ | Data delivery efficiency |
| $f_{sr}$ | Coil self-resonant frequency |
| $\rho$ | Coil metal resistivity |
| $R_L$ | Receiver load resistance |
| $f$ | Frequency of ICL operation |

The work of Hsu et al. [22] is the only related publication proposing an automated optimization flow for inductive-coupling data links. The authors use a greedy linear optimization algorithm that considers each coil in the link separately to reduce the time complexity of the approach. While this allows the algorithm to complete reasonably quickly, it means that the overall link efficiency may not be optimal. In addition, the authors do not provide a software tool, but rather a standalone set of expressions.

This paper augments these previous works by presenting: 1) rapid modeling and analysis for accurately evaluating inductor layouts and 2) a refined optimization flow to rapidly identify best performing ICL layouts. These two elements are combined in COIL-3-D.

## III. MODELING AND ANALYSIS OF ICLS

Before presenting an ICL optimization flow, we must first classify the requirements of ICL inductors. To add clarity to the explanations presented in Sections III–VI, Table I summarizes the parameter notation adopted throughout the remainder of this paper. Each parameter is referred to using the notation from Table I, and where applicable, subscripts may be added in the following order: the coil number within the stack (e.g., 1 refers to the bottom coil and 2 the one above it), the turn number, and the segment number (within the turn). As an example, $w_{i,j,k}$ refers to the width of segment k of turn j in coil number i.

In order to derive equations modeling the performance of ICLs, it is first important to consider their circuit implementation. Fig. 3 shows a typical ICL transceiver circuit, representative of those used in prior works [5], [7], which is used as a baseline for the mathematical modeling presented in this section.

Fig. 3. Representative baseline ICL transceiver implementation [5].
Here, an H-bridge transmitter is implemented in the Tx die that operates as follows: as the data signal transitions from 0 V $\rightarrow VDD_{Tx}$, a short clockwise current pulse, of a duration determined by the pulsewidth delay element, will flow through the coil, representing a rising data edge. Conversely, a current pulse of the same duration will flow counterclockwise through the coil when the data signal transitions from $VDD_{Tx} \rightarrow$ 0 V, representing a falling data edge. The current pulse will be of length $\delta$, corresponding to the delay buffer length.

In the receiver, a sense-amplifier flip-flop (SAFF) arrangement is typically used to detect these current pulses [23]. Here, transistors M3 form a differential amplifying pair (as highlighted by the dashed box in Fig. 3) which, when the $T_{\text{sense}}$ signal is high, determines the phase of the received current pulse. The set-reset NAND latch can then be used to recover the transmitted data stream.

Assuming that a maximum silicon area constraint is defined, the optimization target is typically to minimize the power consumption of the system while communicating data at the operating frequency. The power consumption of the ICL transmitter, over a period of time T (assuming transmission of an equiprobable random binary stream), can be calculated by

$$P = T \cdot f \int_{0}^{T} I_{\text{Tx}}(t)\,dt + P_{\text{circuit}} \tag{1}$$

where $I_{Tx}$ is the transmit current (which flows through transistor M0, through the coil and to ground through M1, as shown in Fig. 3) and $P_{\text{circuit}}$ is the power consumed by the supporting transmitter circuitry. The first term constitutes the majority of the power consumption while the latter is negligible in comparison.
The voltage induced in the Rx (secondary) coil is given by the following equation:

$$V_{\rm Rx} = k\sqrt{L_{\rm Tx}L_{\rm Rx}} \cdot \frac{dI_{\rm Tx}}{dt} \tag{2}$$

where $I_{Tx}$ is the transmitted current, $L_{Tx}$ and $L_{Rx}$ are the inductances of the Tx and Rx coils, respectively, and k is their coupling coefficient. For a given receiver design, $V_{Rx}$ will be constrained by a minimum value (the smallest voltage at which the SAFF will correctly detect pulses), and therefore, the optimization target is to minimize $I_{Tx}$ for a given $V_{Rx}$; in other words, to maximize the function $\eta = V_{R_L}/V_{Tx}$. This function is given in footnote 1, derived from the equivalent circuit model in Fig. 1. Using this expression for evaluating the fitness of an inductive channel (provided the electrical coil parameters are known) allows for much faster evaluation of designs than using SPICE, and hence, this numerical method is adopted in COIL-3-D.

Provided $\eta$ is maximized, the transmit current $I_{Tx}$ can be minimized by reducing the width of transistors M0 and M1 while still meeting the minimum SAFF sensitivity threshold.<sup>2</sup> This will have the effect of minimizing the system power consumption. Broadly, considering this expression, it can be observed that ICL data efficiency, $\eta$, is optimized when k is maximized and the parasitic capacitance of each coil is minimized.

### A. Objective Function

In addition to this optimization target, the full optimization problem formulation must also include a number of physical constraints. These are outlined as follows.

The first constraint is that the inductors should be physically realizable without self-intersection. Mathematically, this imposes that

$$D_i > 2 \left[ \sum_{j=1}^n 2(w_{i,j}) + \sum_{j=1}^{n-1} 2(s_{i,j}) \right]. \tag{3}$$

In addition, we must ensure that the self-resonant frequency, $f_{\rm sr}$, of each inductor in the link is greater than the link's operating frequency.
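As a hedged illustration of the numerical evaluation of $\eta$ described above, the following Python sketch transcribes the lumped-model expression of footnote 1 term by term. The function name `link_efficiency` and all numeric values are illustrative assumptions rather than part of COIL-3-D.

```python
import math


def link_efficiency(f, k, L1, L2, R1, R2, C1, C2, RL):
    """Complex transfer ratio eta = V_RL / V_Tx (footnote 1, lumped model of Fig. 1).

    f is the operating frequency in Hz; k is the coupling coefficient;
    (L1, R1, C1) and (L2, R2, C2) are the lumped Tx and Rx coil parameters
    in SI units; RL is the load resistance appearing in the expression.
    """
    omega = 2.0 * math.pi * f
    rx_term = 1.0 / (1.0 + 1j * omega * R2 * C2)
    coupling = 1j * omega * k * math.sqrt(L1 * L2)
    tx_term = 1.0 / (
        RL * (1.0 - omega**2 * L1 * C1) + R1 + 1j * omega * (C1 * R1 * RL + L1)
    )
    return rx_term * coupling * tx_term


# Illustrative (assumed) values: 1 GHz link, 5 nH coils, 10 ohm, 50 fF, 1 kohm load.
eta = link_efficiency(1e9, 0.2, 5e-9, 5e-9, 10.0, 10.0, 50e-15, 50e-15, 1e3)
print(abs(eta))
```

Because the expression is linear in k, doubling the coupling coefficient doubles $|\eta|$, while a larger $R_2 C_2$ product reduces it, which matches the qualitative observations made in the text.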
While full-wave modeling is a reasonably accurate method of determining the performance of a given layout when fabricated in a specific technology, due to process variations and physical factors (such as uneven etching, etc.) disparities will always exist between the simulated results and practical measurements of fabricated layouts. It is, therefore, sensible to include a marginal tolerance factor $k_t$. We, therefore, add the constraint

$$R_2 C_2 < (1 - k_t) 2\pi \sqrt{LC}. \tag{4}$$

Bringing these details together, the optimization problem formulation can be expressed as

$$\begin{aligned}
\max \quad & \eta \\
\text{s.t.} \quad & w_{i,j,k} > w_{\min}, \quad s_{i,j,k} > s_{\min} \quad \forall i, j, k \\
& D_i > 2 \left[ \sum_{j=1}^{n} 2(w_{i,j}) + \sum_{j=1}^{n-1} 2(s_{i,j}) \right] \\
& R_2C_2 < (1 - k_t)2\pi\sqrt{LC}
\end{aligned}$$

where $n_1, n_2 \in \mathbb{Z}^+$, and $w_{1,j,k}, s_{1,j,k}, w_{2,j,k}, s_{2,j,k} \in \mathbb{R}^+ \ \forall j, k$.

### B. Planar Spiral Inductors

Having established the optimization target, consideration should be given to the specific physical inductor layouts that maximize these expressions. Due to the requirement of achieving EM coupling between the vertically stacked inductors, layouts for ICL inductors should clearly be monolithic; however, a plethora of monolithic inductor patterns and shapes exist. Sections III-B1 and III-B2 review a range of these topologies, evaluating the performance of each for use in ICL-based 3-D-ICs.

1) Square, Octagon, and Circle Topologies: Three of the most common monolithic inductor shapes used in VLSI are square, octagon, and circle. As the axial length of circular spirals is much less than that of square spirals of the same area, it is widely reported that circular planar inductors offer higher Q-factors compared to their square counterparts (due to their reduced resistance). However, due to their efficient area usage, square inductors can offer higher inductance per unit area [24], [25].
As finding optimized inductor layouts for ICLs requires maximizing L and minimizing C for a specific area constraint, square inductors should theoretically, therefore, outperform their circular and octagonal counterparts. Based on this assumption, the COIL-3-D tool presented in Section IV will consider inductors of this shape, and empirical evidence to support this assumption is provided in Section V-B where we present results comparing square, octagonal, and circular inductors for data transmission ICLs.

2) Uniform and Nonuniform Layouts: All reported previous works investigating inductive-coupling-based 3-D-ICs use inductor layouts where the width and spacing of each turn in the coil are uniform [5]–[8]. In this paper, we explore the possibility of enhancing the efficiency of ICLs using graduated (or nonuniform) width and spacing parameters. In nonuniform planar spiral inductors, the width and/or spacing differ from one turn of the coil to the next.

When implementing the inductive-coupling channel, $R_i$ and $C_i$ are limited to sets of values which correspond to actual, physically realizable spirals. By varying the track width (which predominantly defines the resistance of the coil) and the track spacing (which predominantly defines the capacitance of the coil) between turns, $R_i$ and $C_i$ can be much more finely tuned to maximize $\eta$.

<sup>1</sup> $\eta = V_{R_L}/V_{\rm Tx} = \dfrac{1}{1+j\omega R_{2} C_{2}} \cdot j\omega k\sqrt{L_{1} L_{2}} \cdot \dfrac{1}{R_{L}(1-\omega^{2} L_{1} C_{1}) + R_{1}+j\omega(C_{1} R_{1} R_{L}+L_{1})}$ [5].

<sup>2</sup> This process is elaborated on in Section VI.

Fig. 4. Illustration of coil layout with graduated spacing ($\chi_s=1$ and $\chi_w=0.4$).

To investigate nonuniform coil layouts, we introduce two linear graduation coefficients, $\chi_w$ (for width graduation) and $\chi_s$ (for spacing graduation).
These graduation coefficients describe the linear scaling of track width and track spacing between each turn of the coil and are calculated by

$$\chi_{wi} = \frac{w_{i,n} - w_{i,1}}{n} \quad \text{and} \quad \chi_{si} = \frac{s_{i,(n-1)} - s_{i,1}}{n}. \tag{5}$$

An illustration of a nonuniform coil and its graduation coefficients is shown in Fig. 4. The coil in Fig. 4 has parameters $\chi_s = 1$ and $\chi_w = 0.4$.

To meet the requirements outlined in Section III, it is necessary to maximize the inductance, L, of a coil while minimizing the parasitic capacitance, C, and resistance, R. When considering nonuniform inductors, the variation in both width and spacing between turns can be carefully exploited to meet these requirements. If we examine the simple spiral inductance equation presented by Mohan et al. [24] (shown in the following equation), it can be observed that the inductance will increase as a function of inductor outer diameter and inner diameter (provided that the other parameters remain constant):

$$L = \frac{1.27\mu n^2 (D+d)}{4} \left[ \ln \left( \frac{2.07}{\phi} \right) + 0.18\phi + 0.13\phi^2 \right] \tag{6}$$

where $\phi$ is the fill factor given by $\phi = (d_o - d_i)/(d_o + d_i)$, with $d_o$ and $d_i$ denoting the outer and inner dimensions ($D$ and $d$ in Table I).

To decrease the coil resistance and hence improve performance (cf. Section III), tracks should be made as wide as possible. Widening each of the coil turns while keeping the spacing (s) between them unchanged (such that the coil's self-capacitance remains unchanged) will decrease the diameter of the central "eye" of the coil, d. As established in (6), this will be detrimental to the coil's inductance. As the outer turns are longer than the inner turns, however, it is sensible to increase their width (and decrease the width of the inner turns) to reduce the resistance while maintaining constant $d_o$ and $d_i$. A similar technique can be applied to the coil spacing.
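To make (5) and (6) concrete, the following Python sketch evaluates the graduation coefficient and the spiral inductance expression as printed above. The helper names and the example dimensions are assumptions chosen for illustration only; they are not taken from COIL-3-D.

```python
import math

MU_0 = 4e-7 * math.pi  # permeability of free space in H/m


def graduation_coefficient(first, last, n):
    """Eq. (5) as printed: (last - first) / n, where n is the number of turns.

    For width graduation pass (w_1, w_n); for spacing pass (s_1, s_(n-1)).
    """
    return (last - first) / n


def fill_factor(D, d):
    """Fill factor phi = (D - d) / (D + d) for outer length D and inner length d."""
    return (D - d) / (D + d)


def spiral_inductance(n, D, d, mu=MU_0):
    """Eq. (6): inductance of an n-turn square spiral (requires D > d > 0)."""
    phi = fill_factor(D, d)
    bracket = math.log(2.07 / phi) + 0.18 * phi + 0.13 * phi**2
    return 1.27 * mu * n**2 * (D + d) / 4.0 * bracket


# Assumed example: 200 um outer length, 100 um inner length, 5 turns,
# track width graduated from 2 um (first turn) to 4 um (last turn).
print(spiral_inductance(5, 200e-6, 100e-6))
print(graduation_coefficient(2.0, 4.0, 5))
```

With D and d held fixed, the expression scales with $n^2$, which is why widening tracks (and so shrinking d or reducing n) trades inductance against resistance as discussed above.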
As the influence of magnetically induced losses is much more significant within the inner turns of the spiral, it makes sense to increase the spacing toward the center of the inductor [26]. Again, empirical evidence to support these claims is provided in Section V-B where we present results illustrating the performance benefits that can be achieved by using nonuniform inductor layouts.

## IV. ICL LAYOUT OPTIMIZATION (COIL-3-D)

Having established the optimization targets, in addition to the best performing inductor topologies, this section presents COIL-3-D. COIL-3-D combines four components in order to quickly and accurately determine best performing inductor layouts for ICLs, in addition to generating associated electrical models for simulation. These four components are as follows.

- 1) A comprehensive scalable inductor model for accurately approximating the performance of multiturn nonuniform inductors (Section IV-A).
- 2) A set of mathematical expressions for quickly and accurately determining the scalable model parameters (Section IV-B).
- 3) A high-speed optimization flow for identifying the best performing layouts (Section IV-C).
- 4) An efficient software implementation of the above-mentioned elements, which integrates with existing CAD flows (Section IV-D).

These four contributions are elaborated in the following.

### A. Scalable Inductor Model

In order to quickly and accurately evaluate these parameters, we propose the use of a scalable inductor model (based upon that presented in [26]) where each turn is considered as a separate segment. The principle of superposition may then be applied in order to distill the model into its simplified lumped equivalent (shown in Fig. 1). Fig. 5 illustrates this concept more clearly. Here, we see two monolithic square spiral inductors stacked vertically, where every turn of each coil is a single segment which exhibits resistance and inductance while sharing capacitance and mutual inductance with other segments.
Considering the inductor in this way facilitates more accurate evaluation than the use of single expressions and allows for accurate evaluation of the nonuniform inductors proposed in this paper. The use of this improved scalable inductor model in COIL-3-D is supplementary to [10].

Fig. 5. Illustration of segmented scalable spiral inductor model with two stacked inductors $n_1 = n_2 = 3$. (a) Illustration of segmented coil concept. (b) Full equivalent circuit model.

### B. Parameter Evaluation

In order to determine optimized coil layouts, it is important to establish a method of quickly and accurately evaluating the scalable model parameters for a given layout. Using the expression for $\eta$ in Section III, this evaluation can simply be performed with knowledge of each segment's inductance, L, and resistance, R, in addition to the capacitance, C, and mutual inductance, M, between segments. To allow for optimization in a reasonable time, we present a set of strictly solvable expressions for evaluating R, L, M, and C which are based upon empirical measurements. Sections IV-B1–IV-B4 outline expressions for deriving R, L, M, and C for each coil segment.

1) Coil Resistance (R): Other works propose a variety of methods for estimating the resistance of rectangular conductors; however, the most commonly used model is the resistivity equation incorporating high-frequency conduction loss [21], [27]. While this provides a reasonable approximation when considering micrometer-scale coils (used in 3-D-ICs), the yielded values are typically too low. This is due to the proximity effect: close interturn proximity draws electrons to the edges of traces, hence increasing the apparent resistance by a factor $k_p$ known as the proximity factor. In-depth work deriving differential equations for calculating $k_p$ is available [28]; however, these expressions are not strictly solvable, making evaluation in software very computationally expensive.
As such, in COIL-3-D, values of $k_p$ (which varies as a function of s) are empirically predetermined and stored in a lookup table for use at runtime. Using these values, the resistance of each coil segment, $R_{i,j,k}$, is determined by the following equation:

$$R_{i,j,k} = k_p(s_{i,(j-1),k}) \frac{\ell}{2(w_{i,j,k} + t) \cdot \sqrt{\frac{\rho}{\pi f \mu}}}. \tag{7}$$

This equation is derived from the fundamental expression for resistance $R = \rho \ell / A$ (where A is the conductor's cross section, and $\ell$ is its length). Here, however, the term $(\rho / \pi f \mu)^{0.5}$ refers to the skin depth of the conductor at frequency f [29]. Taking this into consideration, A is replaced with the denominator that appears in (7). Using the principle of superposition, the total resistance of each coil is the linear (series) summation of each of these line segments.

2) Coil Self-Inductance: In addition to the resistance of each line segment, it is also necessary to calculate the self-inductance of each segment within the coil, $L_{i,j,k}$. For calculating $L_{i,j,k}$ we use [30]

$$L_{i,j,k} = \frac{\gamma \mu_0}{\pi w_{i,j,k}^2} \left[ 3w_{i,j,k}^2 \ell_{i,j,k} \ln\left(\frac{\ell_{i,j,k} + \sqrt{\ell_{i,j,k}^2 + w_{i,j,k}^2}}{w_{i,j,k}}\right) - \left(\ell_{i,j,k}^2 + w_{i,j,k}^2\right)^{3/2} + \ell_{i,j,k}^3 + w_{i,j,k}^3 + 3 w_{i,j,k} \ell_{i,j,k}^2 \ln\left(\frac{w_{i,j,k} + \sqrt{\ell_{i,j,k}^2 + w_{i,j,k}^2}}{\ell_{i,j,k}}\right) \right]$$

where $\gamma$ is an empirically determined constant and $\ell$ is the length of an individual coil segment. Again, using the principle of superposition, the total coil inductance is the linear (series) summation of each of these line segments.

3) Coil Capacitance: For calculating the capacitance between segments, an expression based upon the fundamental capacitance between two long parallel conductors [31] is used, whereby C is proportional to the length of the conducting segment, $\ell$, and inversely proportional to $\ln[(w+s)/w]$.
This leads to the following equation:

$$C_{i,j,k} = k_c \frac{\pi \, \varepsilon_0 \varepsilon_r \ell_{i,j,k}}{\ln(4[(w_{i,j,k} + s_{i,j,k})/w_{i,j,k}])}. \tag{8}$$

In this case, to tailor the expression toward micrometer-scale coils (as used in inductive-coupling 3-D-ICs), an additional empirical correction factor, $k_c$, has been added to improve accuracy. As the number of spaces between turns is equal to $n-1$, the total capacitance is the linear (parallel) summation of these capacitances over the $n-1$ interturn spaces (as the capacitance across the center is negligible if $d_i$ is sufficiently large).

4) Mutual Inductance Between Coils: Finally, for calculating M, an expression can be derived from Maxwell's equation for the mutual inductance between two air-cored loops. If an assumption is made that the two communicating coils are perfectly vertically aligned, the mutual inductance between two loops over a distance X is given by [32]

$$M_{a,b,X} = \frac{2\mu_0}{\alpha} \sqrt{ab} \left[ \left( 1 - \frac{\alpha^2}{2} \right) K(\alpha) - E(\alpha) \right] \tag{9}$$

where a and b are the radii of the two loops and $\alpha = 2(ab/[(a+b)^2 + X^2])^{1/2}$. Here, $K(\alpha)$ and $E(\alpha)$ are the complete elliptic integrals of the first and second kinds, respectively. As the structure of a planar spiral inductor is not a single loop, but rather a set of n concentric interconnected segments, the approximation is often made that the total mutual inductance is the cumulative summation of mutual inductance between each segment of the Tx coil and every segment of the Rx coil [32], as illustrated in Fig. 5(b), leading to the following equation:

$$M_{\text{tot}} = g \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} M(a_i, b_j, X). \tag{10}$$

Previous works [21], [27], [33] suggest the introduction of a correction factor here, g, that takes the value $g \approx 1.1$.
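Equations (9) and (10) can be sketched in a few lines of Python. The sketch below is illustrative only: it computes the complete elliptic integrals with the arithmetic-geometric mean so that no external library is needed, it treats each turn as a circular loop of an assumed equivalent radius, and the names `ellip_ke`, `loop_mutual`, and `total_mutual` are hypothetical.

```python
import math

MU_0 = 4e-7 * math.pi  # permeability of free space in H/m


def ellip_ke(alpha, tol=1e-15):
    """Complete elliptic integrals K and E for modulus alpha (0 <= alpha < 1), via AGM."""
    a, b, c = 1.0, math.sqrt(1.0 - alpha * alpha), alpha
    series = 0.5 * c * c  # n = 0 term of sum 2^(n-1) * c_n^2
    weight = 1.0
    while abs(c) > tol:
        a, b, c = (a + b) / 2.0, math.sqrt(a * b), (a - b) / 2.0
        series += weight * c * c
        weight *= 2.0
    K = math.pi / (2.0 * a)
    return K, K * (1.0 - series)


def loop_mutual(a, b, X):
    """Eq. (9): mutual inductance of two coaxial loops of radii a, b separated by X."""
    alpha = 2.0 * math.sqrt(a * b / ((a + b) ** 2 + X**2))
    K, E = ellip_ke(alpha)
    return (2.0 * MU_0 / alpha) * math.sqrt(a * b) * ((1.0 - alpha**2 / 2.0) * K - E)


def total_mutual(tx_radii, rx_radii, X, g=1.1):
    """Eq. (10): corrected sum over every Tx-turn / Rx-turn pair."""
    return g * sum(loop_mutual(a, b, X) for a in tx_radii for b in rx_radii)


# Assumed example: three turns per coil, 30 um communication distance.
tx = [100e-6, 90e-6, 80e-6]
rx = [100e-6, 90e-6, 80e-6]
print(total_mutual(tx, rx, 30e-6))
```

As a sanity check on the transcription, for $X \gg a, b$ the expression in (9) reduces to the familiar dipole limit $\mu_0 \pi a^2 b^2 / (2X^3)$, which the sketch reproduces numerically.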
Although practical validation found this model to be reasonably accurate for coils with fewer than 10 turns, when considering inductors with n > 10, the model accuracy deteriorates. This is because as n increases, the assumption of equal coupling between every turn of each coil introduces larger error. In COIL-3-D, this degradation in coupling is incorporated by a scaling factor, $r_{i,j}$, corresponding to the Pythagorean distance between turns, normalized with respect to a pair in perfect vertical alignment, such that

$$r_{i,j} = \frac{1}{X} \{ [(i-j) \cdot (w_{i,j} + s_{i,j})]^2 + X^2 \}^{1/2}. \tag{11}$$

We, therefore, replace the single correction factor g as shown in the following equation, where $k_f$ is an empirical constant:

$$M_{\text{tot}} = \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} \left(\frac{1}{r_{i,j}}\right)^{k_f} M(a_i, b_j, X). \tag{12}$$

### C. Optimization Approach

Having presented the optimization objectives of ICLs in addition to the methodology for evaluating a given inductor layout, an optimization flow for determining the best performing ICL layouts must be established. This will replace the manual evaluation and adjustment cycle adopted in the previous works.

Applying exhaustive linear optimization to the problem outlined in Section III-A results in an extremely high time complexity, $O(n^8)$. That is, as the problem size n increases (in this case by making the grid resolution g smaller), the number of candidate layouts to evaluate grows in proportion to $n^8$. As an example, doubling the search space (by halving the technology grid unit) would increase the number of coil-pair permutations to evaluate by $256 \times$.

It has been noted that, using the equations in Section IV-B, it is possible to predict the fill factor of the optimal coil in advance with high accuracy. Therefore, in order to reduce the computational overhead of the search problem, we introduce an additional catalytic parameter, the fill factor, $\phi$.
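The distance-based scaling of (11) and the per-pair weight $(1/r_{i,j})^{k_f}$ used in (12) can be illustrated with the short Python sketch below. The function names are hypothetical, a single turn pitch ($w + s$) is assumed for all turns, and the value of $k_f$ is a placeholder, since the fitted constant is not stated in this section.

```python
import math


def distance_scale(i, j, pitch, X):
    """Eq. (11): normalized distance r_ij between turn i and turn j.

    pitch is the turn pitch (w + s) and X the communication distance,
    both in the same length unit.
    """
    return math.hypot((i - j) * pitch, X) / X


def coupling_weight(i, j, pitch, X, k_f):
    """Weight (1 / r_ij) ** k_f that multiplies M(a_i, b_j, X) in Eq. (12)."""
    return (1.0 / distance_scale(i, j, pitch, X)) ** k_f


# Assumed example: 1 um pitch, 4 um communication distance, placeholder k_f = 2.
for j in range(1, 5):
    print(j, coupling_weight(1, j, 1.0, 4.0, 2.0))
```

Vertically aligned turn pairs ($i = j$) keep a weight of exactly 1, and the weight falls off as the lateral offset between turns grows, which is the behavior the text attributes to coils with many turns.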
Optimized inductor layouts typically have a fill factor around 0.4 [21]. Centering the search around a fixed fill factor therefore avoids the computational overhead incurred while evaluating designs that are unlikely to be optimal, e.g., where $\phi = 0.9$. By adding this constraint, the solution space can be refined, and the time complexity reduced to $O(n^6)$.

To speed up the algorithm, optimization is further divided into two discrete stages: Rx coil optimization and Tx coil optimization. From the ICL transfer equation (4), it can be observed that $\eta$ will be maximized when $L_2$ is maximized, provided that the time constant $R_2 C_2$ (discussed earlier) in the denominator of the first term is constrained. Therefore, the Rx coil is initially optimized to provide the maximum $L_2$ within the imposed bandwidth constraints. The Tx coil is then optimized for $\eta$, which considers the mutual inductance between the pair. Dividing the flow in this way reduces the time complexity of the search to $O(n^2)$, without compromising on accuracy.

It was also found that the optimization of inductor uniformity could be considered separately from its geometric parameters. As such, to reduce the overhead of the approach, $\chi_s$ and $\chi_w$ can be determined after the key layout variables ($D$, $n$, $w$, and $s$). This can be performed, again without compromising on accuracy, as the sweep of $\chi_s$ and $\chi_w$ acts as a fine-tuning stage (results supporting this assertion are presented in Section V-B2).

Combining these improvements, Algorithm 1 demonstrates the operation of the COIL-3-D optimizer. First, an optimal value of $\phi$ is found using the efficiency equation. $\phi_{\rm opt}$ is then used to refine the search space and, incorporating the simplifications outlined earlier, the COIL-3-D optimizer exhaustively searches all parameters to guarantee a globally optimal solution. The proposed approach is also summarized in a flowchart, Fig. 6(b), for comparison with the existing manual approach flow presented in Fig. 6(a).

Algorithm 1 Operation of the COIL-3-D Optimization Flow

```
Inputs      : D_1, D_2, f, R_L, g, w_{min}, s_{min}, X
Constraints : C_{i_{max}}, R_{i_{max}}, D_{max}
Outputs     : w_1, s_1, n_1, \chi_{w,1}, \chi_{s,1}, w_2, s_2, n_2, \chi_{w,2}, \chi_{s,2}

/* Determine Optimal Fill-Factors */
for \phi = 0; \phi < 1; \phi += g do
    \eta = Evaluate_{\eta}();
    if \eta > \eta_{max} then
        \eta_{max} = \eta; \phi_{opt} = \phi;
    end
end

/* Rx Coil Layout Optimisation */
for n_2 = 1; n_2 < 4wD_2/(1+\phi_{opt}); n_2++ do
    for w_2 = w_{min}; w_2 < D_2/2; w_2 = w_2 + g do
        s_2 = D_2 \phi_{opt} / [n_2 (1 + \phi_{opt})] - w_2;
        if L_2(D_2, w_2, s_2, n_2) > L_{2_{max}} then
            if Meets Constraints then
                L_{2_{max}} = L_2(D_2, w_2, s_2, n_2);
                w_{2_{opt}} = w_2; s_{2_{opt}} = s_2; n_{2_{opt}} = n_2; D_{2_{max}} = D_{max};
            end
        end
    end
end

/* Tx Coil Layout Optimisation */
for D_1 = D_{max}; D_1 > 0; D_1 -= g do
    for n_1 = 1; n_1 < 4wD_1/(1+\phi_{opt}); n_1++ do
        for w_1 = w_{min}; w_1 < D_1/2; w_1 = w_1 + g do
            s_1 = D_1 \phi_{opt} / [n_1 (1 + \phi_{opt})] - w_1;
            \eta = Evaluate_{\eta}();
            if \eta > \eta_{top} then
                if Meets Constraints then
                    \eta_{top} = \eta;
                    w_{1_{opt}} = w_1; s_{1_{opt}} = s_1; n_{1_{opt}} = n_1; D_{1_{max}} = D_1;
                end
            end
        end
    end
end

/* Determine \chi_w and \chi_s */
for \chi_w = 0; n\chi_w + w_{i,1} > 0; \chi_w -= g do
    for \chi_s = 0; n\chi_s + s_{i,1} > 0; \chi_s -= g do
        \eta = Evaluate_{\eta}();
        if \eta > \eta_{top} then
            if Meets Constraints then
                \eta_{top} = \eta; \chi_{w_{opt}} = \chi_w; \chi_{s_{opt}} = \chi_s;
            end
        end
    end
end
```

Fig. 6. (a) Existing manual flow for establishing inductor-pair layouts for ICL-based 3-D-ICs [34]. (b) Flow for establishing inductor-pair layouts for ICL-based 3-D-ICs using COIL-3-D.

#### D. Software Implementation

In order to speed up the COIL-3-D solver, dynamic programming (DP) is used where possible.
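As a rough illustration of this kind of reuse, the sketch below caches partial sums of a double summation over turn pairs, so that adding one turn to the Rx coil only requires the new column of terms. The class name, the `pair_term` callback, and the assumption that each $(i, j)$ term is unchanged when a turn is added are illustrative simplifications, not details of the COIL-3-D implementation.

```python
class CachedPairSum:
    """Double sum over turn pairs (i, j) that reuses stored partial sums."""

    def __init__(self, pair_term):
        # pair_term(i, j) is a hypothetical callback returning one summand,
        # e.g. a distance-weighted loop-to-loop mutual inductance.
        self.pair_term = pair_term
        self.evaluations = 0  # counts how many summands were actually computed
        self._cache = {}      # maps (n1, n2) to the stored partial sum

    def total(self, n1, n2):
        """Return the sum of pair_term(i, j) for i = 1..n1 and j = 1..n2."""
        if n1 == 0 or n2 == 0:
            return 0.0
        key = (n1, n2)
        if key in self._cache:
            return self._cache[key]
        # Only the newest Rx turn (column n2) needs fresh evaluations.
        new_column = 0.0
        for i in range(1, n1 + 1):
            new_column += self.pair_term(i, n2)
            self.evaluations += 1
        result = self.total(n1, n2 - 1) + new_column
        self._cache[key] = result
        return result


if __name__ == "__main__":
    sums = CachedPairSum(lambda i, j: 1.0 / (i + j))  # placeholder summand
    sums.total(4, 4)
    before = sums.evaluations
    sums.total(4, 5)
    print(sums.evaluations - before, "new pair terms evaluated")
```

In this toy run, the 4 × 4 result is served from the cache when the 4 × 5 sum is requested, so only the four terms involving the fifth Rx turn are computed.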
As a single coil is formed from the series superposition of many single turns (each containing four segments), the solutions from previous layouts can be stored and reused. As the most compute-intensive stage in the coil evaluation is calculating the elliptic integrals of $\alpha$ (consuming on average 34% of the entire runtime), reuse of a previous solution reduces the compute intensity of the algorithm, even when considering the lookup penalty incurred while locating previous useful solutions. As an example, the mutual inductance between two coils ($n_1 = 4$, $n_2 = 5$, $s_1 = s_2 = 1\,\mu\text{m}$, $w_1 = w_2 = 2\,\mu\text{m}$, and $\chi_w = \chi_s = 0$) can be expressed as shown in the following equation, where the first term of the final expression is the mutual inductance between two coils ($n_1 = n_2 = 4$, $s_1 = s_2 = 1\,\mu\text{m}$, $w_1 = w_2 = 2\,\mu\text{m}$, and $\chi_w = \chi_s = 0$), which has been calculated previously:

$$\begin{aligned} M_{\text{tot}} &= \sum_{i=1}^{n_1(=4)} \sum_{j=1}^{n_2(=5)} \left(\frac{1}{r_{i,j}}\right)^{k_f} M(a_i, b_j, X) \\ &= \sum_{i=1}^{4} \sum_{j=1}^{4} \left(\frac{1}{r_{i,j}}\right)^{k_f} M(a_i, b_j, X) + \sum_{i=1}^{4} \left(\frac{1}{r_{i,5}}\right)^{k_f} M(a_i, b_5, X). \end{aligned} \tag{13}$$

This term can be reused, reducing the computational overhead by 80% in this case (16 of the 20 pair terms are already available).

Fig. 7. Stackup used for experimental validation. (a) 3-D illustration of face-to-back stacking arrangement. (b) Single 65-nm die cross-section.

## V. EXPERIMENTAL RESULTS

## A. Experimental Setup

In this section, the presented analysis and optimization approaches are validated against existing commercial tools. For all simulations, the stackup shown in Fig. 7 was used, which is representative of two vertically stacked 65-nm CMOS dies. Here, we assume that the top die has undergone chemical–mechanical planarization to a thickness of $70\,\mu\text{m}$ (in line with realistic fabrication capabilities) and that the dies are attached using adhesive.
ANSYS HFSS (a FEM tool) was used as the evaluation benchmark for all tests. The values of each element in the distilled lumped-element model were extracted from the system's S-parameters by means of manual parameter fitting.

Using the aforementioned experimental setup, COIL-3-D was evaluated against comparable existing approaches and FEM results. Experiments were performed to evaluate: 1) the effectiveness of the proposed nonuniform square inductor topology (Section V-B); 2) the accuracy of the lumped equivalent model with respect to broadband fit models (Section V-C); 3) the accuracy of the semiempirical expressions presented for evaluating a particular coil layout (Section V-D); 4) the effectiveness of the COIL-3-D optimization algorithm (Section V-E); and 5) the runtime overheads of COIL-3-D compared with existing approaches (Section V-F).

Fig. 8. (a) Variation of $\eta$ with respect to $n$ ($D=200\,\mu\text{m}$, $w=3\,\mu\text{m}$, and $s=1\,\mu\text{m}$). (b) Variation of $\eta$ with respect to $f$ ($D=200\,\mu\text{m}$, $w=3\,\mu\text{m}$, and $s=1\,\mu\text{m}$).

Fig. 9. Effects of turn-width graduation, $\chi_w$, and turn-spacing graduation, $\chi_s$, on efficiency ($\eta$) when (a) $D=200\,\mu\text{m}$, $n=4$ and (b) $D=300\,\mu\text{m}$, $n=7$.

#### B. Inductor Topology Evaluation

1) Inductor Shape: In Section III-B, analysis was presented suggesting that square inductors will outperform other inductor types (such as circular and hexagonal) when used in ICLs. To justify this assumption, a range of inductors was simulated using FEM with varying layouts at different simulation frequencies. A sample of these results is illustrated in Fig. 8. Fig. 8(a) illustrates the efficiency ($\eta$) of square, octagonal, and circular inductors with the same outer dimension ($D = 200\,\mu\text{m}$), track width ($3\,\mu\text{m}$), and spacing ($1\,\mu\text{m}$) as the number of turns, $n$, varies.
It can be observed that square layouts provide better performance, due to their enhanced area utilization efficiency. Fig. 8(b) shows similar results across a range of frequencies. Again, the square topology provides the highest efficiency across all operating frequencies for a fixed area budget, upholding the earlier assumptions.

2) Inductor Uniformity: In addition to the inductor shape, Section III-B also presents the theory supporting the use of nonuniform inductors. Fig. 9 presents a selection of our results supporting this analysis. Fig. 9(a) illustrates the transmission efficiency of an ICL with parameters $D=200\,\mu\text{m}$, $n=4$, $w_1=3\,\mu\text{m}$, and $s_1=1\,\mu\text{m}$ while varying $\chi_s$ and $\chi_w$. In this case, we can observe that using a nonuniform layout can offer efficiency improvements of 3.1% when compared with uniform layouts. Fig. 9(b) illustrates similar results, this time for a completely different inductor geometry ($D=300\,\mu\text{m}$, $n=7$, $w_1=4\,\mu\text{m}$, $s_1=0.5\,\mu\text{m}$). Again, it can be observed that decreasing $\chi_w$ and modifying $\chi_s$ to find the optimal track width and spacing graduation results in an efficiency improvement (5.17%) when compared with uniform layouts. These results uphold the theory that ICL efficiency can be improved by using nonuniform inductor layouts.

Fig. 10. Effects of tuning turn-width graduation, $\chi_w$, and turn-spacing graduation, $\chi_s$, to maximize $\eta$. (a) $w = 5\,\mu\text{m}$, $D = 300\,\mu\text{m}$, $s = 3\,\mu\text{m}$, $n = 7$. (b) $w = 11\,\mu\text{m}$, $D = 200\,\mu\text{m}$, $s = 2\,\mu\text{m}$, $n = 4$. (c) $w = 5\,\mu\text{m}$, $D = 250\,\mu\text{m}$, $s = 2\,\mu\text{m}$, $n = 3$. (d) $w = 3\,\mu\text{m}$, $D = 150\,\mu\text{m}$, $s = 2\,\mu\text{m}$, $n = 5$.

To demonstrate this more broadly, simulations of the width and spacing graduation tuning process were performed for a range of coil topologies. Results from these experiments are presented in Fig. 10.
Here, four different coils are "tuned" from uniform layouts (where track width and spacing remain constant from turn to turn) to optimized nonuniform layouts. The title of each graph gives the uniform starting point, the x-axis represents the combined (2-D) magnitude of graduation performed, $(\chi_w^2 + \chi_s^2)^{1/2}$, and the y-axis shows the efficiency, $\eta$. For these experiments, $\chi_w$ and $\chi_s$ were varied between -0.3 and 0.3 in intervals of 0.1, and the efficiencies of all permutations were evaluated using FEM. In Fig. 10, it can be observed that the tuning process improves the performance of the link in all four cases, with the maximum improvement achieved by using nonuniform inductors being 29.9% [in case Fig. 10(a)] and the mean being 14.25% [across cases Fig. 10(a)–(c)].

## C. Lumped Model Accuracy Evaluation

In addition to validating the layout topology theory presented in Section III, the validity of the simplified lumped equivalent electrical ICL-channel model (presented in [23]) was examined. Fig. 11 shows a transient simulation of a system's performance while using both the simplified lumped equivalent model (with fitted $R$, $L$, $M$, and $C$ parameters) and a broadband SPICE model extracted using ANSYS HFSS. It can be observed in Fig. 11 that the pulse amplitude simulated with the simplified model is similar to that obtained when using the broadband SPICE model (with a marginal average error in the received pulse amplitude of around 15%), which is adequate for the purpose of optimization (provided that the resulting optimal design is thoroughly validated). It is important to note that this error is present in all approaches using the model in Fig. 4 and is of comparable magnitude to the error tolerance of fabricated layouts.

Fig. 11. Transient simulation comparing the performance of broadband fit SPICE channel model (generated by ANSYS HFSS) and the simplified channel model (shown in Fig. 1) used in this paper.

#### D. Empirical Expression Evaluation

1) R, L, and C Extraction Accuracy: In order to validate the accuracy of the semiempirical parameter expressions proposed in Section IV-B, $R$, $L$, and $C$ were evaluated across a range of coil sizes. Table II shows the extraction accuracy of the $R$, $L$, and $C$ expressions for seven randomly generated coils. The results also include the accuracy of approach [21] for comparison. Here, $\chi_s$ and $\chi_w$ are set to 0 to allow fair comparison with approach [21].

a) Inductance extraction: It can be observed in Table II that the inductance extraction accuracy of the expressions presented in this paper is very high, exhibiting an average error of 2.5% across the generated inductor layouts. When compared with the approach outlined in [21], this represents an accuracy enhancement of 91% by using the expressions and scalable model presented in this paper.

b) Resistance extraction: Table II also illustrates that the resistance extraction accuracy of the expressions presented in this paper is very high, exhibiting an average error of only 4.3% across the examined inductor layouts. The expressions perform very well in most cases; however, the approach in [21] performs marginally better than the proposed scalable model approach for inductors that have a high axial equivalent length. These slight errors are unlikely, however, to significantly affect the optimization process, and COIL-3-D still outperforms the expressions presented in [21] by an overall average of 17.5%.

c) Capacitance extraction: The final rows in Table II document the capacitance extraction accuracy of each approach. An average error of 21.1% was exhibited while calculating the capacitance of each of the seven coils. While this may seem high, accurate capacitance evaluation is a challenging task. Our proposed method outperforms those in [21] by $3.7\times$.

2) Mutual Inductance Extraction Accuracy: Fig. 12 illustrates the mutual inductance extraction accuracy with respect to variation in $n$, using the semiempirical expressions proposed in Section IV-B4 (the accuracy of the approaches in [21] and [33] has been included for comparison). Here, it can be observed that the proposed mutual inductance model improves upon existing approaches, particularly in cases where $n>10$ (as hypothesized earlier). On average, an error within 8.6% of FEM approaches is achieved. When combining these parameters ($R$, $L$, $C$, and $M$) to evaluate efficiency, the combined average error was determined to be 7.8%. This represents a $1.17\times$ improvement upon the expressions used in [10].

Fig. 12. Mutual inductance extraction accuracy as $n$ varies ($D=300\,\mu\text{m}$, $w=1.5\,\mu\text{m}$, $s=1\,\mu\text{m}$ for both coils).

## E. Optimization Flow Evaluation

The effectiveness of the COIL-3-D optimization algorithm was also explored and compared with both random trial-and-error approaches and the optimization flow outlined in [22] (the only other existing work that proposes an optimization scheme for ICL layouts). To examine the effectiveness of each approach, a layout was sought for an ICL with a maximum area constraint of $200\,\mu\text{m}$ assuming a grid resolution of $0.1\,\mu\text{m}$ (these parameters were set to speed up simulations of each approach to an acceptable level). The same experimental setup outlined in Section V-A was used for evaluation, and the optimization target was defined as $\eta$. The results are shown in Fig. 13. Here, it can be observed that the COIL-3-D optimization flow performs best out of the three examined optimization approaches, finding an optimal solution after just 1500 iterations. The trial-and-error approach did reach the optimal point; however, it consumed approximately one billion iterations, six orders of magnitude more than COIL-3-D. The approach in [22] terminated after approximately one million iterations at a suboptimal solution.
This is likely because mutual inductance is not considered in the optimization flow of [22], in order to speed up optimization. Interestingly, although the number of iterations used by the approach in [22] was greater than that of the refined optimization flow proposed here, execution of [22] took a similar time to complete, as the elliptic integrals for calculating mutual inductance (which account for a significant proportion of the compute time) are not used.

TABLE II
SEMIEMPIRICAL EXPRESSION ACCURACY EVALUATION OF COIL-3-D FOR PARAMETERS L, R, AND C COMPARED WITH EXISTING APPROACHES

| | | I | | II | | III | | IV | | V | | VI | | VII | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Parameters | $D$ ($\mu$m) | 150 | | 250 | | 200 | | 250 | | 300 | | 150 | | 400 | | - |
| | $w$ ($\mu$m) | 3.0 | | 3.0 | | 5.0 | | 3.0 | | 4.0 | | 1.0 | | 4.0 | | - |
| | $s$ ($\mu$m) | 1.0 | | 1.0 | | 1.0 | | 0.5 | | 1.0 | | 1.0 | | 0.8 | | - |
| | $n$ | 5 | | 3 | | 3 | | 4 | | 5 | | 5 | | 4 | | |
| | | Value | Error (%) | Value | Error (%) | Value | Error (%) | Value | Error (%) | Value | Error (%) | Value | Error (%) | Value | Error (%) | Average Error (%) |
| FEM | L (nH) | 9.22 | 0.0 | 8.55 | 0.0 | 5.32 | 0.0 | 14.9 | 0.0 | 23.9 | 0.0 | 14.4 | 0.0 | 25.4 | 0.0 | 0 |
| | R (Ω) | 16.8 | 0.0 | 18.54 | 0.0 | 8.82 | 0.0 | 23.7 | 0.0 | 24.31 | 0.0 | 52.9 | 0.0 | 26.7 | 0.0 | 0 |
| | C (fF) | 34.2 | 0.0 | 38.0 | 0.0 | 45.6 | 0.0 | 68.2 | 0.0 | 112 | 0.0 | 76.6 | 0.0 | 86.8 | 0.0 | 0 |
| COIL-3D | L (nH) | 9.12 | 1.1 | 8.57 | 0.3 | 5.39 | 1.4 | 14.5 | 2.4 | 23.9 | 0.0 | 12.9 | 10 | 26.0 | 2.4 | 2.5 |
| | R (Ω) | 16.8 | 0.1 | 18.6 | 2.80 | 8.48 | 4.0 | 24.2 | 2.0 | 26.5 | 10 | 53.9 | 2.0 | 29.2 | 9.4 | 4.3 |
| | C (fF) | 45.1 | 32.0 | 41.0 | 7.90 | 33.7 | 26.1 | 66.3 | 2.78 | 98.7 | 11.9 | 38.8 | 49.3 | 102 | 17.6 | 21.1 |
| Work [21] | | | | | | | | | | | | | | | | |

Fig. 13. COIL-3-D optimization approach efficiency compared with random trial-and-error and approach [22].

#### F. Overhead Evaluation

Finally, the execution overheads of COIL-3-D were evaluated. Table III shows the average time taken to evaluate the efficiency of a single ICL using COIL-3-D, the approach in [21], and FEM. While COIL-3-D is not the fastest of the approaches explored in this paper, it is nearly five orders of magnitude faster than FEM (0.081 s versus 5,450 s per geometry) while maintaining a high average accuracy (within 7.8%).

TABLE III
EXECUTION OVERHEADS OF THE PROPOSED ELECTRICAL PARAMETER EXPRESSIONS (WHEN EVALUATING $\eta$) COMPARED WITH EXISTING APPROACHES

| Solver | Average Execution Time (per geometry) | Normalised Average Error (%) |
|---|---|---|
| FEM Solver [35] | 5,450 s | 0% |
| Simplified Expressions [21] | 0.008 s | 22.3% |
| Prior Implementation of COIL-3D [10] | 0.015 s | 9.1% |
| Improved COIL-3D | 0.081 s | 7.8% |

Table IV shows the estimated/simulated total optimization times for the various solver/optimizer combinations assuming a $0.1$-$\mu\text{m}$ grid and an area constraint of $300\,\mu\text{m}$.
Here, it can be observed that the COIL-3-D tool arrives at optimized geometries faster than each of the analyzed alternatives, with the exception of the approach in [22], which considers the two inductors independently and hence suffers from reduced accuracy (as discussed in Section V-E).

TABLE IV
TOTAL OPTIMIZATION TIME USING COIL-3-D AND OTHER APPROACHES

| Approach | Predicted<sup>†</sup>/Actual Execution Time |
|---|---|
| FEM [35] + exhaustive linear search | 10<sup>22</sup> Years † |
| FEM Solver [35] with proposed refined search algorithm (proposed for COIL-3D) | 518 Years † |
| Semi-empirical expressions (proposed for COIL-3D) with exhaustive linear search | 10<sup>18</sup> Years † |
| Iterative Optimisation Flow [21] | 124 Mins |
| ThruChip Inductive Coupling Channel Design Optimisation Flow [22] | 12.9 Mins |
| COIL-3D (Semi-empirical solver with refined search algorithm) | 47.1 Mins |

# VI. COIL-3-D EXAMPLE USAGE APPLICATION

To demonstrate the applications of COIL-3-D, this section presents a use-case example based upon [36], where an ICL is designed for 3-D integration of digital CMOS and analog BiCMOS dies for use in an implantable neuromodulator. Presently, the work uses congruent uniform "off-the-shelf" inductor layouts with the parameters $D=200\,\mu\text{m}$, $n=5$, $w=9\,\mu\text{m}$, and $s=0.72\,\mu\text{m}$. The design achieves a maximum bandwidth of 1.6 GHz and, to meet the minimum pulse amplitude requirement in the receiver ($V_{R_{L,2}} \geq 100\,\text{mV}$), requires transmission current pulses of 0.77 mA with 0.11-ns durations.

Fig. 14. Example end-to-end (specification to GDS-II and power/performance statistics) ICL design flow when using COIL-3-D.

Fig. 15. Performance [(a) required transmit pulse power and (b) achievable bandwidth] of the inductor layout used in our previous work [36] compared with the COIL-3-D optimized solution.

This section provides a step-by-step overview of the optimized ICL design process (considering both the transceiver circuits and coil layouts) when using the COIL-3-D tool developed in Sections III and IV. Fig. 14 shows this full design flow from concept through to GDS-II. The first stage of the process is to define the link specification, including the maximum coil footprint, the technology stackup used in the process, and the minimum voltage pulse threshold that can be successfully detected by the receiving sense amplifier [Fig. 3(b)]. In this case, the maximum area was defined as $200\,\mu\text{m} \times 200\,\mu\text{m}$ (as per the benchmark work [36]), the receiver sensitivity was defined as $100\,\text{mV}$ (again as per the benchmark work [36]), and the same technology stackup as that in [36] was adopted. Following this, the COIL-3-D tool was run (stage 2) and the best performing layout within the specified dimensions was generated (including both a SPICE model of the link and a physical GDS-II file containing the inductor layout). At this point, the SPICE model can optionally be checked using FEM by importing the GDS-II layout for subsequent analysis. Next, in stage 3, the transmitter circuits were designed. As outlined in Section II, the transmitting current $I_{TX}$ is controlled predominantly by the widths of transistors M1 and M0 (in the ICL architecture in Fig. 3). The COIL-3-D tool generates a coil-pair layout that maximizes the efficiency of the inductive-coupling channel, $\eta$. The transmitter circuits can, therefore, be designed by increasing the widths of M1 and M0 until the required receiver sensitivity is met, resulting in the full system design (stage 4).
This design was simulated in SPICE (stage 5) using the circuit models in conjunction with the inductive-channel SPICE model generated by COIL-3-D in order to obtain power and performance statistics. Fig. 15 illustrates the performance improvements yielded by using the COIL-3-D optimized design flow for the use-case example based upon [36]. For the same scenario, the maximum achievable link bandwidth is improved by 41.5% and, while operating at a data rate of 1 Gb/s (as in [36]), the required transmit pulse power is reduced by 8.1%.

#### VII. CONCLUSION

In Section III, we presented a formulation of the inductor layout optimization problems for ICLs. Detailed analysis of multiple inductor topologies was performed, concluding that square nonuniform layouts can provide the highest efficiency for ICL applications. In Section IV, we presented the COIL-3-D software tool, which consists of: 1) a scalable comprehensive inductor model; 2) a fast mathematical solver for determining model parameters; 3) a high-speed optimization flow; and 4) an efficient DP-based software implementation. Through a use-case example study, we demonstrated that using COIL-3-D to optimize ICL inductor layouts can yield significant performance benefits (41.5% bandwidth improvement and 8.1% power improvement for the presented example). In addition, we demonstrated that the evaluation expressions presented in this paper achieve an average accuracy within 7.8% of finite-element tools while consuming a small fraction of the time $(1.5 \times 10^{-3} \%)$, unlocking new potential for power-efficient 3-D-IC design using ICLs.

#### REFERENCES

- [1] P. Lindner *et al.*, "Key enabling processes for more-than-Moore technologies," in *Proc. IEEE Int. SOI Conf. (SOI)*, Oct. 2012, pp. 1–2.
- [2] B. Swinnen *et al.*, "3D integration by Cu-Cu thermo-compression bonding of extremely thinned bulk-Si die containing 10 $\mu$m pitch through-Si vias," in *IEDM Tech.
Dig.*, 2006, pp. 1–4.
- [3] T. Kuroda, "Wireless proximity communications for 3D system integration," in *Proc. IEEE Int. Workshop Radio-Freq. Integr. Technol.*, Dec. 2007, pp. 21–25.
- [4] J. Park, J. Lee, B. Seol, and J. Kim, "Efficient calculation of inductive and capacitive coupling due to electrostatic discharge (ESD) using PEEC method," *IEEE Trans. Electromagn. Compat.*, vol. 57, no. 4, pp. 743–753, Aug. 2015.
- [5] N. Miura, D. Mizoguchi, T. Sakurai, and T. Kuroda, "Analysis and design of inductive coupling and transceiver circuit for inductive interchip wireless superconnect," *IEEE J. Solid-State Circuits*, vol. 40, no. 4, pp. 829–837, Apr. 2005.
- [6] M. Ikebe *et al.*, "An image sensor/processor 3D stacked module featuring ThruChip interfaces," in *Proc. 22nd Asia South Pacific Design Automat. Conf. (ASP-DAC)*, Jan. 2017, pp. 7–8.
- [7] D. Ditzel, T. Kuroda, and S. Lee, "Low-cost 3D chip stacking with ThruChip wireless connections," in *Proc. IEEE Hot Chips 26 Symp. (HCS)*, Aug. 2014, pp. 1–37.
- [8] K. Niitsu *et al.*, "An inductive-coupling link for 3D integration of a 90 nm CMOS processor and a 65 nm CMOS SRAM," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2009, pp. 480–481 and 481a.
- [9] J. M. Lopez-Villegas, J. Samitier, C. Cane, P. Losantos, and J. Bausells, "Improvement of the quality factor of RF integrated inductors by layout optimization," *IEEE Trans. Microw. Theory Techn.*, vol. 48, no. 1, pp. 76–83, Jan. 2000.
- [10] B. J. Fletcher, S. Das, and T. Mak, "A high-speed design methodology for inductive coupling links in 3D-ICs," in *Proc. Design, Automat. Test Eur. Conf. Exhib.*, Mar. 2018, pp. 497–502.
- [11] P. S. Andry *et al.*, "Fabrication and characterization of robust through-silicon vias for silicon-carrier applications," *IBM J. Res. Dev.*, vol. 52, no. 6, pp. 571–581, Nov. 2008.
- [12] J. H. Lau, "TSV manufacturing yield and hidden costs for 3D IC integration," in *Proc. 60th Electron. Compon. Technol. Conf. (ECTC)*, Jun. 2010, pp. 1031–1042.
- [13] K. Ueyoshi *et al.*, "QUEST: A 7.49TOPS multi-purpose log-quantized DNN inference engine stacked on 96MB 3D SRAM using inductive-coupling technology in 40nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2018, pp. 216–218.
- [14] I. A. Papistas and V. F. Pavlidis, "Contactless heterogeneous 3-D ICs for smart sensing systems," *Integration*, vol. 62, pp. 329–340, Jun. 2018.
- [15] I. A. Papistas and V. F. Pavlidis, "Efficient modeling of crosstalk noise on power distribution networks for contactless 3-D ICs," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 65, no. 8, pp. 2547–2558, Aug. 2018.
- [16] W. R. Davis *et al.*, "Demystifying 3D ICs: The pros and cons of going vertical," *IEEE Des. Test Comput.*, vol. 22, no. 6, pp. 498–510, Nov. 2005.
- [17] C. P. Yue and S. S. Wong, "Physical modeling of spiral inductors on silicon," *IEEE Trans. Electron Devices*, vol. 47, no. 3, pp. 560–568, Mar. 2000.
- [18] K. B. Ashby, I. A. Koullias, W. C. Finley, J. J. Bastek, and S. Moinian, "High Q inductors for wireless applications in a complementary silicon bipolar process," *IEEE J. Solid-State Circuits*, vol. 31, no. 1, pp. 4–9, Jan. 1996.
- [19] N. M. Nguyen and R. G. Meyer, "Si IC-compatible inductors and LC passive filters," *IEEE J. Solid-State Circuits*, vol. 25, no. 4, pp. 1028–1031, Aug. 1990.
- [20] A. M. Niknejad and R. G. Meyer, "Analysis and optimization of monolithic inductors and transformers for RF ICs," in *Proc. Custom Integr. Circuits Conf. (CICC)*, May 1997, pp. 375–378.
- [21] U.-M. Jow and M. Ghovanloo, "Design and optimization of printed spiral coils for efficient transcutaneous inductive power transmission," *IEEE Trans. Biomed. Circuits Syst.*, vol. 1, no. 3, pp. 193–202, Sep. 2007.
- [22] L.-C. Hsu, J. Kadomoto, S. Hasegawa, A. Kosuge, Y. Take, and T. Kuroda, "Analytical thruchip inductive coupling channel design optimization," in *Proc. 21st Asia South Pacific Design Automat. Conf. (ASP-DAC)*, Jan. 2016, pp. 731–736.
- [23] D. Mizoguchi, Y. B. Yusof, N. Miura, T. Sakurai, and T. Kuroda, "A 1.2Gb/s/pin wireless superconnect based on inductive inter-chip signaling (IIS)," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, vol. 1, Feb. 2004, pp. 142–517.
- [24] S. S. Mohan, M. D. M. Hershenson, S. P. Boyd, and T. H. Lee, "Simple accurate expressions for planar spiral inductances," *IEEE J. Solid-State Circuits*, vol. 34, no. 10, pp. 1419–1424, Oct. 1999.
- [25] M. Bak, M. Dudek, and A. Dziedzic, "Chosen electrical and stability properties of surface and embedded planar PCB inductors," in *Proc. 31st Int. Spring Seminar Electron. Technol.*, May 2008, pp. 545–549.
- [26] J. R. Long and M. A. Copeland, "The modeling, characterization, and design of monolithic inductors for silicon RF IC's," *IEEE J. Solid-State Circuits*, vol. 32, no. 3, pp. 357–369, Mar. 1997.
- [27] M. Farran, M. Baù, D. Modotto, M. Ferrari, and V. Ferrari, "Design, simulation and testing of planar spiral coils for the time-gated interrogation of quartz resonator sensors," in *Proc. 28th Eur. Conf. Modelling Simulation*, 2014, pp. 147–152.
- [28] G. S. Smith, "Proximity effect in systems of parallel conductors," *J. Appl. Phys.*, vol. 43, no. 5, pp. 2196–2203, 1972.
- [29] H. A. Wheeler, "Formulas for the skin effect," *Proc. IRE*, vol. 30, no. 9, pp. 412–424, Sep. 1942.
- [30] Z. Piatek, B. Baron, T. Szczegielniak, D. Kusiak, and A. Pasierbek, "Self inductance of long conductor of rectangular cross section," *Przeglad Elektrotechniczny (Elect. Rev.)*, vol. 88, no. 8, pp. 323–326, 2012.
- [31] Y. Y. Iossel, E. S. Kochanov, and M. G. Strunskly, "The calculation of electrical capacitance," Air Force Syst. Command Wright-Patterson AFB Foreign Technol. Division, Dayton, OH, USA, Tech. Rep. PTD-MT-21 269-70, 1971.
- [32] M. F. Chang, V. P. Roychowdhury, L. Zhang, H. Shin, and Y.
Qian, "RF/wireless interconnect for inter- and intra-chip communications," *Proc. IEEE*, vol. 89, no. 4, pp. 456–466, Apr. 2001.
- [33] B. Noroozi and B. I. Morshed, "PSC optimization of 13.56-MHz resistive wireless analog passive sensors," *IEEE Trans. Microw. Theory Techn.*, vol. 65, no. 9, pp. 3548–3555, Sep. 2017.
- [34] L.-C. Hsu, Y. Take, A. Kosuge, S. Hasegawa, J. Kadamoto, and T. Kuroda, "Design and analysis for ThruChip design for manufacturing (DFM)," in *Proc. 20th Asia South Pacific Design Automat. Conf.*, Jan. 2015, pp. 46–47.
- [35] C. S. Technology. (2015). CST Microwave Studio. [Online]. Available: https://www.cst.com/products/cstmws
- [36] B. J. Fletcher, S. Das, C.-S. Poon, and T. Mak, "Low-power 3D integration using inductive coupling links for neurotechnology applications," in *Proc. Design, Automat. Test Eur. Conf. Exhib.*, Mar. 2018, pp. 1211–1216.

**Benjamin J. Fletcher** (S'18) received the B.Eng. degree (honors) in electronic engineering from the University of Southampton, Southampton, U.K., in 2016. He is currently working toward the Ph.D. degree at the ARM-ECS Research Center, Cambridge, U.K. His current research interests include 3-D integrated circuit design, energy-efficient embedded systems, and low-power VLSI. Mr. Fletcher was a recipient of the Postgraduate Prize Award from the Institute of Engineering Technology for his research on low-cost 3-D integration approaches in 2018.

**Shidhartha Das** (S'03–M'08) received the B.Tech. degree in electrical engineering from IIT Bombay, Mumbai, India, in 2002, and the M.S. and Ph.D. degrees in computer science and engineering from the University of Michigan, Ann Arbor, MI, USA, in 2005 and 2009, respectively. He is currently a Principal Engineer at the Research and Development Group, ARM Ltd., Cambridge, U.K. His current research interests include microarchitectural and circuit techniques for low-power and variability-tolerant digital IC design. Dr. Das serves on the Technical Program Committee for the European Solid-State Circuits Conference and the International Online Testing Symposium.

**Terrence Mak** (S'05–M'09–SM'17) was a Visiting Scientist at the Massachusetts Institute of Technology, Cambridge, MA, USA, supported by the U.K. Royal Society. He is currently an Associate Professor at the University of Southampton, Southampton, U.K. He is also a Visiting Professor at the Chinese Academy of Sciences, Beijing, China. He has pioneered a spectrum of novel methods to regulate and engineer network-on-chip dynamics, which have enabled him to author or coauthor more than 30 journals, including the IEEE Transactions and ACM Transactions, and more than 60 conference proceedings. His current research interests include cutting-edge computing systems and applications using novel architectures, algorithms, and technologies, including VLSI, 3-D-IC, many-core and field-programmable gate array systems. Mr. Mak was a recipient of multiple prestigious best paper awards from three major conferences, DATE11, VLSISoC14, and PDP15, for his newly proposed approaches using runtime optimization and adaptation strategies.