# Clocking Structures and Power Analysis for Nanomagnet-Based Logic Devices

M.T. Niemier, X.S. Hu; University of Notre Dame, Dept. of Comp. Sci. & Eng., Notre Dame, IN 46556; {mniemier, shu}@nd.edu

M. Alam, G. Bernstein, W. Porod; University of Notre Dame, Dept. of Elec. Eng., Notre Dame, IN 46556; {malam1, gbernste, porod}@nd.edu

M. Putney, J. DeAngelis; University of Notre Dame, Dept. of Comp. Sci. & Eng., Notre Dame, IN 46556; {mputney1, jdeange2}@nd.edu

## ABSTRACT

Logical devices made from nanoscale magnets have many potential advantages: systems should be non-volatile, dense, low power, and radiation hard, and could have a natural interface to MRAM. Initial work includes experimental demonstrations of logic gates and wires, as well as theoretical studies of their power dissipation. This paper also looks at power dissipation, but additionally considers the circuitry needed to drive a computation. Initial results are very encouraging and indicate that clocked magnetic logic could, in the worst case, match equivalent low-power CMOS circuits and, in the best case, provide more than 2 orders of magnitude improvement in energy per operation.

## 1. INTRODUCTION

Magnetic logic based on coupled ferrite cores was originally pursued in the 1950s but was eventually replaced by semiconductor chips. The lithographically defined nanomagnets that form the basis of this work (i) do not possess the disadvantages of the early, bulky ferrite-core magnets, and (ii) can be arranged to form circuits within the quantum-dot cellular automata (QCA) architecture scheme [11]. For nanomagnet-based QCA (MQCA), wires, gates, and inverters made from nanomagnets have all been experimentally realized and verified [11], and they operate at room temperature [3]. Moreover, if $10^{10}$ of these nanomagnets switch $10^8$ times each second, the magnets would dissipate only about 0.1 W [6]. That said, more than just magnets is required for computation.
A clock structure capable of generating an external magnetic field is also needed to drive the computation. "Clocking" removes the remanent magnetizations left by previous logical operations and allows devices to be reused to evaluate new input combinations. Essentially, a magnetic field is applied to a group of nanomagnets to polarize them along their hard axes. The field is then removed, and the magnets relax to their new preferred state, ideally in response to some initial magnetization at the inputs. Information is transferred via magnetic dipole coupling. This paper considers one possible implementation of that clock, the field strength it could provide, whether that field is sufficient to drive a computation, and the power dissipated by the clock lines and magnets. We expect application spaces to be abundant, as the devices should be low power and non-volatile, and any application with these performance requirements might benefit. Patterned thin-film nanomagnets are also similar in nature to MRAM devices and compatible with their processing and fabrication, which, among other things, suggests another way to implement a processing-in-memory architecture. We should also be able to capitalize on advances in magnetic data technology to address input and output in MQCA systems.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. *ISLPED'07*, August 27–29, 2007, Portland, Oregon, USA. Copyright 2007 ACM 978-1-59593-709-4/07/0008 ...\$5.00.

We will begin with a discussion of relevant background in Sec. 2. In Sec.
3, we will discuss simulations showing what field strengths can be provided by current-carrying wires (the fundamental construct of our proposed clock). In Sec. 4, we will show via simulation how various logic elements might function in the presence of fields generated by the clock. In Sec. 5, we will consider the clock generation circuitry itself, perform an initial power analysis, and place our results in the context of CMOS benchmarks. We conclude in Sec. 6.

## 2. BACKGROUND

### 2.1 QCA Circuit Constructs

The initial description of a QCA device called for a device with 2 or 4 "charge containers" (i.e., quantum dots) and 1 or 2 excess charges, respectively. One configuration of charge represents a binary '1' and the other a binary '0' (Fig. 1a) [12]. Logical operations and data movement are accomplished via Coulomb interactions between nearest-neighbor cells. In a magnetic implementation of QCA, charge configurations are replaced with magnetic polarizations. Figs. 1b-e illustrate the building blocks that would be used to construct QCA circuits [13, 14]. A QCA wire (Fig. 1b) is simply a line of QCA cells, driven at its input by a cell with a fixed (held) polarization. The majority gate (Fig. 1c) implements the logic function AB + BC + AC: the output cell assumes the polarization of the majority of the 3 input cells [12]. By setting one input of a majority gate to a logic '0' or '1', the gate executes an AND or OR function, respectively. An inverter can also be easily built with QCA devices (Fig. 1d). QCA wires with different orientations (Fig. 1e) can theoretically cross in the plane without destroying the binary value on either wire. As the second column of Fig. 1 shows, every theoretical construct has been physically realized in MQCA (using permalloy nanomagnets) with the exception of crossovers. That said, unlike in electrostatic QCA [5], crossovers in MQCA should be much easier to realize, and several viable options exist.
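The Boolean behavior of the constructs above can be checked with a short sketch. It models logic values only (no magnetization dynamics, clocking, or geometry), and the function names `maj`, `inv`, `full_adder`, and `ripple_add` are hypothetical. The full adder uses a standard three-majority-gate construction, Sum = M(¬Cout, M(A, B, ¬Cin), Cin) with Cout = M(A, B, Cin); that this matches the exact wiring of Fig. 5b is an assumption.

```python
from itertools import product


def maj(a: int, b: int, c: int) -> int:
    """Three-input majority gate: AB + BC + AC."""
    return (a & b) | (b & c) | (a & c)


def inv(a: int) -> int:
    """Inverter."""
    return 1 - a


def and2(a: int, b: int) -> int:
    return maj(a, b, 0)  # one input held at logic '0' -> AND


def or2(a: int, b: int) -> int:
    return maj(a, b, 1)  # one input held at logic '1' -> OR


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Full adder built from three majority gates plus inverters."""
    cout = maj(a, b, cin)
    s = maj(inv(cout), maj(a, b, inv(cin)), cin)
    return s, cout


def ripple_add(x: int, y: int, width: int = 32) -> tuple[int, int]:
    """Cascade `width` full adders into a ripple carry adder."""
    carry, result = 0, 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


# Exhaustive check of the gate-level identities over all input combinations.
for a, b, c in product((0, 1), repeat=3):
    assert and2(a, b) == (a & b) and or2(a, b) == (a | b)
    s, co = full_adder(a, b, c)
    assert s == a ^ b ^ c and co == int(a + b + c >= 2)
```

For example, `ripple_add(0xFFFFFFFF, 1)` returns `(0, 1)`: the sum wraps to zero and the carry ripples out of the 32nd bit slice, which is the critical path counted in the case study.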
One crossover option is to leverage out-of-plane magnetization, so that the nanomagnets would not prefer one wire direction over the other; this can be done using Co-Pt multilayer structures [10]. We can also consider different magnet geometries. One configuration that appears promising (based on micromagnetic simulations) is shown in Fig. 1e. Here, the middle magnet is capable of representing two bits of information (i.e., the two inputs to the crossing) simultaneously. Only two examples are shown, but this configuration appears to function properly for all binary input combinations. In future work, magnet geometries and out-of-plane magnets, along with multi-domain magnets and electrical structures, will be studied further. Finally, we note that a majority gate could be configured to function permanently as an AND or OR gate by placing a horizontal magnet to the left or right of one of the inputs.

Figure 1: Schematics of the fundamental structures needed to build QCA circuits and their experimental state of the art: (a) basic devices, (b) wires, (c) logic gates, (d) inverters, and (e) crossovers.

### 2.2 Clock Structure

The structures in Figs. 1b-d were tested with a clock that took the form of a periodically oscillating external magnetic field, which drove the system to an initial state and then controlled its relaxation to a ground state [11]. The clock helps the system overcome the energy barriers between metastable states and the ground state. Clocking can be performed by applying the magnetic field along the short axis of the dots, as illustrated in Fig. 2 and discussed in [10]. Fig. 2a illustrates a line (or "wire") of nanomagnets that has relaxed to a ground state and has moved a value from one end of the wire to the other. In Fig.
2b, the external field turns the magnetic moments of all magnets horizontally into a neutral logic state, against the preferred magnetic anisotropy (i.e., along the hard axes of the magnets). This is an unstable state of the system, and when the field is removed, the nanomagnets relax into the antiferromagnetically ordered ground state illustrated in Fig. 2c. If the first dot of the chain is influenced by an input device during relaxation, its induced switching sets the state of the whole chain through dipolar coupling [10].

Figure 2: Operating scheme of a wire: (a) initial configuration, (b) high-field ("null") state, (c) after the application of the input, and the final ordered state.

One may ask how dataflow directionality in the nanomagnets is ultimately achieved. It is important to note that a clocked line of magnets (for example) will begin to switch in response to a held input even before the B-field is turned off. (This is true whether the B-field is applied as a square wave or a sine wave.) The magnet closest to the driver is biased (away from its hard axis) toward the new polarization associated with the held input. This magnet in turn biases the next magnet in the line, and so on. As the clock field is reduced to zero, the magnets in a line will have already switched to a new (and logically correct) value, or will at least be biased toward their correct value. When the field is removed, the magnets relax completely along their easy axes. Fabrication variations *could*, however, cause different parts of a wire to switch simultaneously, and even from both directions. For example, a misshapen magnet could begin to switch earlier than it otherwise should and drive the magnets around it. Eventually, some part of the wire would become frustrated, i.e., lose perfect antiferromagnetic coupling. This outcome has been observed experimentally as well as in a time-evolving simulation.
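The ordering argument above can be caricatured with a deliberately crude Boolean sketch. It is not a micromagnetic model: it assumes ideal antiferromagnetic coupling (each magnet settles antiparallel to its already-settled neighbor), and the `defect` parameter is a hypothetical stand-in for a misshapen magnet that switches early and drives the magnets beyond it. Counting parallel neighbor pairs exposes the frustrated region.

```python
from typing import Optional


def relax_line(n: int, input_bit: int,
               defect: Optional[tuple[int, int]] = None) -> list[int]:
    """Idealized relaxation of an n-magnet line driven by a held input.

    Each magnet settles antiparallel to its left neighbour. `defect` = (k, bit)
    is a hypothetical early-switching magnet k that takes value `bit` and then
    drives magnets k+1..n-1; magnets 0..k-1 still follow the input driver.
    """
    state = [0] * n
    stop = n if defect is None else defect[0]
    prev = input_bit
    for i in range(stop):            # ordering wave launched from the input
        state[i] = 1 - prev
        prev = state[i]
    if defect is not None:
        k, bit = defect
        state[k] = bit               # early switcher picks its own value
        prev = bit
        for i in range(k + 1, n):    # and drives the rest of the line
            state[i] = 1 - prev
            prev = state[i]
    return state


def frustrated_bonds(state: list[int]) -> int:
    """Number of neighbouring pairs that are parallel (not antiferromagnetic)."""
    return sum(1 for a, b in zip(state, state[1:]) if a == b)
```

With `relax_line(6, 1)` the line orders perfectly (`[0, 1, 0, 1, 0, 1]`, no frustrated bonds). With `defect=(3, 0)` one frustrated bond appears between magnets 2 and 3, while `defect=(3, 1)`, an early switcher that happens to pick the correct value, leaves the line fully ordered.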
Fortunately, magnets can also experience lithographic variation and still function exactly as intended.

## 3. CLOCK WIRES

Although external magnetic-field clocking of micromagnet domains was used for magnetic bubble technology (as well as for the experiments discussed above), it is not ideal for MQCA due to (a) the relatively high clock speeds desired for circuit-level structures (100 MHz to 1 GHz are sought) and (b) the desire for more local control. Instead, we are developing on-chip magnetic-field clocking of individual circuit stages through current-carrying wires embedded underneath the nanomagnets. A recent precedent for on-chip magnetic switching exists in early MRAM development, in which individual memory bits were switched using currents in the tens of milliamperes [1, 9]. We are considering a similar structure, but we propose instead a larger wire that defines each stage of a pipelined architecture and controls multiple nanomagnets.

Figure 3: Cross section of the clocking structure.

Here, we consider a wire that is $2 \mu m$ wide and would support paths/lines of about 20 nanomagnets [3], a length that has been experimentally demonstrated to be well within the range of reliable switching [10]. The challenges of this clocking approach revolve around switching the nanomagnets with the least possible power dissipation in the clocking lines. To this end, we are exploring copper wires wrapped by ferrite on the sides and bottom (i.e., yoked lines that concentrate the field, as is done with MRAM bit and word lines). Nanomagnets would be aligned on the surface of the wires. According to the behavior of magnetic circuits, a magnetic field, H, wraps around the current-carrying wire, inducing a constant flux, $\phi$, whose density, B, depends on the space through which it passes. In traversing a path around the clock wire, the flux threads both the magnetic materials (i.e., the ferrite and nanomagnets) and the intervening gaps (e.g., the oxide or air).
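The magnetic-circuit argument can be made concrete with two back-of-the-envelope estimates, offered as a sketch rather than a substitute for the Maxwell 2D simulations. `sheet_field_mT` treats an unyoked, wide, thin wire as an infinite current sheet, $B = \mu_0 J t / 2$; for the 0.2 $\mu m$ thick wire at $10^6$ A/cm$^2$ used in the simulations it gives about 1.26 mT, and it is linear in $J$. `gap_field_mT` is a single-loop magnetic circuit, $I = H_{gap} g + H_{core} l_{core}$ with equal B in gap and core, so $B = \mu_0 I / (g + l_{core}/\mu_r)$. The loop split into a 2.0 $\mu m$ gap and 2.4 $\mu m$ of core in the usage lines is a hypothetical choice, so only the trend with $\mu_r$ is meaningful.

```python
import math

MU0 = 4e-7 * math.pi  # permeability of free space, T*m/A


def sheet_field_mT(j_a_per_cm2: float, thickness_um: float) -> float:
    """Field just above a wide, thin, unyoked strip, idealized as an infinite
    current sheet: B = mu0 * K / 2, where K = J * t is the sheet current."""
    j = j_a_per_cm2 * 1e4          # A/cm^2 -> A/m^2
    k = j * thickness_um * 1e-6    # sheet current density, A/m
    return MU0 * k / 2 * 1e3       # T -> mT


def gap_field_mT(current_a: float, gap_um: float, core_um: float,
                 mu_r: float) -> float:
    """Single-loop magnetic circuit (Ampere's law around the wire):
    I = H_gap * g + H_core * l_core. With the same B in gap and core
    (equal cross-sections assumed), B = mu0 * I / (g + l_core / mu_r)."""
    b_tesla = MU0 * current_a / ((gap_um + core_um / mu_r) * 1e-6)
    return b_tesla * 1e3


print(f"unyoked sheet estimate: {sheet_field_mT(1e6, 0.2):.2f} mT")
for mu_r in (1, 10, 1000):  # hypothetical 2.0 um gap + 2.4 um core loop
    print(f"mu_r = {mu_r:>4}: {gap_field_mT(4e-3, 2.0, 2.4, mu_r):.2f} mT")
```

With a 4 mA current, the gap field rises from about 1.1 mT (no permeable material) toward about 2.5 mT as the core permeability grows, because almost all of the magnetomotive force then drops across the gap. This is the qualitative effect the ferrite yoke exploits.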
Maximizing the fraction of the total path length that the flux travels through high-permeability materials increases the field intensity, and therefore the flux density, in the remaining gaps.

To begin to understand what field strength we might see at the surface (where the nanomagnets reside), we simulated a 2 $\mu m$ wide, 0.2 $\mu m$ thick, ferrite-yoked copper wire with an applied current density of $10^6$ A/cm<sup>2</sup> using the electromagnetic field simulator Maxwell 2D [2]. Fig. 3 illustrates our simulation configuration. The simulation was repeated with and without nanomagnets, as the nanomagnets themselves affect the B-field. The key result from these simulations is illustrated in Fig. 4, which plots $B_x$ along the surface of the copper wire. As one can see, a field of about 3 to 5 mT should be expected at the edge of the wire. These numbers are in line with the fields produced by the word and bit lines in an MRAM array, where fields in the tens of Oe are used [1, 9] (in free space, 10 Oe corresponds to 1 mT). It is also worth noting that the magnitude of the B-field increases linearly with current density (at least within the range of $10^5 \text{ A/cm}^2$ to $10^7 \text{ A/cm}^2$); this was verified with Maxwell simulations. Finally, we also repeated our simulation without sidewall yokes. Lower fields are obtained (although they remain within the same order of magnitude), indicating that sidewall yokes are beneficial, as the ferrite yoke confines the magnetic flux generated by current flowing in the copper wire.

Figure 4: Variation of the magnitude of the flux density ($B_x$) along the top surface of the copper wire. $B_x$ is reported for a copper wire without sidewall yokes (an oxide is used instead), a copper wire with a full yoke and no nanomagnets, and a copper wire with a full yoke and nanomagnets. $B_x$ peaks occur inside the nanomagnets and valleys occur in the free space between them ($J = 10^6 \ \mathrm{A/cm^2}$).

## 4. NANOMAGNET SIMULATIONS

More detailed studies will obviously be necessary, but the work discussed above provides an excellent starting point for considering whether the B-fields produced by the clock wires are sufficient to facilitate logical operations beyond the gate level. We focus on fields that appear to be obtainable with $J \leq 10^6 \ {\rm A/cm^2}$ (i.e., around 5 mT).

### 4.1 Simulation Mechanisms

To study logic in this context, we leverage the Object Oriented MicroMagnetic Framework (OOMMF). This simulation suite is a numerical Landau-Lifshitz equation integrator developed by the National Institute of Standards and Technology (NIST) [8]. It allows us to physically represent the magnetic response of a configuration of permalloy nanomagnets to applied external fields, using a quasi-classical approximation of exchange energy interactions. There is precedent for this approach, and some initial work is reported in [3, 10]. Still, we would like to take even more realistic circuit- and system-level detail into account in our simulations. For example, in the majority gate and line experiments discussed in [11], the inputs are represented by horizontally oriented magnets (see Fig. 1c), and an input magnet's value is actually determined by its position. Referring to Fig. 1c, notice that each input of the majority gate sees a different value (based on position) even though the input magnets have the same polarization. To control line switching, we would want more local control, ideally with an input device that produces a local magnetic field and influences only the first nanomagnet in the chain. This could be realized by a current-carrying wire or by the last magnet of a clock wire group. We also note that we could relatively easily show that the logic elements discussed in Sec.
2.1 function correctly using low fields when all of the magnets begin unpolarized (or polarized along their hard axes in the direction of the clocking field to be applied). However, this would be overly optimistic. Ultimately, a clock field will have to *remove* any remanent magnetization from the previous computation before new values can be driven into a circuit, which is a harder problem. We have used the OOMMF simulation framework to take these constructs into account. A detailed discussion of the exact simulation methodology is beyond the scope of this paper, but we briefly discuss an example. In OOMMF, a line of magnets was driven to a logically correct ground state (analogous to Fig. 2a). The polarization of the first magnet in the line was then flipped to represent a new input, and a 3 mT external field was applied to the line (as shown in Fig. 2b). The rest of the wire switched in accordance with the new input and assumed a new, logically correct state (analogous to Fig. 2c), with the magnets switching in the proper order.

### 4.2 Simulation Results

Table 1 presents the results of several simulation sets whose purpose was to determine whether nanomagnets of reasonable size could function properly (i.e., logically) given the field strengths discussed in Sec. 3¹. The representative configurations reported in Table 1 all function correctly, i.e., we were able to remove a remanent magnetization and drive a new value onto the wire.

| Magnet width (nm) | Magnet length (nm) | Spacing (nm) | B-field (mT) |
|---|---|---|---|
| 56 | 76 | 14 | 5 |
| 50 | 70 | 12.5 | 3 |
| 60 | 86 | 10 | 3 |

Table 1: Example simulation results.

Our study was not meant to be exhaustive; the details of such a study are well beyond the scope of this paper. (For example, work with MRAM [1] shows that even a minor change to the aspect ratio of a memory bit can dramatically affect the field strength needed to set it.)
That said, we have included a brief discussion to illustrate that the field strengths discussed above should be able to facilitate dataflow. Moreover, simulations show that, for lines of 10-15 nanomagnets, a B-field can remove the remanent magnetization associated with a previous computation and drive the line to a new polarization associated with a new input in just a few nanoseconds. (That is, the line begins in an initial polarization, the magnets are polarized along their hard axes, and they then take on the new antiferromagnetic ordering associated with the new input.) These results are close to the theoretical minimum switching times projected in other studies, and they hold whether the applied B-field is square or sinusoidal.

## 5. POWER ANALYSIS

Power consumption is a critical design consideration, especially as devices continue to be scaled down. We analyze the power consumed by our proposed MQCA-based system, focusing on average power consumption since it is directly related to the energy required by the system. There are three major power consumers in an MQCA-based system: (i) the nanomagnets used to carry out logic functions, (ii) the clocking wires that provide magnetic fields, and (iii) the miscellaneous CMOS circuitry that handles clock generation and input/output control. In the following two subsections, we discuss the power consumed by nanomagnets and clocking wires. We will also compare the power consumption of an MQCA circuit to that of a conventional CMOS circuit. For the third component, we designed a simple CMOS-based circuit (with 3 MOSFETs) to convert a 0-5 V voltage clock signal into a 0-4 mA (square-wave) clock current and used this circuit for simulation purposes. Although this circuit consumed a significant amount of power compared to the nanomagnets, we believe that alternative clock generation circuits can be designed that consume very little power [4, 16], and we are working on this.
For this reason, we do not discuss the power consumption of the third component further.

### 5.1 Power Components

#### 5.1.1 Magnets

In a recent paper [6], the power dissipated during one nanomagnet switching event was analyzed based on the micromagnetic Landau-Lifshitz equations. For abrupt switching, i.e., switching due to an external field applied along the easy axis, the energy dissipation is on the order of the energy barrier between the "up" and "down" states. Since this barrier is appreciable (several hundred room-temperature kT), abrupt switching results in appreciable power dissipation. On the other hand, switching can also be accomplished adiabatically with the help of a clocking field, as described in this paper. The role of the clocking field is to reduce the energy barrier between the "up" and "down" states. The dot is switched at a point in the clock cycle when the barriers are low, resulting in significantly less power dissipation. Simulations show that, with this adiabatic clocking scheme, the energy dissipated per switching event is on the order of ten room-temperature kT. This translates to an estimated power dissipation of 0.1 W for an array of $10^{10}$ nanomagnets adiabatically switched $10^8$ times per second.

#### 5.1.2 Clock Lines

To estimate the power consumed by the clocking wires, let us examine the clocking wire structure shown in Fig. 3. For each wire, the power consumption can be attributed to the inherent wire resistance, the parasitic capacitance to its neighboring wires, and the parasitic capacitance between the wire and the substrate. Since the ferrite layer between the wire layer and the substrate is much thicker than the ferrite between the wires, we can safely ignore the capacitance between the wire and the substrate.
In order to provide the desired magnetic field distribution, each wire must carry a prescribed, time-varying current. We therefore model this current as a controlled current source. The lumped-circuit model of one clocking wire is shown in Fig. 5a².

¹ The initial line experiments referenced above considered magnets on the order of $70\times135$ nm with 25 nm separation. We chose slightly smaller devices as local electron-beam lithography (EBL) capabilities have improved.

² The frequency of the current clock source is 100 MHz, which corresponds to a free-space wavelength of 300 cm. The length of the clocking wire is about 4 $\mu \rm m$, which is much shorter than the wavelength of interest, so a lumped-circuit model can be used.

Figure 5: (a) Lumped-circuit model of one clock wire. (b) Majority-gate-based full adder.

The power consumed by the clocking wire modeled in Fig. 5a depends on the actual current waveforms as well as on the resistance and capacitance values. We considered two waveforms in our study: square waves and sine waves. The former can be easily generated by conventional CMOS clock generation circuitry together with our clock current source, while methods for generating the latter can be derived from research on adiabatic clock generation [4, 16]. The power consumption of the circuit in Fig. 5a can be readily derived for both waveforms; the details are omitted. The average power consumption for a square-wave current with magnitude $I$ and period $T$ (assuming a 50% duty cycle) is

$$P = \frac{I^2 R}{2} \left[1 - \frac{RC}{T}\left(1 - e^{-T/(2RC)}\right)\right], \tag{1}$$

and for a sine wave with the same magnitude and period it is

$$P = \frac{I^2 R}{2} \cos \theta, \tag{2}$$

where $\theta$ is the phase difference between the voltage across the resistor and the current from the current source, and is a function of the time constant $RC$.
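Eq. (1) and the Table 2 formulas can be evaluated directly; the sketch below does so for the 100 MHz clock. The function names are hypothetical, and the sine-wave case of Eq. (2) is omitted because the text does not specify how $\theta$ depends on $RC$. One observation from running the numbers: with $\epsilon_d = 20$ the capacitance formula gives about 7.08 fF rather than the 7.79 fF listed in Table 2. With either value $RC \approx 10^{-15}$ s, far below the 10 ns period, so Eq. (1) collapses to the $I^2R/2$ bound.

```python
import math


def clock_wire_rci(j=1e6, w=2e-4, l=4e-4, t=2e-5, d=2e-6,
                   rho=1.8e-6, eps0=8.854e-14, eps_d=20.0):
    """R (ohm), C (F) and peak current I (A) of one clock wire from the
    Table 2 formulas; lengths in cm, J in A/cm^2, rho in ohm*cm, eps0 in F/cm."""
    r = rho * l / (w * t)
    c = eps0 * eps_d * l * t / d
    i = j * w * t
    return r, c, i


def p_square(i, r, c, period):
    """Eq. (1): average power for a 50%-duty square-wave current source."""
    tau = r * c
    if tau == 0.0:
        return i**2 * r / 2  # no capacitance: the I^2 R / 2 bound
    return i**2 * r / 2 * (1 - tau / period * (1 - math.exp(-period / (2 * tau))))


def p_bound(i, r):
    """Upper bound on the average power for either waveform: I^2 R / 2."""
    return i**2 * r / 2


r, c, i = clock_wire_rci()
period = 1 / 100e6  # 100 MHz clock
print(f"R = {r:.2f} ohm, C = {c * 1e15:.2f} fF, I = {i * 1e3:.1f} mA")
print(f"P (Eq. 1) = {p_square(i, r, c, period):.3e} W, "
      f"bound = {p_bound(i, r):.3e} W")
```

As a sanity check on Eq. (1), when $RC \gg T$ the capacitance smooths the square wave to its mean $I/2$ and the average power tends to $I^2R/4$, half of the no-capacitance bound.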
Note that the average power for both of the above current waveforms is upper bounded by $I^2R/2$, the power consumption assuming no capacitance. Table 2 summarizes the power consumption for the clocking wire dimensions assumed in the work discussed in Sec. 3. For the R and C values given, the difference in average power between the sine and square waves is negligible, and hence only one number is reported. We have also used Q3D Extractor (a 3D/2D quasi-static electromagnetic-field simulator for parasitic extraction of electronic components) to extract an RLC equivalent circuit of a sample clocking wire. This equivalent circuit was used in HSPICE, along with a 100 MHz clock current source, to calculate power consumption. Considering the simplicity of our analytical model and the differences between the material parameters used in the analytical calculation and in simulation, the analytical result is a good estimate of the numbers obtained from simulation.

### 5.2 A Case Study

We now use the above information to compare MQCA circuits and systems to CMOS circuits with equivalent functionality. We consider a 32-bit ripple carry adder (RCA). A one-bit full adder can be constructed with three majority gates (M1-M3) (see Fig. 5b), and this structure can be cascaded to form an RCA of arbitrary size. We use this schematic to estimate the energy required to perform a 32-bit addition with an MQCA-based RCA. We begin by considering loss from the clock lines. More specifically, we need to determine how many clock lines are needed to accommodate the critical path of a 32-bit RCA. Referring to Fig.
5b, this is a function of the number of magnets per bit slice in the x-direction ($MPBS_x$), the width of a nanomagnet ($M_w$), and the spacing between the nanomagnets ($M_{sx}$). Eq. 3 summarizes this relationship for a 32-bit RCA; it gives the number of clock wires along the critical path through the nanomagnets:

$$\frac{32 \times MPBS_x \times (M_w + M_{sx})}{w} \tag{3}$$

| Parameter | Symbol | Value | Units |
|---|---|---|---|
| Current density | $J$ | $10^{6}$ | A/cm² |
| Wire width | $w$ | $2 \times 10^{-4}$ | cm |
| Wire length | $l$ | $4 \times 10^{-4}$ | cm |
| Wire thickness | $t$ | $2 \times 10^{-5}$ | cm |
| Wire separation | $d$ | $2 \times 10^{-6}$ | cm |
| Resistivity (Cu) | $\rho$ | $1.8 \times 10^{-6}$ | Ω·cm |
| Permittivity of free space | $\epsilon_0$ | $8.854 \times 10^{-14}$ | F/cm |
| Relative permittivity (ferrite) | $\epsilon_d$ | 20 | (none) |
| Resistance | $R = \rho l/(wt)$ | 0.18 | Ω |
| Current | $I = Jwt$ | 4 | mA |
| Capacitance | $C = \epsilon_0 \epsilon_d l t/d$ | 7.79 | fF |
| Average power | $P$ | $1.44 \times 10^{-6}$ | W |

Table 2: Design parameters and power consumption of a single clocking wire.

That said, the clock wires that we have considered are 4 $\mu m$ long, and more than one adder could fit along this length. We can estimate the number of nanomagnet adders that could be stacked in this dimension by considering the number of magnets per bit slice in the y-direction ($MPBS_y$), the height of a magnet ($M_h$), and the spacing between nanomagnets in the y-direction ($M_{sy}$)³. The number of adders per 4 $\mu m$ wire is then given by Eq. 4:

$$\frac{l}{MPBS_y \times \left((AR \times M_w) + 1.5\,M_{sx}\right)} \tag{4}$$

³ To reduce the number of variables, we express $M_h$ as a function of $M_w$ and $M_{sy}$ as a function of $M_{sx}$: $M_h = M_w \times AR$ (where AR is the aspect ratio of the magnet) and $M_{sy} = 1.5 \times M_{sx}$ [11].

To estimate the power dissipated in the clock lines to facilitate one 32-bit add, we multiply the result of Eq. 3 by 2P and divide by the result of Eq. 4, since every wire must switch once per add and the adders stacked along a wire share its switching. Dividing this result by the operating frequency f gives the energy per add.

We also consider the nanomagnets. From Sec. 5.1.1 (0.1 W for $10^{10}$ magnets switched $10^8$ times per second), the energy loss is about $10^{-19}$ J per magnet switching event. We estimate this energy loss in the context of our adder by calculating the maximum number of magnets that could cover the area associated with the wire group. We divide this result by 2, as a fully tiled array of magnets would not provide any functionality.

We now return to the adder schematic illustrated in Fig. 5b. What such a design might look like in the context of magnets is illustrated in Fig. 6. Ultimately, careful studies and simulations that account for selective inversion, the absence of crosstalk, fan-out and fan-in, etc. will be required. For the time being, however, we simply use this "schematic" to establish estimates of $MPBS_x$ and $MPBS_y$. From Fig. 6, $MPBS_x$ is approximately 15 and $MPBS_y$ is approximately 10. For our calculations, we assume slightly higher values of 20 and 15 for $MPBS_x$ and $MPBS_y$, respectively, to account for some of the overhead that will surely be required. We assume a value of 60 nm for $M_w$, a spacing between magnets of 15 nm ($M_{sx}$), and an AR of 1.45.

Figure 6: Magnet adder.

Figure 7: Energy (pJ) per 32-bit MQCA add as a function of J.

Given the above parameters, we estimate the energy due to the clock wires and magnets as a function of current density and switching frequency. Results are summarized in Fig. 7, which reports the energy loss from the magnets and from the clock, as well as the total energy loss.
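The case-study arithmetic can be sketched as follows, under stated assumptions. The function names are hypothetical. The clock power uses $P = I^2R/2$ with $I = Jwt$ and the Table 2 resistance (the RC correction is negligible). Our reading of the procedure is that the clock energy ($2P$ per wire, times the Eq. 3 wire count, divided by $f$) is shared by the adders counted by Eq. 4, while the magnet energy is charged for half of a fully tiled wire group. With this reading, the sketch returns about 0.004 pJ per add at $J = 10^5$ A/cm² and about 0.285 pJ at $J = 10^6$ A/cm² for $f = 10^8$ Hz. Clock energy scales as $J^2$, whereas magnet energy does not depend on $J$.

```python
# Table 2 wire geometry (cm) and resistance (ohm); same wire in nm below.
W_CM, T_CM, R_OHM = 2e-4, 2e-5, 0.18
W_NM, L_NM = 2000.0, 4000.0


def clock_power_w(j_a_per_cm2: float) -> float:
    """P = I^2 R / 2 with I = J * w * t (RC correction neglected)."""
    i = j_a_per_cm2 * W_CM * T_CM
    return i**2 * R_OHM / 2


def mqca_add_energy_pj(j_a_per_cm2: float, f_hz: float = 1e8,
                       mpbs_x: int = 20, mpbs_y: int = 15, m_w: float = 60.0,
                       m_sx: float = 15.0, ar: float = 1.45) -> tuple[float, float]:
    """Return (clock, magnet) energy in pJ for one 32-bit MQCA add."""
    n_wires = 32 * mpbs_x * (m_w + m_sx) / W_NM                  # Eq. 3
    adds_per_wire = L_NM / (mpbs_y * (ar * m_w + 1.5 * m_sx))    # Eq. 4
    # Assumed reading: 2P per wire, wire energy shared by the stacked adders.
    e_clock = n_wires * 2 * clock_power_w(j_a_per_cm2) / (adds_per_wire * f_hz)
    # 0.1 W for 1e10 magnets switched 1e8 times/s -> 1e-19 J per switch.
    e_switch = 0.1 / (1e10 * 1e8)
    cell_nm2 = (m_w + m_sx) * (ar * m_w + 1.5 * m_sx)  # magnet + spacing footprint
    n_magnets = n_wires * W_NM * L_NM / cell_nm2 / 2   # half of a full tiling
    return e_clock * 1e12, n_magnets * e_switch * 1e12


for j in (1e5, 1e6):
    clock, mags = mqca_add_energy_pj(j)
    print(f"J = {j:.0e} A/cm^2: clock {clock:.4f} pJ, "
          f"magnets {mags:.5f} pJ, total {clock + mags:.3f} pJ")
```

Because the clock term is proportional to $1/f$ while the magnet term is held fixed here, raising `f_hz` in this sketch lowers only the clock contribution. It does not capture the possible frequency dependence of magnet dissipation.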
With a switching frequency of $10^8$ Hz, the total energy loss for a 32-bit add ranges from 0.004 to 0.285 pJ. If the switching frequency can be increased (which may be possible based on the simulation results discussed earlier), the energy per operation could be reduced even further. (However, as the switching frequency increases beyond $10^8$ Hz, the energy loss from the magnets could also increase [7], and this needs more study. Still, especially at the higher current densities, an order of magnitude improvement over the energy per add reported in Fig. 7 appears feasible.)

We now compare with CMOS. Vratonjic et al. predict an average energy consumption of 1.1 pJ for a low-power 32-bit RCA (130 nm CMOS with a 1.2 V supply and a 2.1 ns delay) [15]. If $V_{dd}$ can be scaled to 0.6 V for ultra-low-power operation, the energy loss would be approximately 0.25 pJ, with a delay of 8.5 ns. These energy numbers are comparable to the *worst* MQCA numbers discussed above, and an improvement of 2-3 orders of magnitude could be possible.

Finally, we briefly comment on latency and throughput. The latency of a single 32-bit MQCA adder will realistically be greater than that of a CMOS design. However, the throughput of an MQCA adder will be quite good, as we could get a new result every time the last clock wire switches. Latency also increases for the ultra-low-power CMOS adder. Even with higher latencies, numerous applications (e.g., DSPs) should benefit from low-power, high-throughput operation. Moreover, applications with tight power budgets can often tolerate higher latencies.

## 6. CONCLUSIONS

We have conducted an initial analysis of a clock structure that would control MQCA-based systems. We have determined initial estimates for the B-field that a current-carrying wire could provide. We have considered dataflow in the nanomagnets with that field, and we have calculated an upper bound for the power that would be consumed by our clock wires.
Using this information, we have estimated the energy required for a 32-bit RCA. Initial estimates show that even the pessimistic case compares well with low-power CMOS, and gains of two to three orders of magnitude could be possible. Future work will consider each component of our initial estimates in more detail, e.g., specific magnet materials that might function at lower fields (reducing current density and power consumption) and the transmission of logical values between clock wires (initial results show that magnets controlled by one wire should be able to transfer data from one clocked region to the next). These studies are already under way.

## 7. REFERENCES

- [1] Conversations with Joe Nahas (Freescale).
- [2] http://www.ansoft.com/products/em/max2d/.
- [3] G. Bernstein, A. Imre, V. Metlushko, A. Orlov, L. Zhou, G. C. L. Ji, and W. Porod. Magnetic QCA systems. *Microelectronics Journal*, 36:619–624, 2005.
- [4] A. Chandrakasan and R. Brodersen. *Low Power Digital CMOS Design*. Kluwer Academic Publishers, 1996.
- [5] A. Chaudhary et al. Eliminating Wire Crossings for Molecular Quantum-dot Cellular Automata Implementation. In Proc. of ICCAD, pages 565–571, 2005.
- [6] G. Csaba, P. Lugli, A. Csurgay, and W. Porod. Simulation of Power Gain and Dissipation in Field-Coupled Nanomagnets. *J. of Comp. Electronics*, 4(1/2):105–110, 2005.
- [7] G. Csaba, P. Lugli, and W. Porod. Power Dissipation in Nanomagnetic Logic Devices. In Proc. of 4th IEEE Conf. on Nanotechnology, pages 346–348, 2004.
- [8] M. Donahue and D. Porter. OOMMF User's Guide, Version 1.0, Interagency Report NISTIR 6376. http://math.nist.gov/oommf.
- [9] W. Gallagher and S. Parkin. Development of the Magnetic Tunnel Junction MRAM at IBM: From First Junctions to a 16-Mb MRAM Demonstrator Chip. *IBM J. of Res. and Dev.*, 50(1):5–23, Jan. 2006.
- [10] A. Imre. Experimental Study of Nanomagnets for Magnetic Quantum-dot Cellular Automata (MQCA) Logic Applications. *Dissertation*, U. of Notre Dame, April 2005.
- [11] A. Imre et al. Majority Logic Gate for Magnetic Quantum-Dot Cellular Automata. *Science*, 311(5758):205–208, Jan. 13, 2006.
- [12] C. Lent and P. Tougaw. A Device Architecture for Computing with Quantum Dots. *Proc. IEEE*, 85:541, 1997.
- [13] M. Niemier and P. Kogge. Exploring & Exploiting Wire-Level Pipelining in Emerging Technologies. In Proc. of Int. Sym. of Comp. Arch., pages 166–177, 2001.
- [14] P. Tougaw and C. Lent. Logical Devices Implemented Using Quantum Cellular Automata. *J. App. Phys.*, 75:1818, 1994.
- [15] M. Vratonjic et al. Low- and Ultra Low-Power Arithmetic Units: Design and Comparison. In Proc. of ICCD, 2005.
- [16] C. Ziesler, S. Kim, and M. Papaefthymiou. A Resonant Clock Generator for Single-Phase Adiabatic Systems. In Proc. of ISLPED, 2001.