Analysis of integrated circuits thermal dynamics with point heating time
Introduction
The current trend in modern general-purpose processor design is toward multi-core universal processing units [1]. The design principle is to place many functionally identical, independent processing units on one die. This may help sustain Moore's law, because there is currently no visible roadmap for further increasing processor clock frequency [2], [3]. This trend has led to the production of, e.g., the Intel Core i7 processor [4] (Fig. 1), future designs still in laboratory tests such as the 80-core Intel Polaris [5] (Fig. 2), and the IBM/Toshiba CELL processor [6] (Fig. 3). Despite their complicated internal structure, designers can determine the power consumption of the whole chip surface. Unfortunately, when the chip consists of many independent modules on a relatively large die, keeping temperatures at acceptable levels remains difficult [25], [26], [27]. The difficulty arises because the processor is meant to reach its maximum performance near the thermal limit of the integrated circuit for a given technology, while its maximum temperature must not exceed the critical level. Current thermal design, and the assessment of its influence on the chip, does not take into account the dynamic aspect of thermal simulations.
Thermal restrictions are an additional factor with an increasingly significant impact on the computational capabilities of integrated circuits. Modern integrated circuit designs require an effective heat dissipation method, so designers include increasingly complex active cooling techniques. For the most popular CMOS process, the thermal limit is set at 125 °C [7]. Active cooling reacts to temperature changes with a considerable delay after their cause. A steep temperature rise on the chip surface (caused, e.g., by increased activity of a computation module) is therefore unlikely to be neutralized in time by passive and active cooling. This is the main reason why cooling mechanisms in all current processor designs are used to lower the working temperature to acceptable levels. The main goal of this paper is to present a new method to analyze the dynamic thermal behaviour of multi-core integrated circuits.
In [8], partitioning power dissipation into dynamic and static losses emphasizes the rising impact of the latter on total power. Nonetheless, in [9] the authors presented an analysis showing that, given the current evolution of microelectronics, dynamic power will still have a considerable influence on the total power loss of an integrated circuit. This strongly justifies research on minimizing dynamic power loss and its influence on the thermal working limits of integrated circuit design. Many studies have aimed to minimize total power dissipation; the most popular techniques include dynamic voltage scaling [10], [11], [12], [13], [14], [15] and dynamic frequency scaling [16], [17], [18], [19]. Dynamic power loss [20] is described as

$$P_{dyn} = C_L V_{DD}^2 f$$

where $f$ is the chip working frequency, $C_L$ is the load capacitance and $V_{DD}$ is the supply voltage.
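To make the relation $P_{dyn} = C_L V_{DD}^2 f$ concrete, the short Python sketch below evaluates it at a hypothetical operating point (the capacitance, voltage and frequency values are illustrative assumptions, not data from the paper). It shows why voltage scaling is more effective than frequency scaling alone: power is linear in $f$ but quadratic in $V_{DD}$.

```python
def dynamic_power(c_load: float, v_dd: float, freq: float) -> float:
    """Dynamic power P = C_L * V_DD**2 * f (SI units: F, V, Hz -> W)."""
    return c_load * v_dd ** 2 * freq

# Hypothetical operating point: 1 nF effective switched capacitance, 1.2 V, 2 GHz.
C_L = 1e-9
nominal = dynamic_power(C_L, 1.2, 2e9)

# Dynamic frequency scaling alone: halve f.
dfs = dynamic_power(C_L, 1.2, 1e9)

# Dynamic voltage and frequency scaling: halve f and lower V_DD to 0.9 V.
dvfs = dynamic_power(C_L, 0.9, 1e9)

print(f"nominal: {nominal:.3f} W")
print(f"DFS    : {dfs:.3f} W ({dfs / nominal:.1%} of nominal)")
print(f"DVFS   : {dvfs:.3f} W ({dvfs / nominal:.1%} of nominal)")
```

Halving $f$ halves the dynamic power, while also lowering $V_{DD}$ from 1.2 V to 0.9 V reduces it to roughly 28 % of the nominal value under these assumed numbers.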
Based on thermal analysis, the authors proposed a novel method for asynchronous control of functional module activity in an integrated circuit. In [21], the authors present an example mechanism of activity control that minimizes the peak temperature of an integrated circuit. The switching frequency of module activity influenced the peak temperature. Note that the switching frequency here is the rate of switching between the active and passive modes of an integrated circuit module, not the working frequency of the integrated circuit. Additionally, the algorithm switched activity between modules in order to spread the thermal activity over the whole surface of the integrated circuit. An example of switching between cores is shown in Fig. 4: the activity is spread over the available cores.
As stated in [21], the temperature caused only by the active power dissipation during activity switching conforms to an equation in which T is the temperature and ω = 2πf is the angular frequency (pulsation) of the activity switching, f being the switching frequency. The activity switching method is illustrated in Fig. 5. Asynchronous activity switching occurs when the temperature of the integrated circuit exceeds a defined level, marked in the figure as the over-temperature area.
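Threshold-triggered switching of this kind can be illustrated with a deliberately simple model. The Python sketch below is hypothetical and is not the algorithm of [21] or ASTER: each core is a thermally uncoupled first-order RC element, all parameter values and names (`Params`, `simulate`) are assumptions, and activity moves to the coolest core whenever the active core exceeds a threshold. It only shows how asynchronous switching caps the peak temperature compared with keeping one core permanently active.

```python
from dataclasses import dataclass

@dataclass
class Params:
    n_cores: int = 4          # hypothetical core count
    t_amb: float = 45.0       # ambient/heatsink temperature, degC (assumed)
    rise: float = 100.0       # steady-state rise of an always-active core, K (assumed)
    tau: float = 0.05         # per-core thermal time constant, s (assumed)
    t_limit: float = 110.0    # over-temperature threshold, degC (assumed)
    dt: float = 1e-3          # simulation step, s
    duration: float = 2.0     # simulated time, s

def simulate(p: Params, migrate: bool) -> tuple[float, int]:
    """Return (peak temperature in degC, number of activity switches)."""
    temps = [p.t_amb] * p.n_cores
    active = 0
    peak = p.t_amb
    switches = 0
    for _ in range(round(p.duration / p.dt)):
        # Explicit first-order (RC) update of every core; cores are uncoupled here.
        for i in range(p.n_cores):
            target = p.t_amb + (p.rise if i == active else 0.0)
            temps[i] += p.dt / p.tau * (target - temps[i])
        peak = max(peak, max(temps))
        # Asynchronous switching: triggered only when the active core crosses the threshold.
        if migrate and temps[active] > p.t_limit:
            active = min(range(p.n_cores), key=temps.__getitem__)
            switches += 1
    return peak, switches

p = Params()
peak_static, n_static = simulate(p, migrate=False)
peak_dyn, n_dyn = simulate(p, migrate=True)
print(f"single active core: peak {peak_static:.1f} degC")
print(f"with migration    : peak {peak_dyn:.1f} degC after {n_dyn} switches")
```

With these assumed values a permanently active core settles near 145 °C, above the 125 °C CMOS limit, whereas migration keeps the peak within one time step's overshoot of the 110 °C threshold. The switching frequency is then set by the threshold and the time constant rather than by a fixed clock.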
For these reasons, there is a strong need to estimate the dynamic power loss in real time. The best solution is not only to estimate the current power loss but also to predict it from the computation flow, so that power dissipation can be controlled before damage occurs at the hardware level.
In order to define the dynamic thermal behaviour of an integrated circuit, a corresponding variable has to describe the dependencies that connect the physical components of the integrated circuit with its thermal response to a test pattern [22]. The proposed variable should describe the influence of a single thermally active module on the total chip temperature and on its dynamic change during the computation of a data stream. To investigate the thermal dynamics of any integrated circuit, the point heating time (PHT) value is introduced. The PHT value allows the physical and thermal properties of integrated circuits to be compared. The next section covers the PHT definition and the method of calculating it from initial thermal simulation data.
Point heating time
The point heating time (PHT) is the duration of the temperature rise at any measurement point in the chip volume caused by a heat source located on the chip surface. Assuming that the temperature of an integrated circuit (or micro-circuit) reaches its stationary state after a time of 3τ, τ is the point heating time value for the mean temperature [23]. This value is derived from the equation that describes the mean temperature of the integrated circuit [22].
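The 3τ criterion suggests a simple way to read τ from a simulated temperature trace. Assuming the rise at a measurement point is approximately first-order, T(t) = T_0 + ΔT(1 − e^{−t/τ}), τ is the time at which the rise reaches 1 − 1/e ≈ 63.2 % of its final value, and 3τ corresponds to about 95 %. The Python sketch below implements that estimator with a hypothetical `estimate_tau` helper and a synthetic trace; it is not the mean-temperature derivation of [22].

```python
import math

def estimate_tau(times, temps):
    """Estimate tau as the time at which the rise reaches (1 - 1/e) of its final value.

    Assumes a monotonic, first-order-like rise that has settled by the last sample.
    """
    t0, final = temps[0], temps[-1]
    if final <= t0:
        raise ValueError("trace shows no temperature rise")
    level = t0 + (1.0 - math.exp(-1.0)) * (final - t0)
    for k in range(1, len(temps)):
        if temps[k] >= level:
            # Linear interpolation between samples k-1 and k.
            t_a, t_b = times[k - 1], times[k]
            y_a, y_b = temps[k - 1], temps[k]
            return t_a + (level - y_a) * (t_b - t_a) / (y_b - y_a)
    raise ValueError("trace never reaches the 63.2 % level")

# Synthetic first-order trace (not simulation data): tau = 0.8 s, 40 K rise from 25 degC.
tau_true, dt = 0.8, 0.01
times = [k * dt for k in range(1001)]
temps = [25.0 + 40.0 * (1.0 - math.exp(-t / tau_true)) for t in times]

tau = estimate_tau(times, temps)
print(f"tau ~ {tau:.3f} s, stationary state after ~3*tau = {3 * tau:.2f} s")
```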
Point heating time for substrates
The authors analyze PHT changes for different substrate materials on which an active module is placed: silicon (Si), gallium arsenide (GaAs), copper (Cu), aluminium (Al) and aluminium oxide (Al₂O₃). As a test model, let us define a single heating module (M1) of size 4×4 mm² placed on a 20×10 mm² surface. The total thickness of the integrated circuit is 0.625 mm (the thickness of a 5″ silicon wafer). The dimensions of the test model are shown in Fig. 7. The boundary
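To illustrate why the substrate material matters, the Python sketch below models the time constant as a lumped RC product, τ = ρcd/h (volumetric heat capacity ρc times thickness d, divided by an effective heat transfer coefficient h). This is an assumed textbook model, not necessarily the paper's Eq. (4); h is a hypothetical value and the material constants are approximate room-temperature handbook values.

```python
# Approximate room-temperature handbook values: density rho [kg/m^3],
# specific heat c [J/(kg*K)]. Real values vary with purity and temperature.
MATERIALS = {
    "Si":    (2330.0, 705.0),
    "GaAs":  (5320.0, 330.0),
    "Cu":    (8960.0, 385.0),
    "Al":    (2700.0, 897.0),
    "Al2O3": (3950.0, 880.0),
}

def lumped_tau(rho: float, c: float, thickness: float, h: float) -> float:
    """Lumped-capacitance time constant tau = rho*c*d/h, in seconds.

    Heat capacity per unit area (rho*c*d) times the thermal resistance
    per unit area (1/h) of the cooled face.
    """
    return rho * c * thickness / h

THICKNESS = 0.625e-3   # m, the 5-inch wafer thickness used in the test model
H_EFF = 1000.0         # W/(m^2*K), hypothetical effective heat transfer coefficient

taus = {name: lumped_tau(rho, c, THICKNESS, H_EFF) for name, (rho, c) in MATERIALS.items()}
for name, tau in sorted(taus.items(), key=lambda item: item[1]):
    print(f"{name:6s} tau = {tau:.2f} s   3*tau = {3 * tau:.2f} s")
```

Under these assumptions the ranking is governed by ρc alone, and silicon, with the smallest volumetric heat capacity of the five materials, gets the smallest τ. The absolute values depend entirely on the assumed h and should not be compared with the simulated PHT values.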
Point heating time for different chip thickness
According to Eq. (4), the time constant (and consequently the PHT value) depends on chip thickness. The next analysis therefore compares different chip thicknesses, using the same test model as described in the previous section. For this comparison, the authors chose a silicon substrate with standard wafer thicknesses of 0.275, 0.375, 0.625 and 0.925 mm. The PHT and time constant values for these thicknesses are presented in Table 2.
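Eq. (4) is not reproduced in this excerpt, so the Python sketch below contrasts two textbook limiting cases for a silicon slab instead of claiming that either is the paper's formula: the lumped RC time constant ρcd/h, which grows linearly with thickness d, and the through-thickness conduction time d²/α, which grows quadratically. Material constants are approximate, and h is a hypothetical value.

```python
K_SI = 148.0                 # thermal conductivity of Si, W/(m*K), approximate
RHO_C_SI = 2330.0 * 705.0    # volumetric heat capacity of Si, J/(m^3*K), approximate
H_EFF = 1000.0               # W/(m^2*K), hypothetical effective heat transfer coefficient
ALPHA_SI = K_SI / RHO_C_SI   # thermal diffusivity, m^2/s

def lumped_tau(d: float) -> float:
    """Whole-slab RC time constant rho*c*d/h, linear in thickness d [m]."""
    return RHO_C_SI * d / H_EFF

def diffusion_time(d: float) -> float:
    """Characteristic through-thickness conduction time d^2/alpha, quadratic in d [m]."""
    return d * d / ALPHA_SI

for d_mm in (0.275, 0.375, 0.625, 0.925):
    d = d_mm * 1e-3
    print(f"d = {d_mm:.3f} mm  lumped tau = {lumped_tau(d):.3f} s  "
          f"d^2/alpha = {diffusion_time(d) * 1e3:.2f} ms  Biot = {H_EFF * d / K_SI:.4f}")
```

The ratio of the two times is hd/k, the Biot number. Because it is well below 0.1 for all four thicknesses under these assumptions, the lumped behaviour dominates and τ would scale roughly in proportion to d.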
The data show
Point heating time for distance estimation
In order to compute the PHT value over the whole chip surface, the authors chose a test model identical to the one in the previous section: a silicon substrate with a chip thickness of 0.625 mm. The heating module was placed as shown in Fig. 8. The PHT values over the chip surface are presented in Fig. 9.
Fig. 9 shows a regular change of the PHT value depending on the distance from the active module. The PHT value versus the distance from the heat centre has been calculated. Results are
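A PHT-versus-distance curve can be obtained from a PHT surface map by radial averaging. In the Python sketch below, the map is synthetic (a PHT that grows linearly with distance), the module centre is a hypothetical coordinate and `radial_profile` is an illustrative name. With real data, each map entry would be the PHT extracted from the simulated temperature trace at that surface point.

```python
import math
from collections import defaultdict

def radial_profile(pht_map, pitch_mm, centre_mm, bin_mm):
    """Average a 2-D PHT map over rings of width bin_mm around centre_mm.

    pht_map[j][i] is the PHT at x = i*pitch_mm, y = j*pitch_mm.
    Returns a sorted list of (ring mid-distance in mm, mean PHT, sample count).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    cx, cy = centre_mm
    for j, row in enumerate(pht_map):
        for i, value in enumerate(row):
            r = math.hypot(i * pitch_mm - cx, j * pitch_mm - cy)
            b = int(r // bin_mm)
            sums[b] += value
            counts[b] += 1
    return [((b + 0.5) * bin_mm, sums[b] / counts[b], counts[b]) for b in sorted(sums)]

# Synthetic 20x10 mm map on a 0.5 mm grid (41 x 21 points), NOT simulation results.
pitch = 0.5
centre = (10.0, 5.0)   # hypothetical module centre
synthetic = [[0.5 + 0.05 * math.hypot(i * pitch - centre[0], j * pitch - centre[1])
              for i in range(41)] for j in range(21)]

profile = radial_profile(synthetic, pitch, centre, bin_mm=1.0)
for r, pht, n in profile:
    print(f"r ~ {r:4.1f} mm  mean PHT = {pht:.3f} s  ({n} points)")
```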
Test models for multi-source case
The analysis of the concurrent activity of multi-core modules is presented for two cases, which differ in the total number of cores (modules). Let us define an integrated circuit with dimensions of 20×10 mm² in the X–Y plane. The chip thickness is taken as the most common silicon wafer thicknesses, that is, 0.275 mm, 0.625 mm and 0.925 mm.
Let us define the core count and its placement. For research purposes, two cases were defined. The first scenario had two functional modules placed
Multi-source case results
The dynamic PHT values were analyzed for the two presented test models, and the minimal and maximal dynamic PHT values were collected for each case. For comparison, the PHT value was also computed from the mean temperature, in the same way as the τ value (Eq. (4)).
These values were computed for every possible combination of functional module activity. Results for the two active modules are presented in Fig. 14. Colour bars represent the dynamic PHT variation.
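Evaluating every combination of module activity amounts to enumerating all non-empty subsets of the modules, 2^n − 1 of them for n modules. The Python sketch below does this bookkeeping and records the minimal and maximal PHT. `simulate` is a hypothetical hook standing in for a thermal-solver run, and the toy function in the example is not physical; it only exercises the enumeration.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterator

def activity_patterns(n_modules: int) -> Iterator[FrozenSet[int]]:
    """Yield every non-empty set of simultaneously active modules (2**n - 1 sets)."""
    for k in range(1, n_modules + 1):
        for combo in combinations(range(n_modules), k):
            yield frozenset(combo)

def pht_range(n_modules: int, simulate: Callable[[FrozenSet[int]], float]):
    """Return (min PHT, its pattern, max PHT, its pattern) over all activity patterns.

    `simulate` is a hypothetical hook returning the dynamic PHT of one pattern.
    """
    results = {pattern: simulate(pattern) for pattern in activity_patterns(n_modules)}
    lo = min(results, key=results.get)
    hi = max(results, key=results.get)
    return results[lo], lo, results[hi], hi

def toy_pht(active: FrozenSet[int]) -> float:
    # Toy stand-in for a solver run: NOT a thermal model.
    return 1.0 + 0.1 * len(active) + 0.01 * sum(active)

lo_val, lo_pattern, hi_val, hi_pattern = pht_range(4, toy_pht)
print(f"min PHT {lo_val:.2f} for {sorted(lo_pattern)}, "
      f"max PHT {hi_val:.2f} for {sorted(hi_pattern)}")
```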
ASTER—control algorithm verification
In order to check the theoretical conclusions presented above, one needs to define a simulation case for a multi-core integrated circuit. As mentioned in the introduction, massive multi-core systems are expected to become major players in the computing industry. For this reason, the authors analyze the thermal aspects of the Intel Polaris processor, which consists of 80 cores placed regularly on the chip surface. The area of the chip – 21×10 mm² – has on the basic plane nothing
ASTER, round-robin, random—results
Based on the presented principles for all three algorithms, a new software testbench was created, which performed the following simulation steps:
- Test program definition: the program consists of two instruction types, A and B, with computation times of 20 ms and 30 ms, respectively. The software randomly selects A or B instructions as long as the total code length is less than 2 s.
- Program data are forwarded to the three algorithms: ASTER using 100% of the available cores, round-robin using
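The program-generation step and the two baseline dispatch policies can be sketched as follows. This is an illustrative Python interpretation, not the authors' testbench: ASTER is not reproduced, the stopping rule for "less than 2 s" is an assumption, and all names are hypothetical. The example uses 80 cores as on the Polaris chip; the share of cores used by round-robin is truncated in the text and is left as a parameter.

```python
import random

DURATION_MS = {"A": 20, "B": 30}   # instruction computation times given in the text
LIMIT_MS = 2000                    # total program length limit (2 s)

def make_program(rng: random.Random) -> list[str]:
    """Draw random A/B instructions while the total length stays below 2 s.

    Assumed interpretation: an instruction is appended only if the new total is
    still below LIMIT_MS; generation stops at the first one that does not fit.
    """
    program, total = [], 0
    while True:
        instr = rng.choice("AB")
        if total + DURATION_MS[instr] >= LIMIT_MS:
            return program
        program.append(instr)
        total += DURATION_MS[instr]

def dispatch(program: list[str], n_cores: int, policy: str, rng: random.Random) -> list[int]:
    """Return per-core busy time in ms for 'round-robin' or 'random' assignment."""
    busy = [0] * n_cores
    for k, instr in enumerate(program):
        core = k % n_cores if policy == "round-robin" else rng.randrange(n_cores)
        busy[core] += DURATION_MS[instr]
    return busy

rng = random.Random(1)
program = make_program(rng)
total = sum(DURATION_MS[i] for i in program)
for policy in ("round-robin", "random"):
    busy = dispatch(program, 80, policy, rng)
    print(f"{policy:11s} max busy {max(busy)} ms, min busy {min(busy)} ms, total {sum(busy)} ms")
```

Round-robin gives each core an almost equal instruction count, while random assignment can leave some cores idle and others overloaded; a thermally aware policy such as ASTER would additionally take core temperatures into account.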
Conclusions
In this article, the authors presented an analysis of integrated circuit thermal dynamics. The research connects the thermal state of the chip with its physical description. Mean PHT values were compared with the analytical time constant values for different substrate materials and chip thicknesses. The simulation results confirm the analytical equations: the values differ by no more than 10%. For the given test materials, silicon has the lowest PHT value from
Acknowledgement
This work was prepared as part of project grant number N R13 0065 10, funded by the National Center for Research and Development, Poland.
References (27)
- et al., Asynchronous control of modules activity in integrated system for reducing peak temperatures, Integration, the VLSI Journal (2008).
- S. Borkar, Thousand core chips: a technology perspective, in: Proceedings of the 44th Annual Conference on Design...
- J. Bautista, Tera-scale computing – motivation and challenges – Conference Computing of the Future – Energy-Efficient...
- J. Shalf, B. Tchudi, S. Elbert, et al., Power, cooling, and energy consumption for the petascale and beyond, in:...
- Intel Core™ i7 Processor Extreme Edition Series and Intel Core™ i7 Processor, Specification update, Intel...
- T. Mattson, Scalable software for many core chips: programming Intel's 80-core research chip, in: Proceedings of...
- D. Pham, H. Anderson, E. Behnen, Bolliger, et al., Key features of the design methodology enabling a multi-core SoC...
- U. Paschen, D. Dittrich, H. Vogt, N. Kordas, High temperature CMOS process with dielectric isolation, in: Proceedings of...
- D. Grunwald, P. Levis, K.I. Farkas, C.B. Morrey III, M. Neufeld, Policies for dynamic clock...
- A. Gołda, A. Kos, Energy losses in digital CMOS integrated circuits: state-of-the-art and future trends, in:...
- Dynamic voltage scaling in multitier web servers with end-to-end delay control, IEEE Transactions on Computers.