Performance and energy optimisation in CPUs through fuzzy knowledge representation
Introduction
The main goal of this article is to optimise processor architecture so as to contain energy consumption and improve performance, and to quantify and control thermal dissipation in the resulting Pareto-optimal configurations. To this end, we propose an improvement of the selective load value prediction (SLVP) microarchitecture [15], and we analyse the effect of processor design knowledge expressed through fuzzy logic rules (PDK-FLR) on the design space exploration (DSE) process, from the viewpoints of both solution quality and convergence speed.
The SLVP unit is a hardware structure which exploits the locality property of load values by anticipating the next output of a certain load instruction based on its previous outputs. The prediction process is applied selectively, only on long-latency loads, resulting in a simpler hardware structure, fewer SLVP accesses, and better performance. In this work, the SLVP unit has been extended with data memory address-based access, set-associative organisation, multiple load values, as well as the possibility to increase the selectivity level by accessing the SLVP solely when a miss in the level 2 (L2) cache occurs. The study in [14] covered 19 architectural parameters with 2.5 · 10^15 possible configurations. As the number of parameters considered in this work is 23, the number of potential configurations grows to more than 121 million billion (about 1.21 · 10^17), thus requiring the use of a heuristic search. The automatic DSE is performed with the Framework for Automatic Design Space Exploration (FADSE) presented in [5], applying the NSGA-II multi-objective genetic algorithm [10]. We significantly improve and enlarge the initial design knowledge, represented among other forms through fuzzy logic rules, which is used during the optimisation of the target microarchitecture to obtain faster convergence and better solution diversity and quality. We analyse how processor design knowledge expressed by experts through fuzzy logic rules can be formally represented, and we evaluate how it can improve the effectiveness of multi-objective DSE algorithms. These improvements lead to a new, original optimisation methodology for micro-architectural design.
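The selective prediction idea described above can be illustrated with a minimal software sketch. It assumes the simplest possible policy, a direct-mapped last-value table indexed by the load instruction address and consulted only when the load misses in the cache. The class name, table size, and trace are hypothetical and chosen for illustration; the actual SLVP hardware (set-associative, multiple values per entry) is richer than this.

```python
class SelectiveLoadValuePredictor:
    """Toy last-value predictor consulted only for long-latency loads.

    Assumption: one stored value per entry, direct-mapped, tagged by the
    load instruction address (pc). This is a simplification for illustration.
    """

    def __init__(self, entries=256):
        self.entries = entries
        self.table = {}  # index -> (tag, last_value)

    def _index(self, pc):
        return pc % self.entries

    def predict(self, pc, cache_miss):
        # Selectivity: loads that hit in the cache never access the table.
        if not cache_miss:
            return None
        entry = self.table.get(self._index(pc))
        if entry is not None and entry[0] == pc:
            return entry[1]
        return None

    def update(self, pc, cache_miss, actual_value):
        # Only long-latency loads train the predictor.
        if cache_miss:
            self.table[self._index(pc)] = (pc, actual_value)


# Hypothetical trace of (load address, cache miss?, loaded value).
trace = [(0x40, True, 7), (0x40, True, 7), (0x44, False, 3), (0x40, True, 9)]
slvp = SelectiveLoadValuePredictor(entries=4)
hits = 0
for pc, miss, value in trace:
    guess = slvp.predict(pc, miss)
    hits += int(guess == value)
    slvp.update(pc, miss, value)
```

In this trace only the second access is predicted correctly: the first finds an empty table, the third hits in the cache and bypasses the predictor, and the fourth loads a value that differs from the stored one.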
We optimise the target microarchitecture with respect to performance, energy, and temperature. Thermal evaluation and control is an essential feature in modern central processing unit (CPU) design. All power consumed is ultimately dissipated as heat and, in large data centres, heating, ventilation, and air conditioning (HVAC) systems absorb almost the same amount of energy as the computing resources themselves [34]. Considering that a non-negligible fraction of worldwide energy production is attributable to information technology equipment (ITE), and that the corresponding greenhouse gas (GHG) emissions are expected to grow faster than in other sectors, the energy efficiency and thermal performance of CPUs are fundamental factors in the quest to reduce the global energy footprint and, consequently, GHG emissions.
CPU performance can usually be improved by increasing clock frequencies, enlarging and refining hardware resources, or adding new hardware resources. In recent years, however, this has increased the power consumption of servers, stretching the limits of power supply and cooling equipment. In high-performance computing (HPC) systems, where extremely expensive computation is involved, e.g. in simulations for engineering, biochemistry, or finance, the thirst for speed has been coupled with the consumption of very large amounts of power. For such supercomputers, ultimate performance depends on complex optimisations involving the manner in which cores interact with memory and their communication paradigms [9]. The effect of architectural evolution on HPC can be appreciated by observing the TOP500 list. Whilst the average power consumption of the top 10 (resp., 50, and 500) systems in the list is 1.32 (resp., 0.908, and 0.257) MW, the average power efficiency is 248 (resp., 193, and 122) MFLOPS/W. Therefore, the most powerful supercomputers (which also happen to be the most recent) are more energy efficient than their predecessors. In summary, although the net effect of an architectural improvement such as the one proposed here on the energy consumed by an HPC system depends on the microarchitecture of its CPUs, even a modest reduction in power requirements will account for significant savings.
Given the considerable amounts of money invested in data centres, energy-awareness has also attracted the attention of the security research community. Indeed, denial of service (DoS) attacks have recently evolved a new facet where, instead of degrading the performance of a system to make it useless, attackers concentrate on actions whose ultimate purpose is to raise its energy consumption, with potentially devastating financial effects [27]. This is particularly evident with battery-powered devices, where a targeted attack may stealthily deplete the battery, finally leading to service failure (see, e.g. [13]). It should be kept in mind that, as far as next-generation mobile devices are concerned, the requirements for secure communications, with the computation-intensive cryptographic operations they involve, will affect the energy needs of the mobile SoC [6]. In large-scale data centre infrastructures, complex contract structures for energy supply amplify this type of attack: if a threshold billing system is in place (a higher tariff is applied when consumption exceeds a given threshold), even a small increase in consumption may result in a significant monetary loss [28]. All of the above aspects should therefore be considered when designing new microarchitectures.
This paper is structured as follows. Section 2 reviews existing speculative microarchitectures based on value prediction and the DSE concept. Section 3 describes the target SLVP-based superscalar microarchitecture. Section 4 presents our method to improve the effectiveness of multi-objective DSE algorithms with processor design knowledge expressed through fuzzy logic rules and other restrictions. Section 5 details the simulation methodology and presents the quality metrics applied in this work. Section 6 discusses the simulation results generated on the Alpha AXP 21264 microarchitecture improved with SLVP capabilities. Finally, Section 7 highlights the main contributions and proposes some possible further work.
Section snippets
Load value prediction
Load value prediction is a data-speculative micro-architectural technique introduced in [23] exploiting value locality and the correlation between the load instruction addresses and their actual values. In this work, we apply a selective approach; prediction will only apply to load instructions with a miss in the data cache. Doing so, we reduce mis-prediction costs, decrease hardware costs and significantly improve the performance-energy ratio of the microarchitecture (a lower hardware
Highly parameterised SLVP
We extended the SLVP-based superscalar architecture [14] with the following new important capabilities:
1. Indexing the SLVP table also by data memory address (memory-centric approach). As access is performed selectively (only upon a miss in the L1 data cache when the data memory address is already available), the SLVP can be indexed using the data memory address without causing delays in the pipeline structure. This comparative analysis is motivated by the fact that some studies have
Processor design knowledge expressed through fuzzy rules
The previous section presented our micro-architectural improvements. This section discusses the improved and enlarged processor design knowledge used during the optimisation of the target microarchitecture to obtain faster convergence and better solution spread and quality.
Fuzzy logic is based on fuzzy set theory [46]. In classical set theory, the characteristic function associated with a set S takes the value 1 or 0 for a given element, depending on whether or not that element belongs to S.
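The contrast between a classical characteristic function and a fuzzy membership function, which returns a degree of membership in the interval [0, 1], can be sketched as follows. The triangular shape, the function names, and the cache-size breakpoints are illustrative assumptions and are not taken from the rules used in this work.

```python
def crisp_membership(x, low, high):
    """Classical characteristic function: 1 if x lies in [low, high], else 0."""
    return 1 if low <= x <= high else 0


def triangular_membership(x, left, peak, right):
    """Fuzzy membership with a triangular shape (assumes left < peak < right).

    Returns a degree in [0, 1]: 0 outside (left, right), 1 at the peak,
    and a linear interpolation in between.
    """
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical linguistic term "medium cache size", in KB.
crisp_at_64 = crisp_membership(64, 32, 256)            # full member
fuzzy_at_128 = triangular_membership(128, 32, 128, 256)  # peak of the term
fuzzy_at_192 = triangular_membership(192, 32, 128, 256)  # partial member
fuzzy_at_16 = triangular_membership(16, 32, 128, 256)    # outside the support
```

With these breakpoints, a 192 KB cache belongs to the hypothetical "medium" term with degree 0.5, whereas the crisp function can only answer with 0 or 1.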
Tools used in simulations
The FADSE framework integrates the jMetal library, which provides many multi-objective heuristics. One of the implemented algorithms used in this work is NSGA-II, a multi-objective genetic algorithm described in [10]. FADSE takes as inputs the metaheuristic (NSGA-II in this case) and specific parameters such as the following:
- The name of the genetic algorithm parameters (crossover operator, mutation operator, selection operator, population size, maximum
Performance and energy consumption evaluations
First, we compared the Pareto fronts of our new extended architecture with and without fuzzy logic rules. We then selected six configurations from the obtained Pareto front for thermal analysis.
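As a reminder of what a Pareto front contains, the following minimal sketch extracts the non-dominated points from a set of evaluated configurations. It assumes that both objectives are minimised (for example, cycles per instruction and energy); the objective pairs are invented for illustration and are not results from this study.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better
    in at least one (all objectives are assumed to be minimised)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Return the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (CPI, energy) pairs for five configurations.
evaluated = [(1.0, 5.0), (1.2, 4.0), (1.1, 5.5), (0.9, 6.0), (1.2, 4.5)]
front = pareto_front(evaluated)
```

Here (1.1, 5.5) is dominated by (1.0, 5.0) and (1.2, 4.5) by (1.2, 4.0), so three configurations remain on the front. This quadratic scan is adequate for small populations; NSGA-II itself uses a fast non-dominated sorting procedure [10].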
Fig. 3 presents the Pareto front approximations discovered by our modified NSGA-II genetic algorithm after 52 generations (at the end of the exploration). Globally, the best results are obtained with PDK-FLR, showing a noticeable gain. We can observe that in the area with high energies, the DSE
Conclusion and further work
In this work, we presented an improved automatic DSE methodology which significantly enlarges an initial domain knowledge represented through fuzzy logic rules and other deterministic restrictions. We substantially extended the SLVP microarchitecture and also performed a thermal analysis and control which verifies the energy efficiency of the design and its temperature optimisation. These improvements led to a new, realistic and original methodology for performance, energy consumption, and
Acknowledgements
We would like to thank former M.Sc. student Claudiu Buduleci of the Embedded Systems Specialisation at ‘Lucian Blaga’ University of Sibiu, Romania, for offering his help in the technical development of the thermal analysis section.
References (47)
- et al., Modeling energy-efficient secure communications in multi-mode wireless mobile devices, J. Comput. Syst. Sci. (2015)
- et al., Multi-objective optimization applied to unified second level cache memory hierarchy tuning aiming at energy and performance optimization, Appl. Soft Comput. (2016)
- et al., Exploiting selective instruction reuse and value prediction in a superscalar architecture, J. Syst. Archit. (2009)
- et al., Incorporating domain knowledge into the optimization of energy systems, Appl. Soft Comput. (2016)
- et al., Fast and standalone design space exploration for high-level synthesis under resource constraints, J. Syst. Archit. (2014)
- et al., Developing domain-knowledge evolutionary algorithms for network-on-chip application mapping, Microprocess. Microsyst. (2013)
- et al., Early-phase performance exploration of embedded systems with ABSOLUT framework, J. Syst. Archit. (2013)
- Degrees of contradiction for fuzzy logic rules implementing computer architecture ontologies, Revista Română de Informatică şi Automatică (2013)
- et al., The hypervolume indicator revisited: On the design of Pareto-compliant indicators via weighted integration, Evolutionary Multi-Criterion Optimization (2007)
- et al., QUILT: a GUI-based integrated circuit floorplanning environment for computer architecture research and education, Proceedings of the 2005 Workshop on Computer Architecture Education: Held in Conjunction with the 32nd International Symposium on Computer Architecture, Madison, WI, USA (2005)
- Wattch: A framework for architectural-level power analysis and optimizations
- Energy efficient co-adaptive instruction fetch and issue, Proceedings of the 30th Annual International Symposium on Computer Architecture, San Diego, CA, USA
- Multi-Objective Optimization of Advanced Computer Architectures using Domain-Knowledge
- An automatic design space exploration framework for multicore architecture optimizations, 9th RoEduNet International Conference, Sibiu, Romania
- Integrating complete-system and user-level performance/power simulators: the SimWattch approach, IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Austin, TX, USA
- Hybrid MPI/OpenMP application on multicore architectures: the case of profit-sharing life insurance policies valuation, Appl. Math Sci.
- A fast and elitist multiobjective genetic algorithm: NSGA-II, IEEE Trans. Evol. Comput.
- Archexplorer.org: joint compiler/hardware exploration for fair comparison of architectures, 13th Workshop on Interaction between Compilers and Computer Architecture (Interact-13), Raleigh, NC, USA
- Characterizing the branch misprediction penalty, IEEE International Symposium on Performance Analysis of Systems and Software, Austin, TX, USA
- Exploiting battery-drain vulnerabilities in mobile smart devices, IEEE Trans. Sustainable Comput.
- Multi-objective optimisations for a superscalar architecture with selective value prediction, IET Comput. Dig. Tech.
- Energy-performance design space exploration in SMT architectures exploiting selective load value predictions, Design, Automation & Test in Europe Conference & Exhibition (DATE), Dresden, Germany
- Computer Architecture: A Quantitative Approach