Elsevier

Information Sciences

Volume 476, February 2019, Pages 375-391

Performance and energy optimisation in CPUs through fuzzy knowledge representation

https://doi.org/10.1016/j.ins.2018.03.029

Highlights

  • Processor Design Knowledge expressed through Fuzzy Logic Rules over Design Space Exploration is studied for the Selective Load Value Prediction microarchitecture.

  • CPU microarchitecture is optimized for performance, energy consumption and thermal dissipation through a modified NSGA-II algorithm.

  • Six architectural configurations from the Pareto front were analyzed; all were found hardware feasible (temperatures lower than 111.8 °C).

  • The Framework for Automatic Design Space Exploration tool was used and extended. Metrics were assessed with M-SIM2, CACTI, QUILT and HOTSPOT.

  • Cross-fertilization between CPU architecture, multi-objective optimization methods, and knowledge representation leads to an effective processor design.

Abstract

This paper presents an automatic design space exploration that uses processor design knowledge for the multi-objective optimisation of a superscalar microarchitecture enhanced with selective load value prediction (SLVP). We introduced important new SLVP parameters and determined their influence on performance, energy consumption, and thermal dissipation. We significantly enlarged the initial processor design knowledge, expressed through fuzzy rules, and analysed its role in the automatic design space exploration process. The proposed fuzzy rules improve the diversity and quality of the solutions, as well as the convergence speed of the design space exploration. Experiments show that a set-associative prediction table is more effective than a direct-mapped table and that 86% of the configurations in the Pareto front use multiple values per load. In conclusion, our experiments show that integrating an SLVP module into a superscalar microarchitecture is hardware feasible: compared with the case without SLVP, performance is better, energy consumption is lower, and the temperatures inside the chip decrease, remaining below 75 °C.

Introduction

The main goal of this article is to optimise the processor architecture so as to contain energy consumption and improve performance, and to quantify and control thermal dissipation in the resulting Pareto-optimal configurations. To this end, we propose an improvement of the selective load value prediction (SLVP) microarchitecture [15] and analyse the effect of processor design knowledge expressed through fuzzy logic rules (PDK-FLR) on the design space exploration (DSE) process, from the viewpoints of both solution quality and convergence speed.

The SLVP unit is a hardware structure that exploits the locality of load values by predicting the next output of a given load instruction from its previous outputs. Prediction is applied selectively, only to long-latency loads, which results in a simpler hardware structure, fewer SLVP accesses, and better performance. In this work, the SLVP unit has been extended with data memory address-based access, set-associative organisation, and multiple values per load, as well as with the option of increasing the selectivity level by accessing the SLVP only when a miss occurs in the level 2 (L2) cache. The study in [14] covered 19 architectural parameters with 2.5 · 10¹⁵ possible configurations. As this work considers 23 parameters, the number of potential configurations grows to more than 121 million billion, which requires a heuristic search. The automatic DSE is performed with the Framework for Automatic Design Space Exploration (FADSE) presented in [5], applying the NSGA-II multi-objective genetic algorithm [10]. We significantly improve and enlarge the initial design knowledge, represented, among other means, through fuzzy logic rules, during the optimisation of the target microarchitecture, for faster convergence and better solution diversity and quality. We analyse how processor design knowledge expressed by experts through fuzzy logic rules can be formally represented, and we evaluate how it can improve the effectiveness of multi-objective DSE algorithms. These improvements lead to a new, original optimisation methodology for micro-architectural design.
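To make the extended SLVP organisation more concrete, the Python sketch below models a set-associative prediction table that can be indexed either by the load's instruction address (PC) or by its data memory address, keeps several candidate values per load, and is consulted only on a cache miss. It is a hypothetical, simplified software model for illustration, not the authors' hardware design: the names `SLVPTable` and `Entry`, the parameters `num_sets`, `ways`, `values_per_load`, the saturating confidence counters and the LRU replacement policy are all assumptions. Setting `ways=1` yields a direct-mapped table.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    tag: int
    values: list = field(default_factory=list)  # candidate load values
    conf: list = field(default_factory=list)    # one saturating counter per value


class SLVPTable:
    """Hypothetical set-associative selective load value prediction table."""

    def __init__(self, num_sets=64, ways=4, values_per_load=2,
                 conf_max=3, conf_threshold=2, index_by="address"):
        self.num_sets = num_sets
        self.ways = ways                  # ways=1 -> direct-mapped table
        self.k = values_per_load
        self.conf_max = conf_max
        self.conf_threshold = conf_threshold
        self.index_by = index_by          # "pc" or "address" (memory-centric)
        self.sets = [[] for _ in range(num_sets)]  # each set ordered MRU -> LRU

    def _split(self, pc, addr):
        key = addr if self.index_by == "address" else pc
        return key % self.num_sets, key // self.num_sets  # (set index, tag)

    def _find(self, pc, addr):
        idx, tag = self._split(pc, addr)
        for entry in self.sets[idx]:
            if entry.tag == tag:
                return idx, tag, entry
        return idx, tag, None

    def predict(self, pc, addr, cache_miss):
        """Return a predicted value, or None if no confident prediction exists."""
        if not cache_miss:  # selective access: only loads that miss use the SLVP
            return None
        _, _, entry = self._find(pc, addr)
        if entry is None or not entry.values:
            return None
        best = max(range(len(entry.values)), key=lambda i: entry.conf[i])
        return entry.values[best] if entry.conf[best] >= self.conf_threshold else None

    def update(self, pc, addr, actual):
        """Train the table with the value the load actually returned."""
        idx, tag, entry = self._find(pc, addr)
        set_entries = self.sets[idx]
        if entry is None:
            entry = Entry(tag)
            if len(set_entries) == self.ways:
                set_entries.pop()  # evict the least recently used entry
        else:
            set_entries.remove(entry)
        set_entries.insert(0, entry)  # place the entry at the MRU position
        if actual in entry.values:
            i = entry.values.index(actual)
            entry.conf[i] = min(entry.conf[i] + 1, self.conf_max)
        else:
            if len(entry.values) == self.k:  # replace the least confident value
                j = min(range(self.k), key=lambda i: entry.conf[i])
                entry.values.pop(j)
                entry.conf.pop(j)
            entry.values.append(actual)
            entry.conf.append(0)


if __name__ == "__main__":
    table = SLVPTable(num_sets=16, ways=2)
    for _ in range(3):
        table.update(pc=0x400, addr=0x1000, actual=42)
    print(table.predict(0x400, 0x1000, cache_miss=True))   # 42
    print(table.predict(0x400, 0x1000, cache_miss=False))  # None (cache hit)
```

With `ways=1`, two loads whose keys map to the same set evict each other, which gives one intuition for why a set-associative table can outperform a direct-mapped one; in the paper itself this trade-off is assessed by simulation, not by a model like this.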

We optimise the target microarchitecture with respect to performance, energy, and temperature. Thermal evaluation and control are essential in modern central processing unit (CPU) design. All consumed power is ultimately dissipated as heat and, in large data centres, heating, ventilation, and air conditioning (HVAC) systems absorb almost as much energy as the computing resources themselves [34]. Considering that a non-negligible fraction of worldwide energy production is attributable to information technology equipment (ITE), and that the corresponding greenhouse gas (GHG) emissions are expected to grow faster than those of other sectors, the energy efficiency and thermal performance of CPUs are fundamental to reducing the global energy footprint and, consequently, GHG emissions.

CPU performance can usually be improved by increasing clock frequencies, enlarging and refining existing hardware resources, or adding new ones. In recent years, however, this has increased the power consumption of servers, stretching the limits of power supply and cooling equipment. In high-performance computing (HPC) systems running extremely expensive computations, e.g. simulations in engineering, biochemistry, or finance, the thirst for speed has come with the consumption of enormous amounts of power. For such supercomputers, ultimate performance depends on complex optimisations involving the way cores interact with memory and their communication paradigms [9]. The effect of architectural evolution on HPC can be appreciated by observing the TOP500 list. While the average power consumption of the top 10 (resp., 50, and 500) systems in the list is 1.32 (resp., 0.908, and 0.257) MW, their average power efficiency is 248 (resp., 193, and 122) MFLOPS/W. Therefore, the most powerful supercomputers (which also happen to be the most recent) are more energy efficient than their predecessors. In summary, although the net effect of an architectural improvement such as the one proposed here on the energy consumed by an HPC system depends on the microarchitecture of its CPUs, even a modest reduction in power requirements can yield significant savings.

Given the considerable amounts of money invested in data centres, energy awareness has also attracted the attention of the security research community. Indeed, denial of service (DoS) attacks have recently developed a new facet: instead of degrading the performance of a system to make it useless, attackers concentrate on actions whose ultimate purpose is to raise its energy consumption, with potentially devastating financial effects [27]. This is particularly evident with battery-powered devices, where a targeted attack may stealthily deplete the battery, eventually leading to service failure (see, e.g., [13]). It should also be kept in mind that, for next-generation mobile devices, the computation-intensive cryptographic operations required for secure communications will affect the energy needs of the mobile SoC [6]. In large-scale data centre infrastructures, complex energy supply contracts amplify the impact of this type of attack: if a threshold billing system is in place (a higher tariff is applied when consumption exceeds a given threshold), even a small increase in consumption may cause a significant monetary loss [28]. All of the above aspects should therefore be considered when designing new microarchitectures.

This paper is structured as follows. Section 2 reviews existing speculative microarchitectures based on value prediction, as well as the DSE concept. Section 3 describes the target SLVP-based superscalar microarchitecture. Section 4 presents the method we developed to improve the effectiveness of multi-objective DSE algorithms with processor design knowledge expressed through fuzzy logic rules and other restrictions. Section 5 details the simulation methodology and presents the quality metrics applied in this work. Section 6 discusses the simulation results obtained on the Alpha AXP 21264 microarchitecture enhanced with SLVP capabilities. Finally, Section 7 highlights the main contributions and proposes possible further work.

Section snippets

Load value prediction

Load value prediction is a data-speculative micro-architectural technique, introduced in [23], that exploits value locality and the correlation between load instruction addresses and their actual values. In this work, we apply a selective approach: prediction is applied only to load instructions that miss in the data cache. In doing so, we reduce misprediction costs, decrease hardware costs and significantly improve the performance-energy ratio of the microarchitecture (a lower hardware

Highly parameterised SLVP

We extended the SLVP-based superscalar architecture [14] with the following important new capabilities:

  1. Indexing the SLVP table also by the data memory address (memory-centric approach). As access is performed selectively (only upon a miss in the L1 data cache, when the data memory address is already available), the SLVP can be indexed using the data memory address without causing delays in the pipeline. This comparative analysis is motivated by the fact that some studies have

Processor design knowledge expressed through fuzzy rules

The previous section presented our micro-architectural improvements. This section discusses the improved and enlarged processor design knowledge used during the optimisation of the target microarchitecture to achieve faster convergence and better solution spread and quality.

Fuzzy logic is based on fuzzy set theory [46]. In classical set theory, the characteristic function of a set S takes the value 1 for an element that belongs to S and 0 for an element that does not.
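As a minimal illustration of the difference, the Python sketch below contrasts a classical characteristic function with a triangular fuzzy membership function, whose value is a degree in [0, 1] rather than just 0 or 1, and then evaluates one expert-style rule in Mamdani fashion (min to clip the output set, centroid to defuzzify). The rule "IF L1 data cache is small THEN L2 cache is large", the log2(KB) scale, the membership breakpoints and the function names are all hypothetical; the actual PDK-FLR rule base and the way FADSE applies rule outputs during the genetic search are not shown here.

```python
def characteristic(x, lo, hi):
    """Classical (crisp) set [lo, hi]: membership is exactly 1 or 0."""
    return 1.0 if lo <= x <= hi else 0.0


def triangular(x, a, b, c):
    """Fuzzy set with triangular membership: 0 outside (a, c), 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical linguistic terms on a log2(size in KB) scale.
def l1_is_small(x):
    return triangular(x, 1, 3, 5)    # peak at 2**3 KB = 8 KB


def l2_is_large(y):
    return triangular(y, 8, 11, 14)  # peak at 2**11 KB = 2 MB


def apply_rule(l1_log2_kb, universe=range(6, 16)):
    """Mamdani-style evaluation of the hypothetical rule
    IF L1 is small THEN L2 is large.
    Returns (firing strength, centroid of the clipped output set)."""
    strength = l1_is_small(l1_log2_kb)
    clipped = [min(strength, l2_is_large(y)) for y in universe]
    area = sum(clipped)
    if area == 0.0:
        return strength, None  # the rule does not fire
    centroid = sum(y * m for y, m in zip(universe, clipped)) / area
    return strength, centroid


if __name__ == "__main__":
    print(characteristic(4, 1, 5), triangular(4, 1, 3, 5))  # 1.0 0.5
    print(apply_rule(4))  # strength 0.5, centroid close to 11 (about 2 MB)
    print(apply_rule(6))  # (0.0, None): 64 KB is not "small" at all
```

A rule with zero firing strength contributes nothing, while an intermediate strength expresses partial agreement with the expert's linguistic term instead of a hard yes/no constraint.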

Tools used in simulations

The FADSE framework integrates the jMetal library, which provides many multi-objective heuristics. One of the implemented algorithms, used in this work, is NSGA-II, a multi-objective genetic algorithm described in [10]. FADSE takes as inputs the metaheuristic (NSGA-II in this case) and specific parameters such as the following:

  • The name of the genetic algorithm parameters (crossover operator, mutation operator, selection operator, population size, maximum
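FADSE relies on jMetal's Java implementation of NSGA-II. Purely to illustrate what the algorithm does with each population, the independent Python sketch below implements its two core steps as described in [10]: fast non-dominated sorting into Pareto fronts, and the crowding distance used to preserve diversity. Objectives are assumed to be minimised (for example CPI and energy), the sample points are invented, and the paper's modifications to NSGA-II (such as the use of fuzzy rules) are not modelled.

```python
def dominates(a, b):
    """a dominates b (minimisation): no worse in every objective, better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def fast_non_dominated_sort(points):
    """Fast non-dominated sorting; returns the fronts as lists of indices."""
    n = len(points)
    dominated_by = [[] for _ in range(n)]  # S_p: solutions dominated by p
    counts = [0] * n                        # n_p: number of solutions dominating p
    fronts = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(points[p], points[q]):
                dominated_by[p].append(q)
            elif dominates(points[q], points[p]):
                counts[p] += 1
        if counts[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        nxt = []
        for p in fronts[i]:
            for q in dominated_by[p]:
                counts[q] -= 1
                if counts[q] == 0:
                    nxt.append(q)
        i += 1
        fronts.append(nxt)
    return fronts[:-1]  # drop the trailing empty front


def crowding_distance(points, front):
    """Crowding distance of each index in one front (larger = less crowded)."""
    if not front:
        return {}
    dist = {p: 0.0 for p in front}
    for k in range(len(points[front[0]])):
        ordered = sorted(front, key=lambda p: points[p][k])
        lo, hi = points[ordered[0]][k], points[ordered[-1]][k]
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")  # keep boundary points
        if hi == lo:
            continue
        for j in range(1, len(ordered) - 1):
            gap = points[ordered[j + 1]][k] - points[ordered[j - 1]][k]
            dist[ordered[j]] += gap / (hi - lo)
    return dist


if __name__ == "__main__":
    # Invented (CPI, energy) pairs, both to be minimised.
    pts = [(1.0, 5.0), (1.2, 4.0), (1.5, 3.0), (1.3, 4.5), (2.0, 6.0)]
    fronts = fast_non_dominated_sort(pts)
    print(fronts)                             # [[0, 1, 2], [3], [4]]
    print(crowding_distance(pts, fronts[0]))  # {0: inf, 1: 2.0, 2: inf}
```

In NSGA-II the next population is filled front by front; when a front does not fit entirely, its members with the largest crowding distance are preferred, which favours a well-spread Pareto front approximation.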

Performance and energy consumption evaluations

First, we compared the Pareto fronts of our new extended architecture obtained with and without fuzzy logic rules. We then selected six configurations from the resulting Pareto front for thermal analysis.
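One common way to compare two Pareto front approximations, such as those obtained with and without fuzzy rules, is the hypervolume indicator: the area of objective space dominated by the front and bounded by a reference point. The Python sketch below computes it for two minimised objectives (e.g. CPI and energy). It is an illustration under that assumption: the function names, front coordinates and reference point are invented rather than taken from the paper's experiments, and the quality metrics actually applied are those presented in Section 5.

```python
def pareto_front(points):
    """Non-dominated subset of 2-objective points (both minimised), sorted by x."""
    pts = set(points)
    return sorted(p for p in pts
                  if not any(q != p and q[0] <= p[0] and q[1] <= p[1] for q in pts))


def hypervolume_2d(points, ref):
    """Area dominated by the front of `points` and bounded by reference point `ref`."""
    front = [p for p in pareto_front(points) if p[0] < ref[0] and p[1] < ref[1]]
    area, prev_y = 0.0, ref[1]
    for x, y in front:  # x increases, so y decreases along the front
        area += (ref[0] - x) * (prev_y - y)
        prev_y = y
    return area


if __name__ == "__main__":
    ref = (4.0, 4.0)  # hypothetical reference point (worst CPI, worst energy)
    front_a = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (2.5, 2.5)]  # last point is dominated
    front_b = [(1.0, 3.5), (2.5, 2.0), (3.0, 1.5)]
    print(hypervolume_2d(front_a, ref))  # 6.0
    print(hypervolume_2d(front_b, ref))  # 4.25 (smaller area, i.e. a worse front)
```

A larger hypervolume indicates a front that is closer to the ideal point and/or better spread; since the value depends on the chosen reference point, the same point must be used for every front being compared.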

Fig. 3 presents the Pareto front approximations discovered by our modified NSGA-II genetic algorithm after 52 generations (at the end of the exploration). Overall, the best results are obtained with PDK-FLR, showing a noticeable gain. We can observe that in the area with high energies, the DSE

Conclusion and further work

In this work, we presented an improved automatic DSE methodology that significantly enlarges an initial domain knowledge represented through fuzzy logic rules and other deterministic restrictions. We substantially extended the SLVP microarchitecture and also performed thermal analysis and control, verifying the energy efficiency of the design and its temperature optimisation. These improvements led to a new, realistic and original methodology for performance, energy consumption, and

Acknowledgements

We would like to thank former M.Sc. student Claudiu Buduleci of the Embedded Systems Specialisation at ‘Lucian Blaga’ University of Sibiu, Romania, for offering his help in the technical development of the thermal analysis section.

References (47)

  • D. Brooks et al.

    Wattch: A framework for architectural-level power analysis and optimizations

    (2000)
  • A. Buyuktosunoglu et al.

    Energy efficient co-adaptive instruction fetch and issue

    Proceedings of the 30th Annual International Symposium on Computer Architecture, San Diego, CA, USA

    (2003)
  • H. Calborean

    Multi-Objective Optimization of Advanced Computer Architectures using Domain-Knowledge

    (2011)
  • H. Calborean et al.

    An automatic design space exploration framework for multicore architecture optimizations

    9th RoEduNet International Conference, Sibiu, Romania

    (2010)
  • J. Chen et al.

    Integrating complete-system and user-level performance/power simulators: the SimWattch approach

    IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Austin, TX, USA

    (2003)
  • P.L. De Angelis et al.

    Hybrid MPI/OpenMP application on multicore architectures: the case of profit-sharing life insurance policies valuation

    Appl. Math Sci.

    (2013)
  • K. Deb et al.

    A fast and elitist multiobjective genetic algorithm: NSGA-II

    IEEE Trans. Evol. Comput.

    (2002)
  • V. Desmet et al.

    Archexplorer.org: joint compiler/hardware exploration for fair comparison of architectures

    13th Workshop on Interaction between Compilers and Computer Architecture (Interact-13), Raleigh, NC, USA

    (2009)
  • S. Eyerman et al.

    Characterizing the branch misprediction penalty

    IEEE International Symposium on Performance Analysis of Systems and Software, Austin, TX, USA

    (2006)
  • U. Fiore et al.

    Exploiting battery-drain vulnerabilities in mobile smart devices

    IEEE Trans. Sustainable Comput.

    (2017)
  • A. Gellert et al.

    Multi-objective optimisations for a superscalar architecture with selective value prediction

    IET Comput. Dig. Tech.

    (2012)
  • A. Gellert et al.

    Energy-performance design space exploration in SMT architectures exploiting selective load value predictions

    Design, Automation & Test in Europe Conference & Exhibition (DATE), Dresden, Germany

    (2010)
  • J.L. Hennessy et al.

    Computer Architecture: A Quantitative Approach

    (2017)