# SystemC Model of Power Side-Channel Attacks Against AI Accelerators: Superstition or not?

Andrija Nešković<sup>1</sup>, Saleh Mulhem<sup>1</sup>, Alexander Treff<sup>2</sup>, Rainer Buchty<sup>1</sup>, Thomas Eisenbarth<sup>2</sup>, and Mladen Berekovic<sup>1</sup>

<sup>1</sup>Institute of Computer Engineering and <sup>2</sup>Institute for IT Security

University of Lübeck

Lübeck, Germany

{andrija.neskovic, saleh.mulhem, a.treff, rainer.buchty, thomas.eisenbarth, mladen.berekovic}@uni-luebeck.de

Abstract—As training artificial intelligence (AI) models is a lengthy and hence costly process, leakage of such a model's internal parameters is highly undesirable. In the case of AI accelerators, side-channel information leakage opens up the threat scenario of extracting the internal secrets of pre-trained models. Therefore, sufficiently elaborate methods for design verification as well as fault and security evaluation at the electronic system level are in demand.

In this paper, we propose estimating information leakage from the early design steps of AI accelerators to aid in a more robust architectural design. We first introduce the threat scenario before diving into SystemC as a standard method for early design evaluation and how this can be applied to threat modeling. We present two successful side-channel attack methods executed via SystemC-based power modeling: correlation power analysis and template attack, both leading to total information leakage. The presented models are verified against an industry-standard netlist-level power estimation to prove general feasibility and determine accuracy. Consequently, we explore the impact of additive noise in our simulation to establish indicators for early threat evaluation. The presented approach is again validated via a model-vs-netlist comparison, showing high accuracy of the achieved results. This work hence is a solid step towards fast attack deployment and, subsequently, the design of attack-resilient AI accelerators.

*Index Terms*—Artificial Intelligence, Accelerators, Side-channel Attacks, SystemC, Power Modeling.

## I. INTRODUCTION

The use of Artificial Intelligence (AI) is rapidly growing across all emerging technologies. One of the most important aspects is accelerating the AI inference process and building corresponding hardware accelerators. An accelerator design's fault-tolerance mechanisms and other safety features are usually evaluated in the pre-silicon phase, whereas evaluation of the accelerator's physical security is performed in the post-silicon phase. Side-channel attacks, especially power attacks, are considered a serious security threat leading to a vulnerable AI hardware component. Recently, studies in the domain of AI accelerator design show that side-channel leakages of AI accelerators can be exploited to reveal industrial secrets such as the AI model architecture and its parameters [1]–[3]. For instance, power attacks are deployed successfully to reverse engineer the AI model [1], [4]. Power attacks lead to copying



Fig. 1. AI Accelerator IC Design Process (Adapted from [5]).

the AI model and then distributing it as counterfeit intellectual property (IP). Therefore, a strong need exists for evaluating an AI accelerator's side-channel attack resistance in the early design steps (EDS) of integrated circuit (IC) design, such as security evaluation at register-transfer level (RTL), using gate-level netlists, or even earlier. Evaluation and investigation of security issues in EDS provide insight into the robustness of the later fabricated IC. Thus, design decisions made at high abstraction levels have a significant impact on the whole design process.

Tools and platforms considering security evaluation in EDS were developed mainly to detect hardware Trojan circuitry [6] or to check the security rules in ICs [7]. Recent work confirms the importance of security evaluation in the EDS by demonstrating static and dynamic information flow analysis using Virtual Prototypes (VPs) [8], [9]. Simulators targeting side-channel evaluation have been studied in [10]. However, the reviewed tools consider only the software implementation of cryptographic algorithms executed on general-purpose hardware (i.e., microcontrollers). *The evaluation of side-channel attacks (SCA) and their impact on ICs is still an open issue in EDS:* To the best of our knowledge, no simulator exists that targets SCA evaluation in EDS of ICs, although such an evaluation is required to

This work has been funded by the German Ministry of Education and Research (BMBF) via the project VE-Jupiter (16ME0234).

ensure the IC's dependability, together with existing reliability and safety tests [11].

Our work demonstrates previously shown power SCA by utilizing a dedicated power estimation model with the goal to evaluate the worst-case resilience of an AI accelerator's design at electronic system level (ESL). This approach was positively verified by a comparison of the SystemC model's behavior with a technology-synthesized gate-level netlist.

## A. Why a SystemC Model?

SystemC is a solid candidate [12] for performing security evaluation in EDS, as it is one of the industry standards for hardware/software modelling at high abstraction levels. Particularly, SystemC is C++-based and was originally conceived for hardware/software co-design, simulation, and functional verification [13]. Over time, new design aspects such as fault evaluation [14] and power modeling [15], [16] were also addressed using SystemC. The security assessment of IC design has recently received more attention [7], especially when using SystemC in EDS [12]. With proper power estimation models, SystemC can be utilized to simulate power attacks against ICs even at the ESL.

In order to clarify how to deploy SystemC in EDS, Fig. 1 shows the top-down hardware design process of AI accelerators (a modified version of the *Double roof model* described in [5], [17]). Starting from ESL, the requirements towards AI accelerators are specified and synthesized into a system model (most likely represented by a VP [18]). The requirements for lower abstraction levels are derived based on this specification and implementation. At every abstraction level, the specification is transformed into an implementation with a synthesis step. This work presents a SystemC model of an AI accelerator at ESL.

## B. Paper Contribution

In this paper, we show how to evaluate power attacks against AI accelerators in EDS. For this, we build a SystemC model of a systolic-array-based AI accelerator hardware at ESL. Using SystemC, the activation count of components can be annotated with a power-consumption model to generate power traces covering the hardware of AI accelerators. We demonstrate a correlation power analysis (CPA) and template attacks (TA) based on our SystemC model in ideal conditions and explore the limits of these attacks in a noisy environment. Finally, we compare our model-based traces against power traces from a state-of-the-art netlist-level simulation to demonstrate the feasibility of the proposed model.

## II. RELATED WORK

The target of this work is to evaluate the benefits of using SystemC models to analyze side-channel information leakage at ESL in EDS. Therefore, power side-channel attacks against AI accelerators are briefly discussed in this section, as well as modeling approaches utilizing SystemC in different areas.

## A. Power Attacks vs. AI Hardware Accelerator

Several power attack scenarios against AI hardware accelerators have been proposed [4], [19]. Power attacks exploit power consumption leakage from an accelerator executing a pre-trained AI model (simply AI model) to reveal its internal secrets. In particular, an attacker uses an evaluation board attached to a targeted device [20], i.e., the AI hardware accelerator, and captures power consumption traces of some data input. The attacker applies statistical analysis, e.g., Simple Power Analysis (SPA), Differential Power Analysis (DPA), or Correlation Power Analysis (CPA), on the data input and the power traces to recover the internal secrets of the AI model. For instance, DPA was deployed to extract the AI's secret parameters in [19]. In [4], CPA was applied against the systolic-array-based hardware accelerator of Deep Neural Networks (DNNs).

## B. Power Consumption Modeling Challenges

The power consumption of any CMOS computing platform includes two types: static (leakage) power consumption  $P_{static}$ and dynamic (switching) power consumption  $P_{dynamic}$  [21]. The total power for a computing platform can be modeled by:

$$P_{total} = P_{dynamic} + P_{static} \tag{1}$$

$P_{static}$ is the product of leakage current and the supply voltage [22], and $P_{dynamic}$ indicates and quantifies transistor switching. Thus, $P_{dynamic}$ provides a distinctive current profile. Therefore, power attacks mainly rely on $P_{dynamic}$, which is considered the Achilles heel of any CMOS computing platform [23].

SystemC can be utilized for the activation count of hardware components at different abstraction levels. It plays a crucial role in simulating power attacks in EDS. Using SystemC to estimate a computing platform's power consumption poses several challenges: SystemC was used in [24] to estimate the power consumption of different processor configurations based on pre-computed power values of its components, such as memories, register files, function units, etc. The proposed power model exhibited a 15% prediction error. In [15], a black-box power model was introduced for digital signal processors (DSPs) in SystemC. The proposed power model does not require detailed insight into the individual components of the probed computing platform. The black-box power model exhibited a prediction error of less than 4%. This prediction error is caused by the lack of information about the static power dissipation $P_{static}$ of the targeted manufacturing technology. Therefore, the power consumption of a computing platform is rather difficult to model in SystemC. However, our work introduces a SystemC model that considers only the dynamic power consumption $P_{dynamic}$ to enable simulating power attacks.

## C. SystemC for Security Evaluation

Utilizing a SystemC VP to evaluate security-critical systems on chips has already been demonstrated in [16]. Beyond that, SystemC was proven successful for power-attack evaluation



Fig. 2. Block diagram of TPU including the threat model.

of cryptographic applications in [12], such as RSA-based public-key cryptosystems and elliptic-curve cryptography. The approach in [12] solely relies on a dedicated dynamic power consumption model, the so-called input-dependent model. This model covers the arithmetic operations by assuming that there is no difference between simulated hardware and C++ operators. The input-dependent model covers bit-shifts and comparisons as well, but lacks power modeling of registers, multiplexers, and other hardware components. In order to extend SystemC power-attack analysis beyond cryptographic applications, this paper introduces power estimation models to cover systolic arrays. By utilizing these power models, CPA and power template attacks (TA) against AI accelerators are simulated.

In the following sections, we first build a system model of a systolic-array-based AI accelerator and extend and modify the power-consumption models proposed in [12], [16] to also cover additional components present in an AI hardware accelerator. Then, we perform the proposed attacks. Finally, we verify our power estimation model and the validity of the attacks performed on it against a state-of-the-art netlist-level power estimation tool.

## III. AI ACCELERATOR ESL MODEL

To perform a security evaluation in EDS, an ESL model of an AI accelerator is required. For our approach, we use SystemC as the modelling language and start with a loosely-timed SystemC model of a systolic array for simulating the behaviour. By annotating this model with input-dependent power estimation capabilities, all components for SCA simulation can be provided.

## A. Systolic Array for Acceleration

The inference process of AI applications requires frequent data access. Such data-read operations from memory are very costly and time-consuming and therefore should be avoided on edge accelerators in order to minimize power consumption and maximize performance. This can be addressed by using so-called systolic array architectures, featuring a number of benefits [25]. Instead of accessing memory after every arithmetic operation, the systolic design approach utilizes multiple processing elements (*PEs*) to avoid frequent memory access.



Fig. 3. SystemC Implementation Overview.

Each *PE* performs a multiply-accumulate operation (MAC) as shown in Fig. 2. The partial result of a *PE* is directly passed to another *PE* without memory access. The realization of an array of *PEs* in hardware can accelerate matrix multiplication, which is essential for accelerating the desired AI algorithm.

The matrix multiplication of $A = (a_{ij})_{3\times 3}$ and $B = (b_{ij})_{3\times 3}$ results in a matrix $C = (c_{ij})_{3\times 3}$. A systolic array accelerates such a matrix multiplication, where a resulting element $(c_{11})$ is, for instance, calculated sequentially over 3 clock cycles by performing 3 MACs in 3 different *PEs* as follows [4]:

$$\begin{aligned} Reg_{11} &= a_{11} \times b_{11} + 0 && (t = 1) \\ Reg_{21} &= a_{12} \times b_{21} + Reg_{11} \\ &= a_{12} \times b_{21} + a_{11} \times b_{11} && (t = 2) \\ Reg_{31} &= a_{13} \times b_{31} + Reg_{21} \\ &= a_{13} \times b_{31} + a_{12} \times b_{21} + a_{11} \times b_{11} && (t = 3) \end{aligned} \tag{2}$$

where $Reg_{ij}$ is the partial-sum register of $PE_{ij}$ as shown in Fig. 2. In our model, the weights and inputs are represented as 8-bit integers, and the partial-sum results as 18-bit integers.
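The accumulation in Eq. 2 can be illustrated with a minimal Python sketch. It is not the SystemC model: it only reproduces the dataflow of the partial sums and ignores timing and the skewed scheduling of a real systolic array. The function names `partial_sums` and `systolic_matmul` are hypothetical, and unsigned operands are an assumption.

```python
def partial_sums(a_row, b_col):
    """Partial-sum register contents after each MAC step, as in Eq. 2."""
    reg, history = 0, []
    for a, b in zip(a_row, b_col):
        reg = a * b + reg          # one MAC: multiply, then add the previous partial sum
        history.append(reg)
    return history


def systolic_matmul(A, B):
    """Each c_ij is the final partial sum of one chain of MAC operations."""
    n = len(A)
    return [
        [partial_sums(A[i], [B[k][j] for k in range(n)])[-1] for j in range(n)]
        for i in range(n)
    ]


# Register values for c_11 at t = 1, 2, 3 with a-row (1, 2, 3) and b-column (4, 5, 6)
steps = partial_sums([1, 2, 3], [4, 5, 6])
print(steps)
```

The intermediate values in `steps` correspond to $Reg_{11}$, $Reg_{21}$, and $Reg_{31}$; these register updates are what the power model later observes.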

## B. SystemC Model of the Accelerator

For our approach, we focus on a loosely-timed SystemC model. Here, the behaviour of the AI accelerator is represented by a SystemC module to perform accelerated calculations and mimic the timing and power characteristics of a real hardware accelerator. Fig. 2 shows the architecture of the modelled system. The SystemC model easily realizes the individual multiply-accumulate and register operations required during inference by using dedicated data types. The matrix multiplication is performed over several cycles, depending on the dimension of the matrix, by utilizing all PEs in parallel. The result is therefore available in several parts across multiple cycles as described in Eq. 2. Furthermore, the proposed adversary is implemented as another SystemC module, which is able to send input to and receive output from the AI accelerator. Lastly, all activity during inference is tracked by a dedicated resource handler shown in Fig. 3, which implements a power-estimation model described in the following section.

## IV. DYNAMIC POWER CONSUMPTION MODEL

As the dynamic power consumption $(P_{dynamic})$ is the power consumed during logical transitions [22], the dynamic power-estimation model of a systolic array can be built based on the operations performed by every *PE*. Here, the SystemC model should implement a dedicated resource handler to generate power traces while the calculation is performed [12]. From the ESL perspective, every single operation performed in hardware consumes a certain amount of power measured by the so-called power expense, which depends on the hardware architecture, the type of operation, and the inputs of the operation. In the following, we utilize the input-dependent power model proposed in [12] and extend this model to include hardware components such as registers and MAC components.

## A. Power Model of a Single Processing Element

The extended version of the input-dependent model relies on the input of the hardware components and its computational/storage effort CE. If the inputs of a hardware component are zero, we consider its contribution to the dynamic power consumption as negligible and its computational/storage effort CE is zero. Otherwise, its contribution is not negligible, and its computational/storage effort CE relies on the number of ones in the input. The CE reflects the switching activity of the component and can be described by utilizing a bit-flipping power model. Several cases of single-bit flipping have to be considered, and a power expense for every case is assigned as follows: The transitions $0 \rightarrow 0$ and $1 \rightarrow 1$ require zero power expense, $0 \rightarrow 1$ requires one power expense, and $1 \rightarrow 0$ requires zero power expense, as flipping one bit from 0 to 1 consumes much more power than flipping it from 1 to 0 [12]. The proposed dynamic power consumption model of a single PE estimates the power expense of the MAC component by breaking it down into arithmetic operations. Additionally, the expense of accessing the register is considered.

**MAC Component Power Model:** The computational expenses of the MAC component can be broken down into the switching activity of binary arithmetic operations performed during the calculation, namely multiplication and addition. Counting the flipping of single bits during the calculation provides an estimation of the power expense of the performed MAC operation. Binary arithmetic multiplication can be considered as a series of adders; therefore, the power model of the multiplication is based on the power expenses of a binary adder shown in Table I.

**Register Power Model:** $PM_{Reg}$ denotes the power expense of a register access, which is modeled by the bit-switching activity inside the register every time a new value is written. Therefore, the old and new states of the register ($Reg_{old}$ and $Reg_{new}$) are compared, and the number of switched bits is counted using the Hamming Distance (HD), i.e., the number of ones in their bitwise XOR:

$$PM_{Reg} = HD(Reg_{old}, Reg_{new}). \tag{3}$$

TABLE I Power Expenses of Binary Adder

| Input Bits $(a, b, c)$ | Output Bits $(c, s)$ | State Expense | CE | $PM_{BA}$ |
|------------------------|--------------------------------------------------------------|------------------|----|-----------|
| (0,0,0)                | (0,0)                                                        | 0                | 0  | 0         |
| (1,0,0)                | (0,1)                                                        | 0                | 1  | 1         |
| (0,1,0)                | (0,1)                                                        | 0                | 1  | 1         |
| (1,1,0)                | (1,0)                                                        | 1                | 2  | 3         |
| (0,0,1)                | (0,1)                                                        | 1                | 0  | 1         |
| (1,0,1)                | (1,0)                                                        | 0                | 1  | 1         |
| (0,1,1)                | (1,0)                                                        | 0                | 1  | 1         |
| (1,1,1)                | (1,1)                                                        | 0                | 2  | 2         |

**Power Consumption Model of One PE:** The total power consumption of a PE ($PM_{PE}$) is the sum of $PM_{MAC}$ and $PM_{Reg}$, i.e.,

$$PM_{PE} = PM_{MAC} + PM_{Reg}. \tag{4}$$
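The PE power model can be illustrated with a minimal Python sketch. It encodes the $PM_{BA}$ column of Table I as a lookup and combines it with the register model of Eq. 3 according to Eq. 4. Several details are assumptions that are not taken from the paper: the adder is treated as a ripple-carry chain, the multiplication is decomposed into shift-and-add steps, operands are unsigned, and all function names are hypothetical.

```python
# PM_BA column of Table I, indexed by (a, b, carry_in).
PM_BA = {
    (0, 0, 0): 0, (1, 0, 0): 1, (0, 1, 0): 1, (1, 1, 0): 3,
    (0, 0, 1): 1, (1, 0, 1): 1, (0, 1, 1): 1, (1, 1, 1): 2,
}

WIDTH = 18                 # partial-sum width stated in Section III-A
MASK = (1 << WIDTH) - 1


def adder_expense(x, y):
    """Sum the Table I expense over all bit positions (assumed ripple-carry adder)."""
    carry, expense = 0, 0
    for i in range(WIDTH):
        a, b = (x >> i) & 1, (y >> i) & 1
        expense += PM_BA[(a, b, carry)]
        carry = (a & b) | (carry & (a ^ b))   # full-adder carry-out
    return expense


def multiplier_expense(x, w):
    """Assumed shift-and-add multiplication: one adder pass per set bit of the 8-bit weight."""
    acc, expense = 0, 0
    for i in range(8):
        if (w >> i) & 1:
            expense += adder_expense(acc, x << i)
            acc = (acc + (x << i)) & MASK
    return expense


def register_expense(reg_old, reg_new):
    """Eq. 3: Hamming distance between the old and new register state."""
    return bin(reg_old ^ reg_new).count("1")


def pe_expense(x, w, reg_old):
    """Eq. 4: PM_PE = PM_MAC + PM_Reg; returns (expense, new register value)."""
    product = x * w
    reg_new = (product + reg_old) & MASK
    pm_mac = multiplier_expense(x, w) + adder_expense(product, reg_old)
    pm_reg = register_expense(reg_old, reg_new)
    return pm_mac + pm_reg, reg_new


print(pe_expense(1, 1, 0))
```

Under these assumptions, a zero input with a cleared register yields an expense of zero, in line with the statement that components with zero inputs contribute negligibly to the dynamic power consumption.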

## B. Resource Handler

In the SystemC implementation, the power estimation is performed by the resource handler. The proposed resource handler relies on the total dynamic power consumption of all *PEs*, where the *PEs* consume power depending on the performed MAC operation and the register write operation. These operations are modeled separately and combined to produce a power trace of the whole calculation. We adapt the resource handler proposed in [12] by adding our *PE* power model so that it fits the AI accelerator model. Fig. 3 illustrates how the resource handler generates power traces of the AI accelerator during inference.

In the following sections, we will show how the power traces generated by the resource handler can be used by an adversary to perform power SCA.

## V. THREAT MODEL

The proposed threat model is equivalent to a practical one in which the adversary has physical access to an AI edge device [4]. Regarding the system model, we assume that the adversary has the following capabilities during the attack:

- The adversary has knowledge about the targeted platform or device.
- The adversary has knowledge about the internal structure of the AI accelerator.
- The adversary cannot directly access or read the secret information (weights).
- The adversary can input any data into the AI accelerator.
- The adversary can observe the device's inference results and obtain power traces of the performed operations.

This scenario can be classified as a grey-box approach [26], where the target of the adversary is to reveal the *PEs*' parameters. These parameters are highly valuable, since they represent the weights of a trained NN.

Fig. 2 shows an overview of the threat model. The weight parameters are pre-loaded to the systolic array for inference.



The information leak is caused by the power trace of the inference calculations, thus the adversary can attack the weights via SCA.

## VI. SCA SIMULATION USING SYSTEMC

The described model of an AI accelerator extended with power estimation capabilities enables the modelling of SCA. Having defined the threat model, we can simulate side-channel attacks targeting the secret parameters of a trained neural network at the ESL. In order to simulate realistic scenarios, the CPA approach is considered, as this approach has been proven successful on real hardware [4]. In addition, we revisit template attacks, which are considered the most objective method to assess the leakage of a device under test [27], [28].

## A. Adversary Simulation in SystemC

The power estimation model of every PE is a combination of the power estimation models of the single operations performed by the PE. Since static power consumption of the device is of no interest for the above-mentioned SCAs, the focus lies on dynamic power consumption. The adversary has access to the modeled power trace and thus can perform the attacks as if the hardware were real.

## B. Correlation Power Analysis

CPA-based attacks have been proven successful against hardware cryptographic functions [29]. Compared to less complex power analysis attacks, like SPA or DPA, CPA shows a more robust behavior. To perform a CPA, a leakage model needs to be defined. The most common approach is to calculate the correlation coefficient between power trace and Hamming Distance (HD) or Hamming Weight (HW) estimation of a certain calculation performed by the observed system.

Every $PE_{ij}$ of the systolic array performs a multiply-accumulate operation and stores the result of the operation in a register. An adversary assumes a correlation between the power traces and the HD model of the *PEs*' registers. To reveal a secret parameter, the adversary calculates the HD estimation $(\hat{H}_{n,b_k})$ for all possible transitions of the $Reg_{ij}$ register, i.e., for every trace $n$ and every weight candidate $b_k$, by

$$\hat{H}_{n,b_k} = HD\left(Reg_{ij}^t, Reg_{ij}^{t+1}\right). \tag{5}$$

The correlation coefficient  $(\rho(b_k))$  of all estimations and the recorded power traces is calculated as follows:

$$\rho(b_k) = \frac{\sum_{n=0}^{N-1} \left( P_n - \bar{P} \right) \left( \hat{H}_{n,b_k} - \bar{H}_{b_k} \right)}{\sqrt{\sum_{n=0}^{N-1} \left( P_n - \bar{P} \right)^2} \sqrt{\sum_{n=0}^{N-1} \left( \hat{H}_{n,b_k} - \bar{H}_{b_k} \right)^2}}, \tag{6}$$

where  $P_n$  and  $\overline{P}$  are the power trace and its average value and  $\hat{H}_{n,b_k}$  and  $\overline{H}_{b_k}$  are the HD estimation and its average value. The true value of the parameter produces the highest correlation; thus the adversary can reveal it by comparing all of the correlation coefficients as follows:

$$\hat{b} = \arg\max_{b_k} \left( |\rho(b_k)| \right). \tag{7}$$

Since the HD model is not unique for all possible transitions, multiple candidates can provide a similar correlation coefficient (values with bit-shift difference from the true value, e.g. 23, 46, 92, etc.). This causes certain constraints when revealing the parameters since the attack produces multiple candidates as shown in Fig. 4. Nevertheless, the attack can reduce the search space drastically.
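Eqs. 5 to 7 can be illustrated with a minimal Python sketch, which is not the SystemC adversary module. It attacks a single PE whose register is assumed to be cleared before each MAC, so the HD of Eq. 5 reduces to the number of ones in the product. The noise-free stand-in trace, the unsigned operands, the secret value, and all names are assumptions made for illustration.

```python
import random

MASK = (1 << 18) - 1       # 18-bit partial-sum register


def hamming_distance(a, b):
    return bin(a ^ b).count("1")


def pearson(xs, ys):
    """Eq. 6; returns 0.0 if one series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den if den else 0.0


def cpa(inputs, trace, candidates):
    """Absolute correlation between the trace and the HD estimate of each candidate."""
    scores = {}
    for b_k in candidates:
        # Eq. 5 with an assumed register transition 0 -> x * b_k
        estimate = [hamming_distance(0, (x * b_k) & MASK) for x in inputs]
        scores[b_k] = abs(pearson(trace, estimate))
    return scores


rng = random.Random(0)
SECRET = 23                                    # hypothetical weight to recover
inputs = [rng.randrange(1, 256) for _ in range(500)]
trace = [float(hamming_distance(0, (x * SECRET) & MASK)) for x in inputs]

scores = cpa(inputs, trace, range(1, 256))
best = max(scores, key=scores.get)             # Eq. 7
ties = [b for b, s in scores.items() if abs(s - scores[best]) < 1e-9]
print(best, ties)
```

In this sketch the bit-shifted candidates 46, 92, and 184 produce exactly the same HD estimates as the true value 23, so they tie for the highest correlation, which mirrors the multiple-candidate behavior described above.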

## C. Template Attack

Template attacks are a very powerful type of side-channel analysis [30]. As a subset of profiling attacks, template attacks are composed of two phases: a profiling phase and an attack phase. In the profiling phase, the adversary profiles data-dependent power consumption and noise behavior of a target device handling sensitive data. Then, in the attack phase, the adversary reveals the sensitive data based on the prior knowledge of the device profile.

In the profiling phase of a template attack, the adversary has full control over a target device and can, e.g., arbitrarily set the weights. This scenario can be easily simulated with our implemented SystemC model.

Having created the templates for individual *PEs*, the adversary can launch the attack by iterating over individual *PEs*. In this attack a small number (10-20) of traces with unknown, but fixed weights leads to a successful recovery.

As the parameters for building the template differ from PE to PE, the adversary cannot re-use the same template to reveal all of the parameters. Nevertheless, the additional effort to reveal all of the parameters is only linked to building the template for each of the *PEs*. The acquired traces can be re-used; thus the adversary does not require additional power traces (neither for the profiling nor for the attack).
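The two phases can be illustrated with a minimal Python sketch for a single PE. The paper does not specify how its templates are constructed, so everything here is an assumption: each template is the mean leakage per chosen input, the noise is Gaussian with the same variance for every template (so the maximum-likelihood decision reduces to the smallest squared distance), the leakage is the number of ones in the product, and each attack trace is one run over all chosen inputs.

```python
import random

MASK = (1 << 18) - 1


def leakage(x, w):
    """Assumed noise-free leakage: ones in the 18-bit product (register starts at 0)."""
    return bin((x * w) & MASK).count("1")


def measure(x, w, sigma, rng):
    """One noisy leakage sample with additive Gaussian noise."""
    return leakage(x, w) + rng.gauss(0.0, sigma)


def mean_trace(inputs, w, sigma, n_traces, rng):
    """Average n_traces noisy runs over the chosen inputs."""
    return [
        sum(measure(x, w, sigma, rng) for _ in range(n_traces)) / n_traces
        for x in inputs
    ]


def build_templates(inputs, candidates, sigma, n_prof, rng):
    """Profiling phase: the adversary sets each candidate weight on a device under full control."""
    return {w: mean_trace(inputs, w, sigma, n_prof, rng) for w in candidates}


def attack(templates, observed):
    """Attack phase: choose the template closest to the observed mean trace."""
    def distance(w):
        return sum((m - o) ** 2 for m, o in zip(templates[w], observed))
    return min(templates, key=distance)


rng = random.Random(1)
SECRET = 23                                   # hypothetical unknown but fixed weight
inputs = list(range(1, 256))                  # chosen inputs
templates = build_templates(inputs, range(1, 256), sigma=1.0, n_prof=5, rng=rng)
observed = mean_trace(inputs, SECRET, sigma=1.0, n_traces=15, rng=rng)
recovered = attack(templates, observed)
print(recovered)
```

Because bit-shifted weights share the same noise-free leakage in this sketch, the recovered value may be any member of the set {23, 46, 92, 184}.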

## VII. ATTACK RESULTS AND IMPACT OF ADDITIVE NOISE

Since the model does not consider any measurement noise, the attacker is able to reveal all hidden parameters of the systolic array with the CPA. Fig. 4 shows the simulation results of the attack against the first parameter. The multiple peaks observed are caused by bit-shifted true values. Since the HD of bit-shifted values is the same, these weight candidates cause very similar correlation levels. The CPA of a real computing platform is most certainly influenced by measurement noise,



Fig. 5. Correlation coefficient against additive noise.

TABLE II IMPACT OF ADDITIVE NOISE ON ATTACKS

| Revealed Parameters | Template Attack: SNR | Template Attack: # Attack Traces | Correlation Power Attack: SNR | Correlation Power Attack: Correlation Coefficient |
|---------------------|----------------------|----------------------------------|-------------------------------|---------------------------------------------------|
| 9/9                 | $\geq 2.0$           | 15                               | > 4.0                         | 0.561 - 0.775                                     |
| 8/9                 | -                    | -                                | 3.5 - 4.0                     | 0.444 - 0.561                                     |
| 0/9                 | < 2.0                | -                                | < 3.5                         | < 0.444                                           |

therefore, these results should be considered the best-case scenario (from the attacker's perspective).

The template attack successfully recovers all nine weights from the processing elements with a very low number of attack traces (fewer than 15 attack traces). Since we assume an adversary in a chosen-plaintext scenario, the attacker can freely decide which inputs are sent to the systolic array. By setting entire input columns to zero, the impact of most processing elements that store the pre-loaded weights is eliminated. This allows the attacker to selectively enable only a small subset (i.e., single columns) of processing elements. Just like with the CPA, the bit-shifted values of the correct weight produce a high score. Therefore, the attack on the leftmost column may yield multiple candidates. After recovering the weights from the leftmost column, the attacker can build templates that include the recovered weights. This reduces the uncertainty when attacking the middle or rightmost PEs, so that bit-shifted values of the correct weights no longer produce false positives.

## A. Impact of Additive Noise on CPA

In applied cryptography, analysing the impact of additive noise on power attacks is essential [31], [32]. The backbone of such an analysis is the signal-to-noise ratio (SNR) of the leaked information [20]. The measurement noise is modelled by adding random values $R_n$ with an average $\bar{R} = 0$ to the power trace $P_n$ at each estimation point, yielding $P_n + R_n$. The noise level can be gradually increased to have a bigger impact on the power estimation value. Here, we set a fixed SNR to produce noisy power traces.
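The noise injection can be illustrated with a minimal Python sketch. The paper does not state its exact SNR definition or noise distribution, so the sketch assumes that the SNR is the ratio of signal variance to noise variance and that $R_n$ is Gaussian. The function name `add_noise` and the stand-in trace are hypothetical.

```python
import random
import statistics


def add_noise(trace, snr, rng):
    """Return P_n + R_n, with zero-mean Gaussian R_n scaled to the target SNR."""
    # Assumed definition: SNR = Var(signal) / Var(noise)
    sigma = (statistics.pvariance(trace) / snr) ** 0.5
    return [p + rng.gauss(0.0, sigma) for p in trace]


rng = random.Random(0)
# Stand-in noise-free trace: number of ones in random 18-bit values
clean = [float(bin(rng.randrange(1 << 18)).count("1")) for _ in range(20000)]
noisy = add_noise(clean, snr=4.0, rng=rng)

noise = [q - p for p, q in zip(clean, noisy)]
measured_snr = statistics.pvariance(clean) / statistics.pvariance(noise)
print(round(measured_snr, 2))
```

Sweeping the `snr` argument and repeating an attack at each level is one way to obtain a threshold evaluation of the kind summarized in Table II.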

With this model, a threshold evaluation of the CPA's success is possible. Multiple experiments with additive noise are performed to investigate the influence of noise on correlation



Fig. 6. Systolic Array Implemented for Verification.

coefficients. A comparison of the correlation with different amounts of additive noise is shown in Table II and illustrated in Fig. 5. The results show how an increasing noise level impacts the correlation coefficient, ultimately making the correct candidate indistinguishable from other candidates. For SNRs that are too low, the CPA cannot successfully reveal the weights from the AI accelerator. Increasing the number of traces an attacker acquires increases the chance of a successful attack. This can give an indication of how many traces an attacker would require in a post-silicon attack.

## B. Impact of Additive Noise on Template Attacks

Several experiments have been conducted to study the impact of additive noise on template attacks, where both the profiling and attack traces are affected. SNR is also used to describe the magnitude of the noise. The experiments show that recovering the weights remains as easy as without noise. When the impact of noise is increased, i.e., the SNR is decreased to as low as 2.0, the template attack still proves successful with as few as 15 attack traces per targeted parameter.

Consequently, template attacks are applicable with lower SNR values, i.e., the template attacks are much less affected by noise if the same noise level is present during the profiling phase as well as the attack phase. Template attacks therefore pose a serious threat to implementations, even in a noisy environment. By taking advantage of input tuning (as a chosen-plaintext attack), an adversary could theoretically attack systolic arrays of any size and reveal secret parameters. Here, a noisier environment requires the attacker to use a larger number of traces when building the template.

## VIII. MODEL VERIFICATION

It was previously shown [33] that a time-based power estimation with a gate-level netlist comes quite close to post-silicon measurements. To achieve industry-grade results, we also use the Synopsys tool suite for our experiments.

The verification starts with a Verilog implementation of a single PE, as well as the whole $(3 \times 3)$ systolic array, as shown in Fig. 6. The design is synthesized using Synopsys



Fig. 7. Pearson's Correlation Coefficient between Trace Sets.

DesignCompiler [34] to generate a gate-level netlist. A predefined test bench is used to stimulate the netlist and gather a value-change dump (VCD) using Synopsys VCS [35]. Lastly, Synopsys PrimePower [33] creates a power trace based on the VCD in a time-based power analysis. These power traces are considered to be noise-free reference traces of the real hardware. We use these traces to verify the input-dependent power model utilized in our SystemC simulation of a systolic array. Consequently, a statistical comparison between the reference power traces and the power traces collected at the SystemC level is performed. The comparison is divided into four main experiments as follows:

## A. First Experiment

For a single PE with the same random inputs, we generate two sets of power traces (20000 traces) collected at the SystemC level and at the gate-level netlist. Then, we use Pearson's correlation coefficient (PCC) to assess whether there is a linear correlation between them. PCC lies in the interval [-1, +1], with PCC = 0 indicating no linear correlation. The results show a positive correlation between the traces, as seen in Fig. 7a. A PCC of up to 0.65 confirms that the proposed SystemC model is linearly associated with the power consumption tendency of a real hardware implementation.
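The PCC computation itself can be sketched as follows. The data are toy stand-ins and not the paper's traces: the "model" power is assumed to be the Hamming weight of a PE product with a hypothetical fixed weight `WEIGHT`, and the "reference" adds Gaussian scatter to mimic effects that a high-level model does not capture.

```python
import math
import random


def pearson(xs, ys):
    """Pearson's correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("PCC is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def hamming_weight(v):
    """Number of set bits in a non-negative integer."""
    return bin(v).count("1")


rng = random.Random(0)
WEIGHT = 0x5A  # hypothetical fixed 8-bit weight
inputs = [rng.randrange(256) for _ in range(2000)]

# Toy high-level estimate: Hamming weight of the product.
model_power = [hamming_weight(x * WEIGHT) for x in inputs]
# Toy reference: same data dependence plus unmodeled scatter (assumption).
reference_power = [p + rng.gauss(0.0, 2.0) for p in model_power]

pcc_same_inputs = pearson(model_power, reference_power)
print(f"PCC (same inputs): {pcc_same_inputs:.2f}")
```

Because both trace sets are driven by the same inputs, the shared data dependence yields a clearly positive coefficient, while the added scatter keeps it below 1.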

## B. Second Experiment

For a single PE with two *distinct* sets of random inputs, we generate two sets of power traces (20000 traces) collected at the SystemC level and at the gate-level netlist. The goal of this experiment is to exclude a false-positive correlation for a single PE. Here, we observe a correlation coefficient close to 0, as shown in Fig. 7b. This confirms that there is no false-positive correlation between the power traces.
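What "close to 0" means depends on the number of traces: for independent samples, the sample PCC scatters around zero with a standard deviation of roughly $1/\sqrt{N}$. The sketch below illustrates such a negative control on toy data. The Hamming-weight power model, the fixed weight `WEIGHT`, and the three-sigma threshold are assumptions made for illustration only.

```python
import math
import random


def pearson(xs, ys):
    """Pearson's correlation coefficient of two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def hamming_weight(v):
    """Number of set bits in a non-negative integer."""
    return bin(v).count("1")


N = 20000
rng = random.Random(42)
WEIGHT = 0x5A  # hypothetical fixed 8-bit weight

# Two independent input sets: one drives the "model", the other the "reference".
inputs_a = [rng.randrange(256) for _ in range(N)]
inputs_b = [rng.randrange(256) for _ in range(N)]
model_power = [hamming_weight(x * WEIGHT) for x in inputs_a]
reference_power = [hamming_weight(x * WEIGHT) for x in inputs_b]

pcc = pearson(model_power, reference_power)
threshold = 3.0 / math.sqrt(N)  # about 0.021 for N = 20000
print(f"PCC = {pcc:+.4f}, within 3/sqrt(N): {abs(pcc) < threshold}")
```

A coefficient far outside this band with unrelated inputs would hint at a correlation that does not stem from the processed data, e.g., a component common to all traces.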

## C. Third Experiment

The design of the whole systolic array is more complex. A test bench with full coverage of all possible input/weight combinations for all PEs would produce an enormous number of traces. Therefore, we fixed the weights in the PEs and stimulated the systolic array with 20000 random input samples. An equivalent test bench is implemented in SystemC to produce comparable traces. As expected, the two trace sets are most strongly linearly correlated when modeling smaller pieces of hardware. Naturally, modeling larger hardware at a high abstraction level reduces accuracy, and the PCC is lower, as shown in Fig. 7c. Therefore, Spearman's correlation coefficient (SCC) is used in this experiment to interpret the direction of the association between the two trace sets. The sign of the SCC indicates whether the two trace sets follow the same trend. Evaluating the SCC between the two sets, we observe a positive coefficient of +0.27. This indicates a moderate monotonic (linear or non-linear) relationship between them. In other words, the SCC shows that power traces collected at the SystemC level tend to increase when the reference power traces increase.
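SCC is the PCC of the rank-transformed data, which is why it also captures monotonic but non-linear relationships. The following sketch contrasts both coefficients on a deliberately non-linear toy relationship. The data and the helper names are illustrative assumptions, not the traces or tooling used in the experiment.

```python
import math


def pearson(xs, ys):
    """Pearson's correlation coefficient of two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's coefficient: Pearson's coefficient computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Toy data: strictly increasing, but clearly non-linear.
xs = list(range(1, 21))
ys = [x ** 3 for x in xs]

pcc = pearson(xs, ys)
scc = spearman(xs, ys)
print(f"PCC = {pcc:.3f}, SCC = {scc:.3f}")
```

For a strictly increasing relationship, the ranks of both sequences coincide and SCC reaches its maximum, whereas PCC is reduced by the curvature.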

## D. Fourth Experiment

Similar to the second experiment, we aim to exclude a false-positive correlation between the traces of the whole systolic array. With two distinct sets of random inputs, we observe both PCC and SCC close to 0, as shown in Fig. 7d. This indicates that there is no false-positive correlation in the power traces collected at the SystemC level.

In conclusion, it can be said with high confidence that the proposed power estimation model follows the same trends as a state-of-the-art netlist-level power estimation.

#### IX. CONCLUSION

This paper presents power side-channel attacks against AI accelerator architectures at the electronic system level. Our approach features AI accelerator models with a corresponding dynamic power-consumption model to simulate the behaviour of systolic-array-based AI accelerators using SystemC. Our findings show that SystemC-based power attacks are possible and sufficiently resemble real-world threat scenarios. Our experiments successfully simulate SystemC-based power side-channel attacks against AI accelerators leading to full secret extraction: while correlation power analysis shows certain limitations in noisy conditions, template attacks pose a significant risk because they can adapt to noise.

To verify the SystemC power estimation model, several experiments were performed to compare power traces computed from synthesized netlists with those of the proposed model. The results show that the proposed model follows the same trends as a gate-level netlist power estimation. Our goal of earliest-possible threat analysis and subsequent design suggestions was thus achieved and demonstrated.

This work is hence one essential step (and, with regard to the presented methods and procedures, to the best of our knowledge the first) in design-space exploration for security from a design/hardware perspective. As a future step of increasing complexity, we would like to extend this approach from a systolic array to a full system model.

#### REFERENCES

- [1] L. Batina, S. Bhasin, D. Jap, and S. Picek, "CSI NN: Reverse Engineering of Neural Network Architectures through Electromagnetic Side Channel," in *Proceedings of the 28th USENIX Conference on Security Symposium*, ser. SEC'19. USA: USENIX Association, 2019, pp. 515–532.
- [2] H. Chabanne, J.-L. Danger, L. Guiga, and U. Kühne, "Side channel attacks for architecture extraction of neural networks," *CAAI Transactions on Intelligence Technology*, vol. 6, no. 1, pp. 3–16, 2021.
- [3] Y.-S. Won, S. Chatterjee, D. Jap, A. Basu, and S. Bhasin, "WaC: First Results on Practical Side-Channel Attacks on Commercial Machine Learning Accelerator," in *Proceedings of the 5th Workshop on Attacks and Solutions in Hardware Security*, ser. ASHES '21. New York, NY, USA: Association for Computing Machinery, 2021, pp. 111–114. [Online]. Available: https://doi.org/10.1145/3474376.3487284
- [4] K. Yoshida, M. Shiozaki, S. Okura, T. Kubota, and T. Fujino, "Model Reverse-Engineering Attack against Systolic-Array-Based DNN Accelerator Using Correlation Power Analysis," *IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences*, vol. E104.A, no. 1, pp. 152–161, 2021.
- [5] A. Gerstlauer, C. Haubelt, A. D. Pimentel, T. P. Stefanov, D. D. Gajski, and J. Teich, "Electronic system-level synthesis methodologies," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 28, no. 10, pp. 1517–1530, 2009.
- [6] H. Salmani and M. Tehranipoor, "Analyzing circuit vulnerability to hardware trojan insertion at the behavioral level," in 2013 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFTS), 2013, pp. 190–195.
- [7] K. Xiao, A. Nahiyan, and M. Tehranipoor, "Security rule checking in ic design," *Computer*, vol. 49, no. 8, pp. 54–61, 2016.
- [8] M. Hassan, V. Herdt, H. M. Le, D. Große, and R. Drechsler, "Early soc security validation by vp-based static information flow analysis," in 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2017, pp. 400–407.
- [9] M. Goli, M. Hassan, D. Große, and R. Drechsler, "Security validation of vp-based socs using dynamic information flow tracking," *it Information Technology*, vol. 61, no. 1, pp. 45–58, 2019. [Online]. Available: https://doi.org/10.1515/itit-2018-0027
- [10] N. Veshchikov and S. Guilley, "Use of simulators for side-channel analysis," in 2017 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), 2017, pp. 104–112.
- [11] B. Bauer, M. Ayache, S. Mulhem, M. Nitzan, J. Athavale, R. Buchty, and M. Berekovic, "On the dependability lifecycle of electrical/electronic product development: The dual-cone v-model," *Computer*, vol. 55, no. 09, pp. 99–106, sep 2022.
- [12] J. Treus and P. Herber, "Early analysis of security threats by modeling and simulating power attacks in systemc," in 2020 IEEE 91st Vehicular Technology Conference (VTC2020-Spring), 2020, pp. 1–5.
- [13] R. Drechsler, "Advanced formal verification," 01 2004.
- [14] B.-A. Tabacaru, M. Chaari, W. Ecker, T. Kruse, and C. Novello, "Fault-effect analysis on system-level hardware modeling using virtual prototypes," in 2016 Forum on Specification and Design Languages (FDL), 2016, pp. 1–7.
- [15] G. Onnebrink, S. Schürmans, F. Walbroel, R. Leupers, G. Ascheid, X. Chen, and Y. Harn, "Black box power estimation for digital signal processors using virtual platforms," in *Proceedings of the 2016 Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools*, ser. RAPIDO '16. New York, NY, USA: Association for Computing Machinery, 2016. [Online]. Available: https://doi.org/10.1145/2852339.2852345
- [16] F. Menichelli, R. Menicocci, M. Olivieri, and A. Trifiletti, "High-level side-channel attack modeling and simulation for security-critical systems on chips," *IEEE Transactions on Dependable and Secure Computing*, vol. 5, no. 3, pp. 164–176, 2008.
- [17] J. Teich, "Embedded System Synthesis and Optimization," 2000.
- [18] R. Leupers, G. Martin, R. Plyaskin, A. Herkersdorf, F. Schirrmeister, T. Kogel, and M. Vaupel, "Virtual platforms: Breaking new grounds," in 2012 Design, Automation Test in Europe Conference Exhibition (DATE), 2012, pp. 685–690.
- [19] A. Dubey, R. Cammarota, and A. Aysu, "MaskedNet: The First Hardware Inference Engine Aiming Power Side-Channel Protection," in 2020 IEEE International Symposium on Hardware Oriented Security and Trust (HOST). Los Alamitos, CA, USA: IEEE Computer Society, dec 2020, pp. 197–208. [Online]. Available: https://doi.ieeecomputersociety.org/10.1109/HOST45689.2020.9300276
- [20] A. Biryukov, D. Dinu, and J. Großschädl, "Correlation Power Analysis of Lightweight Block Ciphers: From Theory to Practice," in *Applied Cryptography and Network Security*, M. Manulis, A.-R. Sadeghi, and S. Schneider, Eds. Cham: Springer International Publishing, 2016, pp. 537–557.
- [21] D. Harris and S. Harris, *Digital design and computer architecture*. Morgan Kaufmann, 2010.
- [22] B. Jacob, S. W. Ng, and D. T. Wang, "Chapter 29 power and leakage," in *Memory Systems*, B. Jacob, S. W. Ng, and D. T. Wang, Eds. San Francisco: Morgan Kaufmann, 2008, pp. 847–864. [Online]. Available: https://www.sciencedirect.com/science/article/pii/B978012379751350031X
- [23] R. Soares, V. Lima, R. Lellis, P. Finkenauer Jr, and V. Camargo, "Hardware countermeasures against power analysis attacks: a survey from past to present," *Journal of Integrated Circuits and Systems*, vol. 16, no. 2, pp. 1–12, 2021.
- [24] S. A. A. Shah, J. Wagner, T. Schuster, and M. Berekovic, "A lightweight system-level power and area estimation methodology for application specific instruction set processors," in 2014 24th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS), 2014, pp. 1–5.
- [25] H. T. Kung, "Why systolic architectures?" *Computer*, vol. 15, no. 1, pp. 37–46, 1982.
- [26] Y. Xiang, Y. Xu, Y. Li, W. Ma, Q. Xuan, and Y. Liu, "Side-Channel Gray-Box Attack for DNNs," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 68, no. 1, pp. 501–505, 2021.
- [27] F. Durvaux, F. Standaert, and N. Veyrat-Charvillon, "How to certify the leakage of a chip?" in Advances in Cryptology - EUROCRYPT 2014 -33rd Annual International Conference on the Theory and Applications of Cryptographic Techniques, Copenhagen, Denmark, May 11-15, 2014. Proceedings, ser. Lecture Notes in Computer Science, P. Q. Nguyen and E. Oswald, Eds., vol. 8441. Springer, 2014, pp. 459–476. [Online]. Available: https://doi.org/10.1007/978-3-642-55220-5\_26
- [28] O. Bronchain, J. M. Hendrickx, C. Massart, A. Olshevsky, and F. Standaert, "Leakage certification revisited: Bounding model errors in side-channel security evaluations," in Advances in Cryptology -CRYPTO 2019 - 39th Annual International Cryptology Conference, Santa Barbara, CA, USA, August 18-22, 2019, Proceedings, Part I, ser. Lecture Notes in Computer Science, A. Boldyreva and D. Micciancio, Eds., vol. 11692. Springer, 2019, pp. 713–737. [Online]. Available: https://doi.org/10.1007/978-3-030-26948-7\_25
- [29] E. Brier, C. Clavier, and F. Olivier, "Correlation Power Analysis with a Leakage Model," in *Cryptographic Hardware and Embedded Systems - CHES 2004*, M. Joye and J.-J. Quisquater, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 16–29.
- [30] S. Chari, J. R. Rao, and P. Rohatgi, "Template attacks," in *Cryptographic Hardware and Embedded Systems - CHES 2002*, B. S. Kaliski, Ç. K. Koç, and C. Paar, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2003, pp. 13–28.
- [31] L. Lerman, N. Veshchikov, S. Picek, and O. Markowitch, "On the construction of side-channel attack resilient s-boxes," in *International Workshop on Constructive Side-Channel Analysis and Secure Design*. Springer, 2017, pp. 102–119.
- [32] F.-X. Standaert, T. G. Malkin, and M. Yung, "A unified framework for the analysis of side-channel key recovery attacks," in *Annual international conference on the theory and applications of cryptographic techniques*. Springer, 2009, pp. 443–461.
- [33] "Synopsys PrimePower," https://www.synopsys.com/implementation-and-signoff/signoff/primepower.html, accessed: 2023-05-17.
- [34] "Synopsys DesignCompiler," https://www.synopsys.com/implementation-and-signoff/rtl-synthesis-test/dc-ultra.html, accessed: 2023-05-17.
- [35] "Synopsys VCS," https://www.synopsys.com/verification/simulation/vcs.html, accessed: 2023-05-17.