
1 Introduction

Side-channel attacks (SCA) [1] exploit the fact that cryptographic algorithms are implemented on a physical device such as an FPGA, microcontroller, or ASIC. They can use any physical manifestation of the device, such as current consumption, electromagnetic radiation, temperature variation, or timing variation during the execution of different instructions. These side channels may leak information about secret data. Differential power analysis (DPA), introduced by Kocher [2], is a well-known and thoroughly studied threat to implementations of block ciphers such as DES and AES and of public-key algorithms such as RSA. In DPA an attacker generates a set of hypotheses about some secret value or partial key and tries to identify the unique true hypothesis by finding the highest correlation between the power consumption and the changes in the internal state during the execution of the algorithm. The true hypothesis reveals secret information that the attacker can exploit. Various other attacks, such as the template attack [3], the mutual information attack, and the fault attack [4], are also discussed in the literature, but DPA is a more generic and device-independent attack. DPA on stream ciphers [5] is a relatively less explored domain because of the complexity and structure of these ciphers. A stream cipher consists of two phases: (i) the key scheduling algorithm (KSA), which takes an initialization vector (IV) and a key (K) to initialize the internal state of the algorithm and then updates the state a predefined number of times; and (ii) key sequence generation, in which the state is repeatedly updated with the clock and used to generate the required number of key stream bits. In the case of Trivium, the theoretical DPA models discussed in the existing literature either require more than 10,000 power traces or are not sufficient (valid for ≤90 nm technology-based FPGAs) to retrieve the secret key.
Therefore, we introduce a new technique, based on the difference of correlation traces, that retrieves the key with fewer power traces. The approach is general and is valid for all stream ciphers. It consists of three steps, each of which gives partial information about the secret key. In the case of Trivium, our attack methodology targets the initialization phase. We use Pearson’s correlation coefficient to correlate the power consumption with the hypothetical Hamming distance during the execution of the algorithm and thereby recover the actual key of the stream cipher.

2 H/W Implementation and Collection of Power Traces

The Trivium algorithm is a hardware-efficient, synchronous stream cipher designed by De Cannière and Preneel [6]. The cipher uses an 80-bit key and an 80-bit IV; its secret state has 288 bits and consists of three interconnected nonlinear feedback shift registers (NFSRs) of length 93, 84, and 111 bits, respectively. The cipher operates in two phases: the KSA and the key stream generation phase. During the KSA the shift registers are initialized with the key and IV, and the algorithm is then clocked 4 × 288 times to increase nonlinearity. After that the key stream is generated sequentially. In each clock cycle, three bits are computed using the nonlinear functions \( T_{1} \), \( T_{2} \) and \( T_{3} \) and fed to the NFSRs. The cipher output is generated by XORing the plaintext with the key stream. The NFSRs are a combination of sequential and combinatorial logic. The architectural view of Trivium is shown in Fig. 1.

Fig. 1 Trivium architecture

Initialization of Trivium

$$ \begin{array}{lll} {\text{NFSR1}}: \left( S_{92}, \ldots, S_{0} \right) & \leftarrow \left( 0, \ldots, 0, K_{79}, \ldots, K_{0} \right) & \ldots {\text{length 93}} \\ {\text{NFSR2}}: \left( S_{176}, \ldots, S_{93} \right) & \leftarrow \left( 0, 0, 0, 0, {\text{IV}}_{79}, \ldots, {\text{IV}}_{0} \right) & \ldots {\text{length 84}} \\ {\text{NFSR3}}: \left( S_{287}, \ldots, S_{177} \right) & \leftarrow \left( 1, 1, 1, 0, \ldots, 0 \right) & \ldots {\text{length 111}} \\ \end{array} $$
(1)

Nonlinear Function Calculation

$$ \begin{aligned} & {\text{NF-}}1:T_{1} \leftarrow S_{65} \,^{ \wedge } \,S_{92} \,^{ \wedge } \,\left( {S_{90} \,\& \,S_{91} } \right)\,^{ \wedge } \,S_{170} \\ & {\text{NF-}}2:T_{2} \leftarrow S_{161} \,^{ \wedge } \,S_{176} \,^{ \wedge } \,\left( {S_{174} \,\& \,S_{175} } \right)\,^{ \wedge } \,S_{263} \\ & {\text{NF-}}3:T_{3} \leftarrow S_{242} \,^{ \wedge } \,S_{287} \,^{ \wedge } \,\left( {S_{285} \,\& \,S_{286} } \right)\,^{ \wedge } \,S_{68} \\ \end{aligned} $$
(2)

After 4 × 288 clock cycles, key stream generation starts:

$$ t_{1} = S_{65} \,^{ \wedge } \,S_{92} ;\quad t_{2} = S_{161} \,^{ \wedge } \,S_{176} ;\quad t_{3} = S_{242} \,^{ \wedge } \,S_{287} ;\quad Z_{i} = t_{1} \,^{ \wedge } \,t_{2} \,^{ \wedge } \,t_{3} $$
(3)
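As an illustration, Eqs. (1)–(3) can be written as a small bit-level software model. This is a minimal sketch and not the Verilog implementation used in the experiments: the function names are hypothetical, list index `i` is taken to hold stage \( S_{i} \), and the key and IV lists are loaded lowest stage first exactly as written in Eq. (1). No claim is made about the bit ordering used by published Trivium test vectors.

```python
def init_state(key, iv):
    """Load an 80-bit key and IV (lists of 0/1) into the 288-bit state, Eq. (1)."""
    assert len(key) == 80 and len(iv) == 80
    s = [0] * 288
    s[0:80] = key            # NFSR1: S_0..S_79 hold the key, S_80..S_92 stay 0
    s[93:173] = iv           # NFSR2: S_93..S_172 hold the IV, S_173..S_176 stay 0
    s[285:288] = [1, 1, 1]   # NFSR3: S_285..S_287 are 1, all other stages 0
    return s


def feedback(s):
    """Nonlinear feedback bits T1, T2, T3 of Eq. (2)."""
    t1 = s[65] ^ s[92] ^ (s[90] & s[91]) ^ s[170]
    t2 = s[161] ^ s[176] ^ (s[174] & s[175]) ^ s[263]
    t3 = s[242] ^ s[287] ^ (s[285] & s[286]) ^ s[68]
    return t1, t2, t3


def clock(s):
    """Advance the state by one clock; return (next_state, key stream bit)."""
    z = s[65] ^ s[92] ^ s[161] ^ s[176] ^ s[242] ^ s[287]   # Eq. (3)
    t1, t2, t3 = feedback(s)
    # T3 feeds NFSR1, T1 feeds NFSR2, T2 feeds NFSR3; every register shifts by one.
    nxt = [t3] + s[0:92] + [t1] + s[93:176] + [t2] + s[177:287]
    return nxt, z


def keystream(key, iv, nbits):
    """Run the 4 x 288 initialization clocks, then collect nbits key stream bits."""
    s = init_state(key, iv)
    for _ in range(4 * 288):
        s, _ = clock(s)
    out = []
    for _ in range(nbits):
        s, z = clock(s)
        out.append(z)
    return out
```

The key stream bit produced during the 4 × 288 initialization clocks is simply discarded in this sketch.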

The algorithm was implemented in Verilog on a Xilinx Spartan-3E FPGA (XC3S1600E-4FG320C). Because the algorithm is so simple, professional FPGA implementations are unlikely to differ much from one another (apart from parallel implementations and the incorporation of countermeasures). The attack methodology therefore remains the same.

Figure 2a, b shows the setup for the acquisition of power traces. Figure 2a details the signal flow and the point on the board at which the power traces are collected, while Fig. 2b shows the lab setup. A power trace consists of the instantaneous power consumed by the device during the execution of the cryptographic algorithm; it can be measured with a current probe or a differential voltage probe. Two boards are used, for generation of the control signals and for execution of the actual algorithm. The power traces are captured while the algorithm executes inside the FPGA at 4 MHz. Figure 3 shows the instantaneous power consumption during execution of the algorithm for a random key and IV. The power traces are collected as follows:

Fig. 2 a Signal flow diagram. b Lab setup

Fig. 3 Power trace of Trivium

  i. Generation of random IVs in the FPGA using an RNG: In Trivium, the initialization vector (IV0–IV79) changes at every resynchronization while the key bits K0–K79 are kept constant, so only the contents of NFSR2 change at every resynchronization. The IV should change randomly to avoid other mathematical attacks. To generate random IVs we implemented a 20-stage LFSR with the GF(2) primitive polynomial \( x^{20} + x^{17} + x^{9} + x^{7} + 1 \) as the feedback function.

  ii. The algorithm was reinitialized with the same key and a random IV every 1.5 s, and a power trace was captured on a generated trigger signal. The 1.5 s interval ensures that each generated power trace is stored properly.

  iii. The oscilloscope settings for capturing the power traces were: sampling rate 250 MS/s; total number of power traces 5000; number of sample points per trace 5100; built-in 20 MHz oscilloscope filter enabled.
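The IV generator of step i can be sketched in software as follows. This is a hedged illustration only: the text does not give the tap convention, the seed, or how the 20-stage LFSR output is expanded to 80 IV bits, so the Fibonacci recurrence below, the seed handling, and the serial slicing into 80-bit IVs are all assumptions, and the function names are hypothetical.

```python
def lfsr_bits(seed, count):
    """Return `count` output bits of a 20-stage Fibonacci LFSR.

    Assumed recurrence for x^20 + x^17 + x^9 + x^7 + 1:
        s[n+20] = s[n+17] ^ s[n+9] ^ s[n+7] ^ s[n]
    """
    state = [(seed >> i) & 1 for i in range(20)]   # state[0] is the oldest bit
    if not any(state):
        raise ValueError("seed must be non-zero in its low 20 bits")
    out = []
    for _ in range(count):
        new = state[17] ^ state[9] ^ state[7] ^ state[0]
        out.append(state[0])           # shift the oldest bit out
        state = state[1:] + [new]      # shift the new bit in
    return out


def random_ivs(seed, n_traces):
    """Slice the LFSR output into one 80-bit IV per power trace (assumed scheme)."""
    bits = lfsr_bits(seed, 80 * n_traces)
    return [bits[80 * k: 80 * (k + 1)] for k in range(n_traces)]
```

For example, `random_ivs(0xACE1, 5000)` would give one IV per captured trace, with the key held fixed across all of them.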

3 Attack Model and Analysis

An FPGA is a CMOS-based device. Whenever the device changes its state, it consumes some amount of power. Power consumption consists of the following two parts:

  (a) Static power consumption: Static power is required to keep the CMOS state intact, and it is very small.

  (b) Dynamic power consumption: During switching, i.e., when a CMOS state changes from 1→0 or 0→1, the FPGA consumes power, which can be estimated as

$$ {\text{Power}} = V_{\text{DD}}^{2} \cdot C \cdot f \cdot P_{0 \leftrightarrow 1} $$

where

\( C \): Capacitance of the device

\( f \): Frequency of operation

\( P_{0 \leftrightarrow 1} \): Number of CMOS states changing from 1→0 or 0→1

\( V_{\text{DD}} \): Supply voltage

Dynamic power consumption is large compared to static power consumption and plays an important role in building the attack model.
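To make the role of \( P_{0 \leftrightarrow 1} \) concrete, the short sketch below counts the toggling bits between two register states (their Hamming distance) and inserts that count into the dynamic power estimate. The function names are hypothetical, and the supply voltage and capacitance in the usage note are placeholder values chosen for illustration, not measured parameters of the device.

```python
def hamming_distance(prev_state, curr_state):
    """Number of bit positions that toggle (0->1 or 1->0) between two states."""
    return sum(a ^ b for a, b in zip(prev_state, curr_state))


def dynamic_power(vdd, capacitance, frequency, toggles):
    """Dynamic power estimate: V_DD^2 * C * f * P_(0<->1)."""
    return vdd ** 2 * capacitance * frequency * toggles
```

With placeholder values, `dynamic_power(1.2, 1e-12, 4e6, hamming_distance(prev, curr))` scales linearly with the number of toggling bits, which is why the Hamming distance serves as the power hypothesis in the attack model.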

In the case of Trivium, dynamic power consumption occurs at the clock edge (the states of all NFSRs change with the clock), so the total power consumption is

$$ P_{\text{total}} = \, P_{\text{NFSR1}} + \, P_{{{\text{NFSR}}2}} + \, P_{{{\text{NFSR}}3}} + \, \varOmega $$

\( P_{\text{NFSR1}} \), \( P_{\text{NFSR2}} \) and \( P_{\text{NFSR3}} \) are the power consumption of NFSR1, NFSR2, and NFSR3, respectively, Ω is the noise content, and \( P_{\text{total}} \) is the total power consumption. The proposed DPA attack framework for Trivium consists of two important steps. First, we model the power consumption in such a way that it involves the secret parameters to be determined. Once this has been done, a suitable, efficient methodology is worked out for extracting the secret parameters. A detailed description follows.

3.1 Attack Model

The model is based on the correlation of the power traces \( x_{d,j} \) with the corresponding Hamming distances \( h_{d,i} \) between the previous and present states of the Trivium NFSRs. The NFSR stages change on the rising edge of the clock, and the feedback is calculated from the nonlinear functions \( T_{1} \), \( T_{2} \) and \( T_{3} \). Previous state: the state of the NFSRs at clock cycle ‘t’. Present state: the state of the NFSRs at clock cycle ‘t + 1’. The analysis uses the correlation between the Hamming distance and the power consumed by the device. Pearson’s coefficient \( r_{i,j} \) (where \( h_{i} \) is the vector of Hamming distances corresponding to the different IVs when ‘i’ is the assumed value of the key bit and \( x_{j} \) is the vector of the jth points of all power traces) can be defined as

Given \( x_{d} = \left( x_{d,1}, x_{d,2}, x_{d,3}, \ldots, x_{d,5100} \right) \), where d = 1 to N (dth power trace),

$$ r_{i,j} = \frac{\sum\nolimits_{d = 1}^{N} \left( h_{d,i} - \bar{H}_{i} \right) \cdot \left( x_{d,j} - \bar{X}_{j} \right)}{\sqrt{\sum\nolimits_{d = 1}^{N} \left( h_{d,i} - \bar{H}_{i} \right)^{2} \cdot \sum\nolimits_{d = 1}^{N} \left( x_{d,j} - \bar{X}_{j} \right)^{2}}} $$
(4)

where

\( N \): Number of power traces (5000 used in our experimentation)

\( i \): Assumed key bit/Eq. value

\( j \): Position of the sample point within a power trace

\( h_{d,i} \): Hamming distance between the previous and present states of the NFSRs for the dth trace, as per the model, for a given ‘i’

\( \bar{H}_{i} \): \( \sum\nolimits_{d = 1}^{N} h_{d,i} / N \) (mean of the Hamming distances for a given value of ‘i’)

\( x_{d,j} \): Value of the dth power trace at the jth position

\( \bar{X}_{j} \): \( \sum\nolimits_{d = 1}^{N} x_{d,j} / N \) (mean value of all power traces at the jth position)

\( r_{i} = \left( r_{i,1}, r_{i,2}, r_{i,3}, \ldots, r_{i,5100} \right) \), where i is a 1- or 2-bit value (correlation trace)

  • Case 1 If \( i \) is a 1-bit value (‘0’ or ‘1’), two correlation traces must be computed, i.e., \( r_{0} \) and \( r_{1} \).

  • Case 2 If \( i \) is a 2-bit value (‘00’, ‘01’, ‘10’ or ‘11’), four correlation traces must be computed, i.e., \( r_{0} \), \( r_{1} \), \( r_{2} \) and \( r_{3} \).
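Equation (4) can be evaluated directly as sketched below, producing one correlation trace \( r_{i} \) per assumed value. The function name and the list-of-lists data layout are assumptions made for illustration; a real analysis over 5000 traces of 5100 points would normally use an array library for speed. Returning 0 for a constant column is a choice made here to avoid division by zero and is not taken from the text.

```python
import math


def correlation_trace(h, traces):
    """Pearson correlation r_j between hypotheses h and every sample position j.

    h      : list of N hypothetical Hamming distances (one per power trace)
    traces : list of N power traces, each a list with the same number of samples
    """
    n = len(h)
    h_mean = sum(h) / n
    h_dev = [v - h_mean for v in h]
    h_ss = sum(d * d for d in h_dev)
    r = []
    for j in range(len(traces[0])):
        column = [t[j] for t in traces]            # x_{d,j} for d = 1..N
        x_mean = sum(column) / n
        x_dev = [v - x_mean for v in column]
        x_ss = sum(d * d for d in x_dev)
        num = sum(a * b for a, b in zip(h_dev, x_dev))
        denom = math.sqrt(h_ss * x_ss)
        r.append(num / denom if denom else 0.0)    # constant column: define r = 0
    return r
```

For Case 1 this function would be called twice, once with the Hamming distances computed under ‘0’ and once under ‘1’; for Case 2 it would be called four times.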

During the execution of Trivium, the power consumption depends mainly on the changes in the NFSR stages. Throughout the experiments we focus on one or two specific stages, among the 288 stages of the NFSRs, that involve the targeted key bits. The correlation between the total power consumption and the changes in only a few stages (the one or two assumed stages plus the known stages) reveals little information, because the total power consumption is due to the changes in all sequential elements of the NFSRs. This increases the number of power traces required to extract the correct key. Figures 4a and 5a confirm that the correlation traces for all assumed values are similar to each other. By using the difference of correlation traces, we can remove the bias common to the traces and identify the correct value among the assumed values.

Fig. 4 a Correlation traces for Eq. 1 (assumed Eq. values ‘0’ and ‘1’). b Difference of correlation traces for Eq. 1. The value for Eq. 1 is ‘1’

Fig. 5 a Correlation traces for Eq. 67 (assumed Eq. values ‘00’, ‘01’, ‘10’ and ‘11’). b Difference of correlation traces for Eq. 67. The value for Eq. 67 is ‘00’

3.2 Attack Methodology

In the Trivium algorithm, whether an NFSR is considered for DPA is decided by the presence of IV contents in its stages. During execution, the stages of the NFSRs may contain constant bits, IV bits, key bits, or a combination of IV and key bits. Stages containing only constant bits or key bits cannot be used to mount the attack, because the interaction of key bits with constant data bits remains the same over different initializations. If the content of an NFSR stage is a function of both key bits and IV bits, it can leak side-channel information that can be used to mount the attack. This is the basic philosophy behind the attack strategy. In our proposed attack, the determination of key bits is divided into three steps:

Step 1—Extraction of single key bit

The power consumption during the initial 12 clock cycles (after initialization of the algorithm) leaks information suitable for determining the bits \( K_{66} \)–\( K_{55} \) sequentially. After the three NFSRs are initialized with the key and IV, the nonlinear functions are:

$$ \begin{array}{*{20}l} {T_{1} = K_{66} \,^{ \wedge } \,S_{92} \,^{ \wedge } \,\left( {S_{91} \,\& \,S_{90} } \right)\,^{ \wedge } \,{\text{IV}}_{78} } \hfill & { = K_{66} \,^{ \wedge } \,{\text{IV}}_{78} } \hfill & {({\text{combination}}\,{\text{of}}\,{\text{key}},\,{\text{const}}\,{\text{and}}\,{\text{IV}}\,{\text{bits}})} \hfill \\ {T_{2} = {\text{IV}}_{69} \,^{ \wedge } \,S_{176} \,^{ \wedge } \,\left( {S_{174} \,\& \,S_{175} } \right)\,^{ \wedge } \,S_{263} } \hfill & { = IV_{69} } \hfill & {\,\,({\text{combination}}\,{\text{of}}\,{\text{IV}}\,{\text{and}}\,{\text{constant}}\,{\text{bits}})} \hfill \\ {T_{3} = S_{242} \,^{ \wedge } \,S_{287} \,^{ \wedge } \,\left( {S_{285} \,\& \,S_{286} } \right)\,^{ \wedge } \,K_{69} } \hfill & { = K_{69} } \hfill & {({\text{combination}}\,{\text{of}}\,{\text{constant}}\,{\text{bits}}\,{\text{and}}\,{\text{key}}\,{\text{bit}})} \hfill \\ \end{array} $$

At the first clock, the states of the NFSRs change in the following manner:

$$ \begin{array}{lllll} {\text{NFSR1}}: & \left( S_{92}, \ldots, S_{0} \right) & \leftarrow & \left( S_{91}, \ldots, S_{0}, T_{3} \right) & \left( {\text{all the stages are constant and unknown}} \right) \\ {\text{NFSR2}}: & \left( S_{176}, \ldots, S_{93} \right) & \leftarrow & \left( S_{175}, \ldots, S_{93}, T_{1} \right) & \left( {\text{all the stages' contents are known}} \right) \\ {\text{NFSR3}}: & \left( S_{287}, \ldots, S_{177} \right) & \leftarrow & \left( S_{286}, \ldots, S_{177}, T_{2} \right) & \left( {\text{all the stages' contents are known}} \right) \\ \end{array} $$

The Hamming distances for the three shift registers are given by

$$ \begin{array}{ll} \{ S_{92} (t)\,^{ \wedge } \,S_{92} (t - 1),\,S_{91} (t)\,^{ \wedge } \,S_{91} (t - 1),\, \ldots ,\,S_{0} (t - 1)\,^{ \wedge } \,T_{3} \} & \ldots {\text{for NFSR1}} \\ \{ S_{176} (t)\,^{ \wedge } \,S_{176} (t - 1),\,S_{175} (t)\,^{ \wedge } \,S_{175} (t - 1),\, \ldots ,\,S_{93} (t - 1)\,^{ \wedge } \,T_{1} \} & \ldots {\text{for NFSR2}} \\ \{ S_{287} (t)\,^{ \wedge } \,S_{287} (t - 1),\,S_{286} (t)\,^{ \wedge } \,S_{286} (t - 1),\, \ldots ,\,S_{177} (t - 1)\,^{ \wedge } \,T_{2} \} & \ldots {\text{for NFSR3}} \\ \end{array} $$

\( S_{k}(t - 1) \) represents the value of the kth stage at time instance t − 1, and \( S_{k}(t) \) denotes the value of the same NFSR stage after one clock cycle. As the expression for \( T_{3} \) shows, it is a function of the key bit \( K_{69} \) only. Hence it is not affected by the change of IV at every reinitialization of Trivium: its value remains the same at every reinitialization with the same key and is fed to NFSR1. Consequently, all the states of NFSR1 are constant and are not used to mount the attack. The values of \( T_{1} \) and \( T_{2} \), however, change along with the IV, so only NFSR2 and NFSR3 are used to mount DPA. Only a single key bit (\( K_{66} \)) is involved in the computation of \( T_{1} \). Therefore, for the two assumed values of \( K_{66} \), i.e., 0 and 1, the value of \( T_{1} \) is given by ‘0 ^ IV78’ and ‘1 ^ IV78’. The values of Pearson’s correlation coefficient for both assumed values are used to determine the actual key bit value. The process of computing the key bit value is given in Algorithm 1.
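The hypothetical Hamming distance for the first clock can be sketched as below. This is an illustration rather than a reproduction of Algorithm 1: it assumes that the attack equations number key and IV bits from 1, so that IV78 and IV69 correspond to list entries `iv[77]` and `iv[68]`, and it counts only the NFSR2 and NFSR3 transitions as described above. The function name is hypothetical.

```python
def step1_hamming_distance(iv, k66_guess):
    """Hamming distance of NFSR2 and NFSR3 across the first clock for one IV.

    iv        : list of 80 IV bits, iv[0] taken as IV_1 (assumed 1-based naming)
    k66_guess : assumed value (0 or 1) of key bit K_66
    """
    nfsr2 = list(iv) + [0, 0, 0, 0]      # S_93..S_176 after initialization
    nfsr3 = [0] * 108 + [1, 1, 1]        # S_177..S_287 after initialization
    t1 = k66_guess ^ iv[77]              # T1 = K_66 ^ IV_78
    t2 = iv[68]                          # T2 = IV_69
    nfsr2_next = [t1] + nfsr2[:-1]       # shift by one stage, T1 enters at S_93
    nfsr3_next = [t2] + nfsr3[:-1]       # shift by one stage, T2 enters at S_177
    hd2 = sum(a ^ b for a, b in zip(nfsr2, nfsr2_next))
    hd3 = sum(a ^ b for a, b in zip(nfsr3, nfsr3_next))
    return hd2 + hd3
```

Evaluating this function for every captured IV under both guesses gives the two hypothesis vectors that are correlated with the power traces. For any single IV the two guesses differ in exactly one stage, which is consistent with the observation that the two raw correlation traces look very similar.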

The correlation curves for the assumed equation values (‘0’ and ‘1’) are shown in Fig. 4a. These curves were drawn from 5000 power traces using Pearson’s correlation coefficient. As Fig. 4a shows, the two correlation traces are similar to each other. During the experiments we observed that the peaks in the correlation curves were always in the negative direction and that their position remained the same even when the attack was mounted for the initial 12 key bits, so a conclusion about the actual key bit drawn from these peaks is not correct. In our technique, we cancel out the common portion and look for the instant at which the difference between the correlation traces is maximal. The instants of maximum difference should shift toward the right on the X-axis, because the key bits are determined sequentially. Figure 4b shows the difference of the correlation curves (\( r_{0} - r_{1} \)). The direction of the peak determines the value of the key bit: if the peak is above the X-axis, the key bit is ‘1’; otherwise the key bit is ‘0’. This strategy is used to determine 12 key bits, i.e., \( K_{66} \) to \( K_{55} \), at different clock cycles. This is done sequentially, using the previously computed key bits.
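The decision rule described above can be sketched as follows. The function name and return layout are hypothetical; the rule itself (peak of \( r_{0} - r_{1} \) above the axis means ‘1’, otherwise ‘0’) is taken from the text, while choosing the peak as the sample with the largest absolute difference is an assumption about how the peak is located.

```python
def decide_bit(r0, r1):
    """Decide a key bit from the difference of two correlation traces.

    r0, r1 : correlation traces for the assumed values '0' and '1'
    Returns (bit, peak_index, diff) where diff = r0 - r1.
    """
    diff = [a - b for a, b in zip(r0, r1)]
    # Locate the sample where the two traces differ the most.
    peak_index = max(range(len(diff)), key=lambda j: abs(diff[j]))
    bit = 1 if diff[peak_index] > 0 else 0
    return bit, peak_index, diff
```

The returned `peak_index` can also be tracked across successive key bits to check that the peaks move to the right as expected.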

Step 2—Extraction of equation value (consists of multi key bits)

After completion of the 12th clock cycle, the nonlinear functions are:

$$ \begin{array}{ll} T_{1} = K_{54} \,^{ \wedge } \,S_{80} \,^{ \wedge } \,\left( K_{79} \,\& \,K_{80} \right)\,^{ \wedge } \,{\text{IV}}_{66} & = K_{54} \,^{ \wedge } \,\left( K_{79} \,\& \,K_{80} \right)\,^{ \wedge } \,{\text{IV}}_{66} \\ T_{2} = {\text{IV}}_{57} \,^{ \wedge } \,{\text{IV}}_{72} \,^{ \wedge } \,\left( {\text{IV}}_{70} \,\& \,{\text{IV}}_{71} \right)\,^{ \wedge } \,S_{251} & = {\text{IV}}_{57} \,^{ \wedge } \,{\text{IV}}_{72} \,^{ \wedge } \,\left( {\text{IV}}_{70} \,\& \,{\text{IV}}_{71} \right) \\ T_{3} = S_{230} \,^{ \wedge } \,S_{275} \,^{ \wedge } \,\left( S_{273} \,\& \,S_{274} \right)\,^{ \wedge } \,K_{57} & = K_{57} \\ \end{array} $$

The power consumption during the 13th to 66th clock cycles (after initialization of the algorithm) leaks information about combinations of key bits, so our methodology for extracting key bits needs to change. It is evident from the expression for \( T_{3} \) (the feedback for NFSR1) that it is independent of the IV during these clock cycles, as in Step 1, whereas both \( T_{1} \) and \( T_{2} \) are IV dependent; hence only NFSR2 and NFSR3 are used to mount DPA. Because \( T_{1} \) is a combination of multiple key bits, a constant bit, and an IV bit, our attack proceeds by assuming values for the combination of key bits.

$$ \begin{aligned} & {\text{Case-}}1\,K_{54} \,^{ \wedge } \,\left( {K_{79} \,\& \,K_{80} } \right) = \,^{\prime \prime } 0^{\prime \prime } \\ & {\text{Case-2}}\,K_{54} \,^{ \wedge } \,\left( {K_{79} \,\& \,K_{80} } \right) = \,^{\prime \prime } 1^{\prime \prime } \\ \end{aligned} $$
(5)

Therefore, \( T_{1} \) is computed as ‘0 ^ IV66’ and ‘1 ^ IV66’, and the corresponding correlation traces \( r_{0} \) and \( r_{1} \) are computed for the two assumed values. The process of computing the equation values in the form of 1’s and 0’s is given in Algorithm 2.

The information about the combination of key bits for the 13th to 66th clock cycles can be retrieved in a similar manner, as shown in Fig. 4a, b.
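The expression for \( T_{1} \) after the 12th clock cycle can be sanity-checked by simulation. The sketch below is a compact restatement of Eqs. (1) and (2) with hypothetical function names; it assumes that the key and IV bits in the attack equations are numbered from 1, so that \( K_{54} \), \( K_{79} \), \( K_{80} \) and IV66 map to list entries `key[53]`, `key[78]`, `key[79]` and `iv[65]`.

```python
import random


def load(key, iv):
    """Initial 288-bit state as in Eq. (1); list index i holds stage S_i."""
    s = [0] * 288
    s[0:80] = key
    s[93:173] = iv
    s[285:288] = [1, 1, 1]
    return s


def step(s):
    """One initialization clock using the feedback functions of Eq. (2)."""
    t1 = s[65] ^ s[92] ^ (s[90] & s[91]) ^ s[170]
    t2 = s[161] ^ s[176] ^ (s[174] & s[175]) ^ s[263]
    t3 = s[242] ^ s[287] ^ (s[285] & s[286]) ^ s[68]
    return [t3] + s[0:92] + [t1] + s[93:176] + [t2] + s[177:287]


def t1_after(key, iv, n_clocks):
    """Value of T1 that will be fed back once n_clocks clocks have completed."""
    s = load(key, iv)
    for _ in range(n_clocks):
        s = step(s)
    return s[65] ^ s[92] ^ (s[90] & s[91]) ^ s[170]


def check_step2(trials=200, seed=1):
    """Compare simulated T1 after 12 clocks with K54 ^ (K79 & K80) ^ IV66."""
    rng = random.Random(seed)
    for _ in range(trials):
        key = [rng.randint(0, 1) for _ in range(80)]
        iv = [rng.randint(0, 1) for _ in range(80)]
        predicted = key[53] ^ (key[78] & key[79]) ^ iv[65]
        if t1_after(key, iv, 12) != predicted:
            return False
    return True
```

Under these assumptions `check_step2()` returns `True` for random key and IV pairs, which supports reading the unknown in this step as the single equation value \( K_{54} \) ^ (\( K_{79} \) & \( K_{80} \)).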

Step 3—Extraction of 2 bits

After completion of the 66th clock cycle, the nonlinear functions are:

$$ \begin{array}{*{20}l} {T_{1} = K_{69} \,^{ \wedge } \,K_{27} \,^{ \wedge } \,\left( {K_{25} \,\& \,K_{26} } \right)\,^{ \wedge } \,{\text{IV}}_{12} } \hfill & { = K_{69} \,^{ \wedge } \,K_{27} \,^{ \wedge } \,\left( {K_{25} \,\& \,K_{26} } \right)\,^{ \wedge } \,{\text{IV}}_{12} } \hfill \\ {T_{2} = {\text{IV}}_{3} \,^{ \wedge } \,{\text{IV}}_{18} \,^{ \wedge } \,\left( {{\text{IV}}_{16} \,\& \,{\text{IV}}_{17} } \right)\,^{ \wedge } \,S_{197} } \hfill & { = {\text{IV}}_{3} \,^{ \wedge } \,{\text{IV}}_{18} \,^{ \wedge } \,\left( {{\text{IV}}_{16} \,\& \,{\text{IV}}_{17} } \right)} \hfill \\ {T_{3} = {\text{IV}}_{69} \,^{ \wedge } \,S_{221} \,^{ \wedge } \,\left( {S_{219} \,\& \,S_{220} } \right)\,^{ \wedge } \,K_{3} } \hfill & { = {\text{IV}}_{69} \,^{ \wedge } \,K_{3} } \hfill \\ \end{array} $$

The structure of the three nonlinear functions shows that the power consumption during the 67th–78th clock cycles (after initialization of the algorithm) leaks two types of information: the value of a combination of key bits in equation form and the value of a single key bit. All three nonlinear functions now depend on the IV. Therefore, all three NFSRs (NFSR2 and NFSR3 completely and NFSR1 partially) are used in the calculation of the Hamming distance. \( T_{3} \) contains a single key bit and so gives 1 bit of key information. \( T_{1} \) is a combination of key bits and so gives information about multiple key bits in equation form. Four combinations are possible:

$$ \begin{array}{*{20}l} {{\text{Case}}{\text{-}}1} \hfill & {\,^{\prime \prime } K_{69} \,^{ \wedge } \,K_{27} \,^{ \wedge } \,\left( {K_{25} \,\& \,K_{26} } \right)^{\prime \prime } } \hfill & {\text{and}} \hfill & {\,^{\prime \prime } K_{3}^{\prime \prime } = \,^{\prime \prime } 00^{\prime \prime } } \hfill \\ {{\text{Case}}{\text{-}} 2} \hfill & {\,^{\prime \prime } K_{69} \,^{ \wedge } \,K_{27} \,^{ \wedge } \,\left( {K_{25} \,\& \,K_{26} } \right)^{\prime \prime } } \hfill & {\text{and}} \hfill & {\,^{\prime \prime } K_{3}^{\prime \prime } = \,^{\prime \prime } 01^{\prime \prime } } \hfill \\ {{\text{Case}}{\text{-}} 3} \hfill & {\,^{\prime \prime } K_{69} \,^{ \wedge } \,K_{27} \,^{ \wedge } \,\left( {K_{25} \,\& \,K_{26} } \right)^{\prime \prime } } \hfill & {\text{and}} \hfill & {\,^{\prime \prime } K_{3}^{\prime \prime } = \,^{\prime \prime } 10^{\prime \prime } } \hfill \\ {{\text{Case}} {\text{-}} 4} \hfill & {\,^{\prime \prime } K_{69} \,^{ \wedge } \,K_{27} \,^{ \wedge } \,\left( {K_{25} \,\& \,K_{26} } \right)^{\prime \prime } } \hfill & {\text{and}} \hfill & {\,^{\prime \prime } K_{3}^{\prime \prime } = \,^{\prime \prime } 11^{\prime \prime } } \hfill \\ \end{array} $$
(6)

\( T_{3} \) is used in the calculation of the Hamming distance, and the previous \( T_{3} \) value is also required for this purpose. The previous \( T_{3} \) (after the 65th clock cycle) is

$$ T_{3} = \underbrace {{S_{177} \,^{ \wedge } \,S_{222} \,^{ \wedge } \,\left( {S_{220} \,\& \,S_{221} } \right)}}_{{\,^{\prime } 0^{\prime } }}\,^{ \wedge } \,\underbrace {{K_{4} }}_{{{\text{assume}}\,\,^{\prime } 0^{\prime } \,{\text{or}}\,\,^{\prime } 1^{\prime } }} $$
(7)

After the 66th clock cycle, \( K_{4} \) is the first bit of NFSR1 and is assumed to be either ‘0’ or ‘1’. At the next clock the first bit of NFSR1 becomes IV69 ^ K3. At that instant, the Hamming distance between the previous and present states of NFSR2, NFSR3, and the first bit of NFSR1 is computed for all four combinations of “K69 ^ K27 ^ (K25 & K26)” and “K3”. The correlation traces are computed for all four combinations (\( r_{0} \), \( r_{1} \), \( r_{2} \) and \( r_{3} \)) using the power traces. The multiple-key-bit information in equation form and the single key bit can be computed using Algorithm 3.

The four correlation curves for the assumed 2-bit equation values (‘00’, ‘01’, ‘10’ and ‘11’) are shown in Fig. 5a. The differences of the correlation curves for all combinations for Eq. 67 are shown in Fig. 5b, and the direction of the peak again determines the 2-bit equation value. The difference curve with the largest absolute peak among all the curves identifies the 2-bit equation value. For example, the largest absolute peak occurs in curve no. 3 (“Corr. Diff (00–11)”), which is the difference of the correlation traces for the assumed equation values ‘00’ and ‘11’. In this case the peak is below the 0 level, which means the correct value is ‘00’. In the same way, the equation values for Eqs. 68–78 can be retrieved, using the previously computed equation values to compute the next ones.
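One possible reading of this selection rule is sketched below; it is an interpretation and not a reproduction of Algorithm 3. It assumes that every pairwise difference of the four correlation traces is examined, that the pair with the largest absolute peak is selected, and that a negative peak points to the first value of the pair (as in the ‘00–11’ example) while a positive peak points to the second. The function name and dictionary layout are hypothetical.

```python
from itertools import combinations


def decide_two_bits(corr):
    """Pick the 2-bit value from four correlation traces.

    corr : dict mapping '00', '01', '10', '11' to correlation traces
    """
    best = None   # (absolute peak, signed peak, first value, second value)
    for a, b in combinations(sorted(corr), 2):
        diff = [x - y for x, y in zip(corr[a], corr[b])]
        peak = max(diff, key=abs)              # signed value of the largest excursion
        if best is None or abs(peak) > best[0]:
            best = (abs(peak), peak, a, b)
    _, peak, a, b = best
    # A peak below zero selects the first value of the pair, otherwise the second.
    return a if peak < 0 else b
```

With four traces there are six pairwise differences; the sketch simply keeps the one with the most pronounced peak.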

4 Experimental Results

Our experiments were carried out using all three steps. Each step yields partial key information in equation form. All 80 bits of the key can then be computed by solving these equations.

Figure 6 shows the differences of correlation traces (20 in number, in different colors) corresponding to the correctly retrieved equation values. The peaks for the initial 20 clock cycles are marked with the ‘*’ sign and show the maximum correlation with the retrieved equation values. The number attached to each ‘*’ sign indicates the position of the computed equation in the sequence. In Fig. 6 the peaks shift toward the right, which indicates that the equation values are computed one by one at successive clock cycles. The plots for all 78 differences of correlation traces can be computed similarly. The technique gives better results with fewer power traces if each power trace is divided so that every divided (local) trace contains the information of only a single clock cycle and DPA is then applied to the local traces to extract the secret information. However, this process requires more information about the implementation, such as the operating frequency of the algorithm, and is therefore not a generic approach in a real-world scenario.
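With the acquisition settings reported earlier, a 250 MS/s sampling rate and a 4 MHz clock give 250/4 = 62.5 samples per clock cycle, so a 5100-point trace spans 5100/62.5 = 81.6 clock cycles. Dividing a trace into local traces could then look like the sketch below. The function name is hypothetical, and the sketch assumes that the trace starts exactly on a clock edge (or that the `offset` to the first edge is known), which the text does not state.

```python
def split_into_cycles(trace, sample_rate_hz=250e6, clock_hz=4e6, offset=0):
    """Split one power trace into per-clock-cycle segments (local traces)."""
    samples_per_cycle = sample_rate_hz / clock_hz      # 62.5 for the stated setup
    segments = []
    k = 0
    while True:
        start = offset + int(k * samples_per_cycle)
        end = offset + int((k + 1) * samples_per_cycle)
        if end > len(trace):
            break                                      # drop the incomplete last cycle
        segments.append(trace[start:end])
        k += 1
    return segments
```

Because 62.5 is not an integer, the segments alternate between 62 and 63 samples; a 5100-point trace yields 81 complete cycles, enough to cover the 78 clock cycles used by the attack.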

Fig. 6 Difference of correlation traces for initial 20 equations

Experiments were conducted for several key and IV pairs. All 80 bits of the key were retrieved successfully every time within 78 clock cycles using 5000 power traces, whereas the normal DPA technique requires more than 10,000 traces.

5 Conclusion and Future Work

We have presented a differential power analysis of Trivium, one of the ciphers in the final eSTREAM portfolio. A novel technique, the difference of all Pearson’s correlation coefficient traces based on the assumed equation values, is introduced to find the secret key bit information. It eliminates the noise of the power traces and provides pure side-channel information. Essentially, it computes the maximum gap in the difference of the correlation traces corresponding to the correct and wrong assumed values. The peaks of the different correlation traces shift along the time axis, as shown in Fig. 6, which indicates that the correct guesses are retrieved at their respective clock cycles, in agreement with our attack methodology. The maximum number of power traces required can vary with the mechanism used to measure the power traces, the implementation of the algorithm, and the FPGA family (FPGA technology, i.e., 90, 65 and 28 nm). The number of power traces required can be reduced significantly if each power trace is divided so that every divided (local) trace contains the information of only a single clock cycle and DPA is applied to the local traces to extract the secret information. The difference of correlation traces technique can be used on any nonlinear feedback shift register (NFSR)-based stream cipher, especially in lightweight cryptosystems. In lightweight cryptosystems the computational phase is smaller than in other stream ciphers. Therefore, SCAs are a serious threat to these kinds of cryptosystems, and countermeasures are mandatory.

In future work we will focus on developing countermeasures against this attack. We will also extend the attack to parallel implementations of Trivium, where 32 or 64 bits of the sequence are computed in parallel. We will also try to reduce the number of power traces required and the number of sample points per trace using statistical analysis techniques such as principal component analysis (PCA) and signal-to-noise ratio (SNR) analysis, and soft computing techniques such as neural networks (NN), genetic algorithms, and fuzzy logic.