
1 Introduction

1.1 Background and the GMR-2 Cipher

Nowadays, mobile communication systems have revolutionized the way we interact with each other, and many cellular mobile networks have been built, such as GSM, UMTS, CDMA2000 and 3GPP LTE. All of these networks require cell sites to create the cells of the network; a cell site provides all the equipment needed for transmitting radio signals to, and receiving them from, mobile handsets and the rest of the radio network. However, in some cases, such as crews on oil rigs or ships on the open sea, researchers on field trips in deserts, or people living in remote areas or in areas affected by a natural disaster, a cellular network is not always within reach, so these users, as well as military and government systems, need satellite phones to communicate.

A satellite phone is a type of mobile phone that connects to orbiting satellites instead of terrestrial cell sites. Satellite phones provide functionality similar to terrestrial mobile phones, such as voice calls and the short messaging service. Currently, there are two major satellite phone standards, both developed by ETSI: the GMR-1 standard and the GMR-2 (also known as GMR-2+) standard. For instance, Thuraya phones implement the GMR-1 standard, while the GMR-2 standard is mainly used by Inmarsat (Footnote 1).

Security plays a significant role for satellite phones, yet ETSI publishes only the specifications of the two standards, without any information about the implementation details of their security aspects. In fact, the two standards employ two different encryption algorithms, the GMR-1 cipher and the GMR-2 cipher, whose details were not publicly known until [7] was reported in January 2012.

According to [7], the GMR-1 cipher is an improved version of A5/2, one of the GSM encryption algorithms. Thus the methods for analyzing A5/2 introduced in [3, 5] can be applied almost directly to the GMR-1 cipher [8]. The GMR-2 cipher is a newly designed stream cipher, and at present only [7] presents an attack on it: a known-plaintext attack based on the read-collision technique. This attack needs approximately 50–65 bytes (5–6 frames) of keystream to recover the full key, with a computational complexity of about \(2^{18}\).

1.2 Main Contributions and Outline

In this paper, we propose a low data complexity attack on the GMR-2 cipher using the guess-and-determine approach. Guess-and-determine attacks are a common cryptanalytic approach against stream ciphers [1, 2, 4, 6, 9–12, 15]. The basic idea is to guess some parts of the internal state and derive the other parts through the relationship between the keystream and the internal state imposed by the keystream generation process. The validity of a guessed-and-determined internal state is checked by running the cipher forward from that state: if the generated keystream matches the intercepted keystream, the candidate is accepted; otherwise, it is discarded and another guess is tried.

Fig. 1. Overall structure of the GMR-2 cipher

The general guess-and-determine attack assumes that the guessed part and the corresponding determined part of the internal state are known to the adversary before the attack procedure is mounted. However, this approach cannot be directly applied to the GMR-2 cipher because of its special structure. We therefore present a new strategy for guess-and-determine attacks, which we call dynamic guess-and-determine. In this strategy, the part of the internal state to be guessed changes dynamically according to the intermediate results, i.e., each new guessed part depends on both the previously guessed and the previously determined parts of the internal state. We show how this kind of attack can be used to analyze the GMR-2 stream cipher. Our theoretical analysis demonstrates that, using the proposed attack, the 64-bit session key can be recovered by guessing no more than \(32\) bits when 15 bytes (1 frame) of keystream are available. The experimental results confirm our analysis: the number of candidates for the exhaustive search is about \(2^{28}\) on average.

The rest of this paper is organized as follows. In Sect. 2, we briefly recall the GMR-2 cipher. Section 3 gives some properties of the components of the cipher, and Sect. 4 gives a basic analysis of the cipher. Section 5 presents our low data complexity attack on the GMR-2 cipher in detail, and finally Sect. 6 concludes this paper.

2 Description of the GMR-2 Cipher

2.1 Overall Structure of the GMR-2 Cipher

The GMR-2 cipher uses a 64-bit encryption-key, denoted as \(K=(K_7,K_6,\cdots ,K_0)\), and operates on bytes. Each time the cipher is clocked, it generates one byte of the keystream, denoted by \(Z_l\), where \(l\) is the index of the clock. The cipher has an 8-byte state register \(S=(S_7,S_6,\cdots ,S_0)\), three major components \(\mathcal {F}\), \(\mathcal {G}\), \(\mathcal {H}\), a 3-bit counter \(c\in \{0,1,\cdots ,7\}\) and a toggle-bit \(t\in \{0,1\}\). A schematic overview of the overall structure is depicted in Fig. 1.

The \(\mathcal {F}\)-component combines two bytes of the encryption-key with the previous output (a keystream byte), the \(\mathcal {G}\)-component is a linear function used for mixing, and the \(\mathcal {H}\)-component consists of two DES S-boxes serving as a nonlinear filter. The following subsections describe the three major components in detail.

Fig. 2. The structure of the \(\mathcal {F}\)-component

2.2 \(\mathcal {F}\)-Component

The \(\mathcal {F}\)-component is the most interesting part of the cipher; Fig. 2 shows its internal structure. The 64-bit encryption-key \(K=(K_7,K_6,\cdots ,K_0)\) is fed into a 64-bit register and remains unchanged during the execution of the cipher. At each clock, the \(\mathcal {F}\)-component selects two key bytes, \(K_c\) on the lower side and \(K_{\tau _1(\alpha )}\) on the upper side, as described formally below.

Assume the cipher is at the \(l\)-th clock. Besides the encryption-key \(K\), the inputs of the \(\mathcal {F}\)-component are \(t\), \(c\) and \(p\), where \(c=l\mod \,8\) is a counter running through \(0\) to \(7\) sequentially and repeatedly, \(t=c\mod \,2\) is a toggle bit, and \(p=(p_7,p_6,\cdots ,p_0)\in \{0,1\}^{8}\) is the keystream byte generated at the previous clock, i.e., \(p=Z_{l-1}\). The outputs of the \(\mathcal {F}\)-component are an 8-bit value \(O_0\) and a 4-bit value \(O_1\) of the following form

$$ \begin{aligned} \left\{ \begin{aligned} O_0 =&(K_{\tau _1(\alpha )}\ggg \tau _2(\tau _1(\alpha )))_8;\\ O_1 =&((((K_c\oplus p)\gg 4) \& \mathsf{0xF })\oplus ((K_c\oplus p) \& \mathsf{0xF }))_4. \end{aligned} \right. \end{aligned}$$
(1)

where \(\tau _1:\{0,1\}^4\longrightarrow \{0,1\}^3\) and \(\tau _2:\{0,1\}^3\longrightarrow \{0,1\}^3\) are defined by table-lookups as shown in Table 1, and \(\alpha \) is defined as

$$ \begin{aligned} \alpha =\mathcal {N}(t, K_c\oplus p) =\left\{ \begin{aligned}&((K_c\oplus p) \& \mathsf{0xF })_4,&\quad \text {if}\quad t= 0; \\&(((K_c\oplus p)\gg 4) \& \mathsf{0xF })_4,&\quad \text {if}\quad t=1, \end{aligned}\right. \end{aligned}$$
(2)

which can also be expressed using the following simple form

$$ \alpha =[(K_c\oplus p)\gg 4\times (c\ \mathrm {mod}\ 2)]\, \& \,\mathsf{0xF }. $$
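As an illustration, the following Python sketch evaluates Eqs. (1) and (2) for a single clock. Table 1 is not reproduced in this text, so \(\tau _1\) and \(\tau _2\) are passed in as lookup lists; the values in TAU1_PLACEHOLDER and TAU2_PLACEHOLDER are hypothetical stand-ins chosen only so that the example runs, not the GMR-2 tables. Key bytes are stored in a list whose entry K[i] holds \(K_i\).

```python
def rotr8(x: int, r: int) -> int:
    """Rotate an 8-bit value right by r positions (the >>> of Eq. (1))."""
    r %= 8
    return ((x >> r) | (x << (8 - r))) & 0xFF


def f_component(K, c, p, tau1, tau2):
    """Return (O_0, O_1, alpha, tau1(alpha)) for counter c and previous keystream byte p.

    K is a list of the 8 key bytes (K[i] = K_i); tau1 and tau2 stand in for Table 1.
    """
    kp = K[c] ^ p
    alpha = (kp >> (4 * (c % 2))) & 0xF      # Eq. (2): low nibble if t = 0, high if t = 1
    j = tau1[alpha]                          # index of the key byte on the upper side
    o0 = rotr8(K[j], tau2[j])                # 8-bit output O_0
    o1 = ((kp >> 4) & 0xF) ^ (kp & 0xF)      # 4-bit output O_1
    return o0, o1, alpha, j


# Hypothetical stand-ins for Table 1 (NOT the real GMR-2 tables).
TAU1_PLACEHOLDER = [a % 8 for a in range(16)]
TAU2_PLACEHOLDER = [4, 1, 2, 3, 4, 5, 6, 7]

K = [0x10, 0x20, 0x30, 0x40, 0x50, 0x60, 0x70, 0x80]
print(f_component(K, 0, 0x03, TAU1_PLACEHOLDER, TAU2_PLACEHOLDER))  # (8, 2, 3, 3)
```

Only the lower-side byte \(K_c\) and the previous keystream byte enter \(O_1\), while \(O_0\) depends solely on the selected upper-side byte, which is the separation exploited in Sect. 3.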
Table 1. Definition of \(\tau _1\) and \(\tau _2\)
Fig. 3. The structure of the \(\mathcal {G}\)-component (the upper lines indicate lower bits)

2.3 \(\mathcal {G}\)-Component

As shown in Fig. 3, the \(\mathcal {G}\)-component takes the outputs of the \(\mathcal {F}\)-component and one byte \(S_0\) of the state as its input. It employs three sub-components, denoted by \(\mathcal {B}_1\), \(\mathcal {B}_2\), \(\mathcal {B}_3\), each of which maps a 4-bit input to a 4-bit output as follows

$$\begin{aligned} \left\{ \begin{aligned}&\mathcal {B}_1:(x_3,x_2,x_1,x_0)\mapsto (x_3\oplus x_0,x_3\oplus x_2\oplus x_0,x_3,x_1);\\&\mathcal {B}_2:(x_3,x_2,x_1,x_0)\mapsto (x_1,x_3,x_0,x_2);\\&\mathcal {B}_3:(x_3,x_2,x_1,x_0)\mapsto (x_2,x_0,x_3\oplus x_1\oplus x_0,x_3\oplus x_0). \end{aligned} \right. \end{aligned}$$

Since each \(\mathcal {B}_i\) is linear and all the other operations are bit transpositions or XORs, the \(\mathcal {G}\)-component is an entirely linear transformation, and the two 6-bit outputs \(O_0'\) and \(O_1'\) can be expressed as linear functions of the input as in Eq. (3)

$$\begin{aligned} \left\{ \begin{array}{ll} O_0'= &{} (O_{0,7}\oplus O_{0,4}\oplus S_{0,5},\ O_{0,7}\oplus O_{0,6}\oplus O_{0,4}\oplus S_{0,7},\ O_{0,7}\oplus S_{0,4},\\ &{}\ O_{0,5}\oplus S_{0,6},\ O_{1,3}\oplus O_{1,1}\oplus O_{1,0},\ O_{1,3}\oplus O_{1,0}) \\ O_1'= &{} (O_{0,3}\oplus O_{0,0}\oplus S_{0,1},\ O_{0,3}\oplus O_{0,2}\oplus O_{0,0}\oplus S_{0,3},\ O_{0,3}\oplus S_{0,0},\\ &{}\ O_{0,1}\oplus S_{0,2},\ O_{1,2},\ O_{1,0}). \end{array} \right. \end{aligned}$$
(3)
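A direct transcription of Eq. (3) may help when checking an implementation. The sketch below reads each tuple in Eq. (3) most-significant bit first (the first entry is bit 5), which is consistent with the way \(\mathbf {y_1}\), \(\mathbf {x_1}\) and \(\mathbf {v_1}\) are defined in Sect. 3.2; this bit ordering is our reading of the notation rather than something the text states explicitly.

```python
def _bit(x: int, i: int) -> int:
    """Bit i of x (bit 0 is the least significant)."""
    return (x >> i) & 1


def _pack(bits) -> int:
    """Pack an MSB-first list of bits into an integer."""
    v = 0
    for b in bits:
        v = (v << 1) | b
    return v


def g_component(o0: int, o1: int, s0: int):
    """Eq. (3): map (O_0, O_1, S_0) to the two 6-bit outputs (O_0', O_1')."""
    b = _bit
    o0p = _pack([
        b(o0, 7) ^ b(o0, 4) ^ b(s0, 5),
        b(o0, 7) ^ b(o0, 6) ^ b(o0, 4) ^ b(s0, 7),
        b(o0, 7) ^ b(s0, 4),
        b(o0, 5) ^ b(s0, 6),
        b(o1, 3) ^ b(o1, 1) ^ b(o1, 0),
        b(o1, 3) ^ b(o1, 0),
    ])
    o1p = _pack([
        b(o0, 3) ^ b(o0, 0) ^ b(s0, 1),
        b(o0, 3) ^ b(o0, 2) ^ b(o0, 0) ^ b(s0, 3),
        b(o0, 3) ^ b(s0, 0),
        b(o0, 1) ^ b(s0, 2),
        b(o1, 2),
        b(o1, 0),
    ])
    return o0p, o1p


print(g_component(0, 1, 0))   # (3, 1)
print(g_component(0, 0, 1))   # (0, 8)
```

Because every output bit is an XOR of input bits, XOR-ing two inputs XORs the corresponding outputs; this linearity is what Sect. 3.2 turns into matrix form.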

2.4 \(\mathcal {H}\)-Component

As shown in Fig. 4, the inputs of the \(\mathcal {H}\)-component are the outputs \(O'_0\), \(O'_1\) of the \(\mathcal {G}\)-component and the toggle-bit \(t\).

Fig. 4. The structure of the \(\mathcal {H}\)-component

Table 2. The S-box \(\mathbb {S}_2\)

The \(\mathcal {H}\)-component contains two S-boxes, \(\mathbb {S}_2\) and \(\mathbb {S}_6\), which are the second and the sixth S-boxes of DES, respectively (see Tables 2 and 3). However, both S-boxes have been reordered to account for the different addressing.

Let the input of an S-box be \((x_6,x_5,x_4,x_3,x_2,x_1)\). In this cipher, the two least-significant bits \((x_2,x_1)\) select the S-box row and the four most-significant bits \((x_6,x_5,x_4,x_3)\) select the S-box column. Depending on the value of \(t\), the output of the \(\mathcal {H}\)-component, which is the \(l\)-th byte of the keystream, is defined by

$$\begin{aligned} Z_l=\left\{ \begin{aligned}&(\mathbb {S}_2(O_1'),\mathbb {S}_6(O_0'))_8,\quad \text {if}\quad t= 0; \\&(\mathbb {S}_2(O_0'),\mathbb {S}_6(O_1'))_8, \quad \text {if}\quad t= 1. \end{aligned}\right. \end{aligned}$$
(4)
Table 3. The S-box \(\mathbb {S}_6\)
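The sketch below implements this addressing rule and Eq. (4). Since Tables 2 and 3 are not reproduced in this text, it assumes that they hold the standard DES S-boxes \(\mathbb {S}_2\) and \(\mathbb {S}_6\) laid out as table[row][column]; this assumption agrees with the property stated in Sect. 3.3 (column 4 of \(\mathbb {S}_6\) outputs 9 in rows 0 and 3). It further assumes that the first S-box output listed in Eq. (4) forms the high nibble of \(Z_l\).

```python
# Standard DES S-boxes S2 and S6 as table[row][column] (assumed layout of Tables 2 and 3).
S2 = [
    [15, 1, 8, 14, 6, 11, 3, 4, 9, 7, 2, 13, 12, 0, 5, 10],
    [3, 13, 4, 7, 15, 2, 8, 14, 12, 0, 1, 10, 6, 9, 11, 5],
    [0, 14, 7, 11, 10, 4, 13, 1, 5, 8, 12, 6, 9, 3, 2, 15],
    [13, 8, 10, 1, 3, 15, 4, 2, 11, 6, 7, 12, 0, 5, 14, 9],
]
S6 = [
    [12, 1, 10, 15, 9, 2, 6, 8, 0, 13, 3, 4, 14, 7, 5, 11],
    [10, 15, 4, 2, 7, 12, 9, 5, 6, 1, 13, 14, 0, 11, 3, 8],
    [9, 14, 15, 5, 2, 8, 12, 3, 7, 0, 4, 10, 1, 13, 11, 6],
    [4, 3, 2, 12, 9, 5, 15, 10, 11, 14, 1, 7, 6, 0, 8, 13],
]


def sbox(table, x: int) -> int:
    """GMR-2 addressing: (x2, x1) select the row, (x6, x5, x4, x3) the column."""
    row = x & 0b11
    col = (x >> 2) & 0xF
    return table[row][col]


def h_component(o0p: int, o1p: int, t: int) -> int:
    """Eq. (4): combine the two 4-bit S-box outputs into the keystream byte Z_l."""
    if t == 0:
        return (sbox(S2, o1p) << 4) | sbox(S6, o0p)
    return (sbox(S2, o0p) << 4) | sbox(S6, o1p)


print(hex(h_component(0b000001, 0, 0)))   # 0xfa
print(hex(h_component(0b000001, 0, 1)))   # 0x3c
```

The toggle bit only swaps which 6-bit value feeds which S-box; the S-box contents and the addressing stay the same on every clock.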

2.5 Mode of Operation

Now, we can describe the mode of operation [7] for the GMR-2 cipher. When the cipher is clocked for the \(l\)-th time, the following happens:

  • Based on the current state of the state-register \(S\), the counter \(c\), and the toggle-bit \(t\), the cipher generates one byte \(Z_l\) of keystream.

  • The counter \(c\) is incremented by one, and the toggle-bit is computed as \(t=c\ \mathrm {mod}\ 2\). When \(c\) reaches 8, it is reset to \(0\).

  • The state-register \(S\) is shifted by 8 bits to the right: \(S_i=S_{i+1}\), \(i=0,1,\ldots ,6\), and \(S_7=Z_l\). Meanwhile, \(p=Z_l\) is also passed to the \(\mathcal {F}\)-component as input for the next iteration (the (\(l+1\))-th clock).
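The register bookkeeping alone already explains why clocks 8 to 14 are convenient for the attack. The following sketch uses a hypothetical helper, clock_register, and arbitrary bytes standing in for real keystream (the initialization from the frame-number \(N\) is not modeled). It shows that at clock \(l\ge 8\) the byte \(S_0\) holds the keystream byte produced eight clocks earlier and \(S_7\) holds the previous one, i.e., \(p\).

```python
def clock_register(S, c, z):
    """Per-clock update of Sect. 2.5 after keystream byte z has been produced.

    S is the list (S_0, ..., S_7); returns the new (S, c, t).
    """
    S = S[1:] + [z]          # S_i <- S_{i+1} for i = 0..6, and S_7 <- Z_l
    c = (c + 1) % 8          # 3-bit counter: reaching 8 resets it to 0
    return S, c, c % 2       # toggle bit t = c mod 2


def register_trace(Z, S_start):
    """Return (c, S) as seen at each clock l, i.e. just before Z[l] is produced."""
    S, c, trace = list(S_start), 0, []
    for z in Z:
        trace.append((c, list(S)))
        S, c, _ = clock_register(S, c, z)
    return trace


Z = list(range(100, 115))            # 15 arbitrary bytes standing in for one frame
trace = register_trace(Z, [0] * 8)   # initial register contents are placeholders
c, S = trace[10]
print(c, S[0], S[7])                 # 2 102 109
```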

The cipher is operated in two modes, the initialization mode and the generation mode.

Initialization Mode. In the initialization phase, the following steps are performed:

  • The counter is set to \(c=0\) and the toggle-bit to \(t=0\).

  • The 64-bit encryption-key is written into the register in the \(\mathcal {F}\)-component.

  • The state-register \(S\) is initialized with the 22-bit frame-number \(N\); this procedure is not detailed here as it is irrelevant to our attack. After \(c\), \(t\) and \(S\) have been initialized, the cipher is clocked eight times, and the resulting keystream is discarded.

Generation Mode (Footnote 2). After the initialization is finished, the cipher is clocked to generate and output the actual keystream bytes. We use \(Z_l^{(N)}\) to denote the \(l\)-th (\(l\ge 0\)) byte of keystream generated after initialization with frame-number \(N\). The frame-number is incremented after every 15 bytes of keystream, which forces a re-initialization of the cipher. Therefore the keystream \(Z'\) that is actually used, for \(N\in \{0,1,\cdots \}\), is made up of 15-byte blocks concatenated as follows:

$$\begin{aligned} Z' = (Z_0^{(0)},Z_1^{(0)},\cdots ,Z_{14}^{(0)},Z_0^{(1)},Z_1^{(1)},\cdots ,Z_{14}^{(1)},\cdots ). \end{aligned}$$

3 Properties of the Components of the GMR-2 Cipher

In this section, we study the components of the GMR-2 cipher in detail and derive several of their properties that are used in our later analysis.

3.1 Property of the \(\mathcal {F}\)-Component

We first note that after the 64-bit encryption key \(K\) is fed into the \(\mathcal {F}\)-component, it remains unchanged not only during the initialization phase but also during the keystream generation phase. Since the \(\mathcal {F}\)-component selects the two key bytes \(K_c\) and \(K_{\tau _1(\alpha )}\) from \(K\), and the counter \(c\) runs sequentially through 0–7, we only need to study how \(K_{\tau _1(\alpha )}\) is selected.

Property of \(\alpha \). By Eq. (2), \(\alpha \) can be expressed as:

$$\begin{aligned} \alpha&= \mathcal {N}(t,K_c\oplus p) \\&= \left\{ \begin{aligned}&(K_{c,3}\oplus p_3,\ K_{c,2}\oplus p_2,\ K_{c,1}\oplus p_1, \ K_{c,0}\oplus p_0)_4,\quad \text {if}\quad t=0; \\&(K_{c,7}\oplus p_7,\ K_{c,6}\oplus p_6,\ K_{c,5}\oplus p_5,\ K_{c,4}\oplus p_4)_4, \quad \text {if}\quad t=1. \end{aligned}\right. \end{aligned}$$

This shows that if \(p\) is known, then at each clock the value of \(\alpha \) depends only on the four least-significant bits of \(K_c\) when \(t=0\) (\(c\) is even), or only on the four most-significant bits of \(K_c\) when \(t=1\) (\(c\) is odd). Thus, provided \(p\) is known, the key byte \(K_{\tau _1(\alpha )}\) selected on the upper side is determined by the four least (resp. most) significant bits of \(K_c\) when \(c\) is even (resp. odd).
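A quick exhaustive check of this property, as a sketch: the helper below (a hypothetical name, alpha_nibble_only) confirms that clearing the nibble of \(K_c\) that the toggle bit does not select never changes \(\alpha \), and that the 16 possible values of the selected nibble produce all 16 values of \(\alpha \).

```python
def alpha(kc, p, c):
    """Eq. (2): alpha = [(K_c xor p) >> 4*(c mod 2)] & 0xF."""
    return ((kc ^ p) >> (4 * (c % 2))) & 0xF


def alpha_nibble_only(p, c):
    """True if alpha never depends on the nibble of K_c that t does not select."""
    for kc in range(256):
        # keep only the low nibble (c even) or the high nibble (c odd)
        kept = kc & 0x0F if c % 2 == 0 else kc & 0xF0
        if alpha(kc, p, c) != alpha(kept, p, c):
            return False
    return True


print(alpha(0x5A, 0x03, 0), alpha(0x5A, 0x03, 1))                         # 9 5
print(all(alpha_nibble_only(p, c) for p in range(256) for c in (0, 1)))   # True
```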

Properties of \(\tau _1\) and \(\tau _2\). From Table 1, \(\tau _1\) maps 4 bits to 3 bits, so collisions always exist. For instance, \(\tau _1(0,0,1,0)=\tau _1(1,0,0,1)=0\) and \(\tau _1(0,1,1,0)=\tau _1(1,1,1,0)=4\); this observation, combined with \(\tau _2(0)=\tau _2(4)=4\), leads to the efficient read-collision based attack in [7]. Note that \(\tau _2(\cdot )\) maps 3 bits to 3 bits but is not surjective. Since one of the outputs of the \(\mathcal {F}\)-component is \(O_0=K_{\tau _1(\alpha )}\ggg \tau _2(\tau _1(\alpha ))\), we conjecture that the designers chose a non-surjective table for \(\tau _2(\cdot )\) simply to make the right-rotation amount always non-zero. We do not currently know whether this kind of non-uniformity could lead to other attacks.

3.2 Property of the \(\mathcal {G}\)-Component

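According to Eq. (3), the link between the input and output of the \(\mathcal {G}\)-component can be expressed by

$$\begin{aligned} \begin{pmatrix} O'_{0,5}\\ O'_{0,4}\\ O'_{0,3}\\ O'_{0,2}\\ O'_{1,5}\\ O'_{1,4}\\ O'_{1,3}\\ O'_{1,2}\\ O'_{0,1}\\ O'_{0,0}\\ O'_{1,1}\\ O'_{1,0} \end{pmatrix} = \begin{pmatrix} 1&0&0&1&0&0&0&0&0&0&0&0\\ 1&1&0&1&0&0&0&0&0&0&0&0\\ 1&0&0&0&0&0&0&0&0&0&0&0\\ 0&0&1&0&0&0&0&0&0&0&0&0\\ 0&0&0&0&1&0&0&1&0&0&0&0\\ 0&0&0&0&1&1&0&1&0&0&0&0\\ 0&0&0&0&1&0&0&0&0&0&0&0\\ 0&0&0&0&0&0&1&0&0&0&0&0\\ 0&0&0&0&0&0&0&0&1&0&1&1\\ 0&0&0&0&0&0&0&0&1&0&0&1\\ 0&0&0&0&0&0&0&0&0&1&0&0\\ 0&0&0&0&0&0&0&0&0&0&0&1 \end{pmatrix} \begin{pmatrix} O_{0,7}\\ O_{0,6}\\ O_{0,5}\\ O_{0,4}\\ O_{0,3}\\ O_{0,2}\\ O_{0,1}\\ O_{0,0}\\ O_{1,3}\\ O_{1,2}\\ O_{1,1}\\ O_{1,0} \end{pmatrix} \oplus \begin{pmatrix} S_{0,5}\\ S_{0,7}\\ S_{0,4}\\ S_{0,6}\\ S_{0,1}\\ S_{0,3}\\ S_{0,0}\\ S_{0,2}\\ 0\\ 0\\ 0\\ 0 \end{pmatrix}, \end{aligned}$$
(5)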

based on which we can obtain the following three linear equation systems:

$$\begin{aligned} \mathbf {y} =&\,\mathbf {W}\cdot \mathbf {x}\oplus \mathbf {v}, \end{aligned}$$
(6)
$$\begin{aligned} \mathbf {y_1} =&\,\mathbf {W_1}\cdot \mathbf {x_1}\oplus \mathbf {v_1}, \end{aligned}$$
(7)
$$\begin{aligned} \mathbf {y_2} =&\,\mathbf {W_2}\cdot \mathbf {x_2}\oplus \mathbf {v_2}, \end{aligned}$$
(8)

where

$$\begin{aligned} \mathbf {W}=\left( \begin{array}{ccc} \mathbf {A} &{} \mathbf {0} &{} \mathbf {0} \\ \mathbf {0} &{} \mathbf {A} &{} \mathbf {0} \\ \mathbf {0} &{} \mathbf {0} &{} \mathbf {B} \\ \end{array} \right) , \quad \mathbf {W_1}=\left( \begin{array}{cc} \mathbf {A} &{} \mathbf {0} \\ \mathbf {0} &{} \mathbf {A} \\ \end{array} \right) , \quad \mathbf {W_2}=\left( \begin{array}{c} \mathbf {B} \\ \end{array} \right) , \end{aligned}$$
$$\begin{aligned} \mathbf {A}= \begin{pmatrix} 1&{}0&{}0&{}1\\ 1&{}1&{}0&{}1\\ 1&{}0&{}0&{}0\\ 0&{}0&{}1&{}0\\ \end{pmatrix},\quad \mathbf {B} = \begin{pmatrix} 1&{}0&{}1&{}1\\ 1&{}0&{}0&{}1\\ 0&{}1&{}0&{}0\\ 0&{}0&{}0&{}1\\ \end{pmatrix},\quad \mathbf 0 = \begin{pmatrix} 0&{}0&{}0&{}0\\ 0&{}0&{}0&{}0\\ 0&{}0&{}0&{}0\\ 0&{}0&{}0&{}0\\ \end{pmatrix}, \end{aligned}$$

and

$$\begin{aligned} \left\{ \begin{array}{ll} \mathbf {y_1}&{}=(O'_{0,5},O'_{0,4},O'_{0,3},O'_{0,2},O'_{1,5},O'_{1,4},O'_{1,3},O'_{1,2})^T \\ \mathbf {y_2}&{}=(O'_{0,1},O'_{0,0},O'_{1,1},O'_{1,0})^T \\ \mathbf {y} &{}=(\mathbf {y_1},\mathbf {y_2}) \\ \end{array} \right. , \end{aligned}$$
$$\begin{aligned} \left\{ \begin{array}{ll} \mathbf {x_1} &{}= (O_{0,7},O_{0,6},O_{0,5},O_{0,4},O_{0,3},O_{0,2},O_{0,1},O_{0,0})^T \\ \mathbf {x_2} &{}= (O_{1,3},O_{1,2},O_{1,1},O_{1,0})^T \\ \mathbf {x} &{}= (\mathbf {x_1}, \mathbf {x_2}) \end{array} \right. , \end{aligned}$$
$$\begin{aligned} \left\{ \begin{array}{ll} \mathbf {v_1} &{}=(S_{0,5},S_{0,7},S_{0,4},S_{0,6},S_{0,1},S_{0,3},S_{0,0},S_{0,2})^T \\ \mathbf {v_2} &{}=(0,0,0,0)^T \\ \mathbf {v} &{}=(\mathbf {v_1},\mathbf {v_2}) \\ \end{array} \right. . \end{aligned}$$
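To double-check that the block-diagonal matrix \(\mathbf {W}\) and the vectors above agree with Eq. (3), the following sketch computes \(O_0'\) and \(O_1'\) both ways over \(\mathrm {GF}(2)\). Bits are handled as MSB-first lists, and all function names are illustrative.

```python
A = [[1, 0, 0, 1], [1, 1, 0, 1], [1, 0, 0, 0], [0, 0, 1, 0]]
B = [[1, 0, 1, 1], [1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 0, 1]]


def matvec(M, x):
    """Matrix-vector product over GF(2)."""
    return [sum(m & xi for m, xi in zip(row, x)) % 2 for row in M]


def bits(v, n):
    """n-bit MSB-first bit list of v."""
    return [(v >> i) & 1 for i in reversed(range(n))]


def pack(bs):
    """Inverse of bits()."""
    v = 0
    for b in bs:
        v = (v << 1) | b
    return v


def g_via_matrix(o0, o1, s0):
    """Compute (O_0', O_1') from y = W x xor v with W = diag(A, A, B)."""
    x1, x2 = bits(o0, 8), bits(o1, 4)
    s = bits(s0, 8)[::-1]                                  # s[i] = S_{0,i}
    v1 = [s[5], s[7], s[4], s[6], s[1], s[3], s[0], s[2]]
    y1 = [a ^ b for a, b in zip(matvec(A, x1[:4]) + matvec(A, x1[4:]), v1)]
    y2 = matvec(B, x2)                                     # v2 = 0
    # y1 = (O'_{0,5..2}, O'_{1,5..2}); y2 = (O'_{0,1}, O'_{0,0}, O'_{1,1}, O'_{1,0})
    return pack(y1[:4] + y2[:2]), pack(y1[4:] + y2[2:])


def g_via_eq3(o0, o1, s0):
    """Direct transcription of Eq. (3), for comparison."""
    b = lambda x, i: (x >> i) & 1
    o0p = pack([b(o0, 7) ^ b(o0, 4) ^ b(s0, 5), b(o0, 7) ^ b(o0, 6) ^ b(o0, 4) ^ b(s0, 7),
                b(o0, 7) ^ b(s0, 4), b(o0, 5) ^ b(s0, 6),
                b(o1, 3) ^ b(o1, 1) ^ b(o1, 0), b(o1, 3) ^ b(o1, 0)])
    o1p = pack([b(o0, 3) ^ b(o0, 0) ^ b(s0, 1), b(o0, 3) ^ b(o0, 2) ^ b(o0, 0) ^ b(s0, 3),
                b(o0, 3) ^ b(s0, 0), b(o0, 1) ^ b(s0, 2), b(o1, 2), b(o1, 0)])
    return o0p, o1p


print(g_via_matrix(0x5A, 0x9, 0xC3) == g_via_eq3(0x5A, 0x9, 0xC3))   # True
```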

Further, let \(K_c=(\mathbf {k_h},\mathbf {k_l})\), where \(\mathbf {k_h}=(K_{c,7},K_{c,6},K_{c,5},K_{c,4})^T\) denotes the four most significant bits of \(K_c\) and \(\mathbf {k_l}=(K_{c,3},K_{c,2},K_{c,1},K_{c,0})^T\) denotes the four least significant bits of \(K_c\). Similarly, let \(p=(\mathbf {p_h},\mathbf {p_l})\), where \(\mathbf {p_h}=(p_7,p_6,p_5,p_4)^T\) and \(\mathbf {p_l}=(p_3,p_2,p_1,p_0)^T\), and define \(\mathbf {u}=\mathbf {p_h}\oplus \mathbf {p_l}\). Then Eq. (1) implies the following two linear systems

$$\begin{aligned} {\mathbf {x_1}}&= K_{\tau _1(\alpha )}\ggg \tau _2(\tau _1(\alpha ))\end{aligned}$$
(9)
$$\begin{aligned} {\mathbf {x_2}}&= {\mathbf {k_h}}\oplus {\mathbf {k_l}}\oplus \mathbf {u} \end{aligned}$$
(10)

In the following attack on the GMR-2 cipher, we only apply the above linear systems at clocks where the exact values of both \(\mathbf {u}\) and \(\mathbf {v}\) are known to us. We have the following observations:

Observation 1. Since \(\mathbf {A}\) and \(\mathbf {B}\) are invertible, so are \(\mathbf {W}\), \(\mathbf {W_1}\) and \(\mathbf {W_2}\). Hence, from Eqs. (6)–(8), we can easily obtain the value of \(\mathbf {y}\) (\(\mathbf {y_i}\)) from \(\mathbf {x}\) (\(\mathbf {x_i}\)), and vice versa.

Observation 2. If both \(\mathbf {y_1}\) and \(\alpha \) are known, then by Observation 1 we can get the value of \(\mathbf {x_1}\), and further, by Eq. (9), we can calculate \(K_{\tau _1(\alpha )}=\mathbf {x_1}\lll \tau _2(\tau _1(\alpha ))\).

Observation 3. If both \(\mathbf {y_2}\) and \(\mathbf {k_h}\) (\(\mathbf {k_l}\)) are known, then by Observation 1 we can get the value of \(\mathbf {x_2}\), and further, by Eq. (10), we can calculate \(\mathbf {k_l}=\mathbf {x_2}\oplus \mathbf {k_h}\oplus \mathbf {u}\) (\(\mathbf {k_h}=\mathbf {x_2}\oplus \mathbf {k_l}\oplus \mathbf {u}\)).
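Observations 1–3 can be made concrete with a small \(\mathrm {GF}(2)\) Gauss–Jordan inversion. The sketch below inverts \(\mathbf {A}\) and \(\mathbf {B}\), then recovers \(K_{\tau _1(\alpha )}\) from \(\mathbf {y_1}\) (Observation 2) and the missing nibble of \(K_c\) from \(\mathbf {y_2}\) (Observation 3). The rotation amount \(\tau _2(\tau _1(\alpha ))\) is passed in as a parameter because Table 1 is not reproduced here; the example values are arbitrary.

```python
A = [[1, 0, 0, 1], [1, 1, 0, 1], [1, 0, 0, 0], [0, 0, 1, 0]]
B = [[1, 0, 1, 1], [1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 0, 1]]


def gf2_inverse(M):
    """Invert a square 0/1 matrix over GF(2) by Gauss-Jordan elimination."""
    n = len(M)
    aug = [list(row) + [int(i == j) for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col]), None)
        if pivot is None:
            raise ValueError("matrix is singular over GF(2)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r][col]:
                aug[r] = [a ^ b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matvec(M, x):
    """Matrix-vector product over GF(2)."""
    return [sum(m & xi for m, xi in zip(row, x)) % 2 for row in M]


def bits(v, n):
    """n-bit MSB-first bit list of v."""
    return [(v >> i) & 1 for i in reversed(range(n))]


def pack(bs):
    """Inverse of bits()."""
    v = 0
    for b in bs:
        v = (v << 1) | b
    return v


A_INV, B_INV = gf2_inverse(A), gf2_inverse(B)


def key_byte_from_y1(y1, v1, rot):
    """Observation 2: x1 = W1^{-1}(y1 xor v1), then K_{tau1(alpha)} = x1 <<< rot."""
    t = [a ^ b for a, b in zip(y1, v1)]
    x1 = pack(matvec(A_INV, t[:4]) + matvec(A_INV, t[4:]))
    return ((x1 << rot) | (x1 >> (8 - rot))) & 0xFF


def other_nibble_from_y2(y2, known_nibble, u):
    """Observation 3: x2 = B^{-1} y2 (v2 = 0), other nibble = x2 xor known xor u."""
    return pack(matvec(B_INV, y2)) ^ known_nibble ^ u


# Round trip for Observation 2 with arbitrary values (K_j = 0xB7, rotation 3).
kj, rot = 0xB7, 3
x1 = bits(((kj >> rot) | (kj << (8 - rot))) & 0xFF, 8)   # Eq. (9): x1 = K_j >>> rot
v1 = [1, 0, 1, 1, 0, 0, 1, 0]
y1 = [a ^ b for a, b in zip(matvec(A, x1[:4]) + matvec(A, x1[4:]), v1)]
print(hex(key_byte_from_y1(y1, v1, rot)))                  # 0xb7
```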

Observation 4. The column indices of the two S-boxes \(\mathbb {S}_2\) and \(\mathbb {S}_6\) are selected by \(\mathbf {y_1}\), and the row indices are selected by \(\mathbf {y_2}\). This relationship is depicted in Fig. 5.

Fig. 5. The links between the input and output of the \(\mathcal {G}\)-component (the upper lines indicate lower bits). Note that \( \alpha = [(K_c\oplus p)\gg 4\times (c\mod \,2)]\ \& \ \mathsf{0xF }\)

3.3 Property of the \(\mathcal {H}\)-Component

According to Eq. (4) and the definition of the two S-boxes, we have the following three results:

  • If the row index and the output of an S-box are known, then we will get the column index uniquely.

  • If the column index and the output of an S-box are known, then we will also get the row index uniquely except for \(\mathbb {S}_6\) when the column index is 4 and the output is 9, in this situation, the row index can be either \(0\) or \(3\).

  • If only the outputs of both S-boxes are known, then we will get \(4\times 4=16\) possible inputs for \(\mathcal {H}\)-component.

The above three results indicate that by intercepting the keystream of the GMR-2 cipher (the output of the two S-boxes) and combining the guessed/determined values of the row or column indices, we can “invert” these two S-boxes to obtain the corresponding (partial) input candidates.
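The three results can be verified mechanically. The sketch below assumes that Tables 2 and 3 are the standard DES \(\mathbb {S}_2\) and \(\mathbb {S}_6\) stored as table[row][column], with the GMR-2 addressing (row given by the two least-significant input bits); it lists the ambiguous (column, output) pairs and enumerates the 16 candidate inputs of the \(\mathcal {H}\)-component for a keystream byte. The high-nibble convention for the first S-box output in Eq. (4) is likewise an assumption.

```python
# Standard DES S-boxes S2 and S6 as table[row][column] (assumed layout of Tables 2 and 3).
S2 = [
    [15, 1, 8, 14, 6, 11, 3, 4, 9, 7, 2, 13, 12, 0, 5, 10],
    [3, 13, 4, 7, 15, 2, 8, 14, 12, 0, 1, 10, 6, 9, 11, 5],
    [0, 14, 7, 11, 10, 4, 13, 1, 5, 8, 12, 6, 9, 3, 2, 15],
    [13, 8, 10, 1, 3, 15, 4, 2, 11, 6, 7, 12, 0, 5, 14, 9],
]
S6 = [
    [12, 1, 10, 15, 9, 2, 6, 8, 0, 13, 3, 4, 14, 7, 5, 11],
    [10, 15, 4, 2, 7, 12, 9, 5, 6, 1, 13, 14, 0, 11, 3, 8],
    [9, 14, 15, 5, 2, 8, 12, 3, 7, 0, 4, 10, 1, 13, 11, 6],
    [4, 3, 2, 12, 9, 5, 15, 10, 11, 14, 1, 7, 6, 0, 8, 13],
]


def columns_for(table, row, out):
    """Columns in a given row that produce `out` (first result)."""
    return [c for c in range(16) if table[row][c] == out]


def rows_for(table, col, out):
    """Rows in a given column that produce `out` (second result)."""
    return [r for r in range(4) if table[r][col] == out]


def h_preimages(z, t):
    """All (O_0', O_1') pairs consistent with keystream byte z (third result)."""
    hi, lo = z >> 4, z & 0xF
    s2_in = [(c << 2) | r for r in range(4) for c in columns_for(S2, r, hi)]
    s6_in = [(c << 2) | r for r in range(4) for c in columns_for(S6, r, lo)]
    if t == 0:                                        # Z = (S2(O_1'), S6(O_0'))
        return [(a, b) for b in s2_in for a in s6_in]
    return [(a, b) for a in s2_in for b in s6_in]     # Z = (S2(O_0'), S6(O_1'))


ambiguous = [(name, col, out)
             for name, tab in (("S2", S2), ("S6", S6))
             for col in range(16) for out in range(16)
             if len(rows_for(tab, col, out)) > 1]
print(ambiguous)                    # [('S6', 4, 9)]
print(len(h_preimages(0xFA, 0)))    # 16
```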

4 Basic Analysis of the GMR-2 Cipher

The previous section presents some properties of the three components of the GMR-2 cipher. In this section, we show how these components interact with each other.

Given the frame number \(N\), let \(S_i^{(l)}\) denote the state of \(S_i\) at the \(l\)-th (\(0\le l\le 14\)) clock of the keystream generation phase. Then for \(8\le l\le 14\) we have

$$ S_0^{(l)}=Z_{l-8}^{(N)} \quad \text {and}\quad p=S_7^{(l)}=Z_{l-1}^{(N)}, $$

which indicates that for \(8\le l\le 14\), both \(S_0^{(l)}\) and \(p\) are known to us, and thus so are the vectors \(\mathbf {v}\), \(\mathbf {v_1}\), \(\mathbf {v_2}\) and \(\mathbf {u}\) defined in the previous section. To simplify our analysis, in the rest of this section we only consider the cipher at the \((c+8)\)-th clock with \(0\le c\le 6\).

Note that at the \((c+8)\)-th clock, the \(\mathcal {F}\)-component selects the two key bytes \(K_c\) and \(K_{\tau _1(\alpha )}\) on the lower and upper sides, respectively. By the property of the \(\mathcal {F}\)-component, guessing only half of \(K_c=(\mathbf {k_h},\mathbf {k_l})\), namely \(\mathbf {k_l}\) if \(c\) is even and \(\mathbf {k_h}\) if \(c\) is odd, suffices to determine \(\alpha \) and hence which key byte the \(\mathcal {F}\)-component will select.

Now, based on the fact that the link between the input and output of the \(\mathcal {G}\)-component can be expressed by a well-structured matrix \(\mathbf {W}\), we present the following four rules for the guessing strategy when applying the dynamic guess-and-determine attack as described in the next section.

Rule 1. Let \(K_c=(\mathbf {k_h},\mathbf {k_l})\), assume \(c\) is odd, and let a guessed value for \(\mathbf {k_h}\) be given. If \(\tau _1(\alpha )=c\), then from \(Z_{c+8}^{(N)}\) either the guessed value of \(\mathbf {k_h}\) is shown to be wrong or the candidate values of \(\mathbf {k_l}\) can be determined. Similarly, assume \(c\) is even and let a guessed value for \(\mathbf {k_l}\) be given. If \(\tau _1(\alpha )=c\), then from \(Z_{c+8}^{(N)}\) either the guessed value of \(\mathbf {k_l}\) is shown to be wrong or the candidate values of \(\mathbf {k_h}\) can be determined.

Proof

We only give the proof for the first case; the second case is similar, and its details are omitted.

From \(\tau _1(\alpha )=c\), we have \(K_{\tau _1(\alpha )}=K_c\), thus

$$ \mathbf {x_1}=K_{\tau _1(\alpha )}\ggg \tau _2(\tau _1(\alpha ))= (\mathbf {k_h},\mathbf {k_l})\ggg \tau _2(\tau _1(\alpha )). $$

Noting that

$$ \mathbf {x_2}=\mathbf {k_h}\oplus \mathbf {k_l}\oplus \mathbf {u} \quad \text {and}\quad \mathbf {x}=(\mathbf {x_1},\mathbf {x_2}), $$

thus, if \(\mathbf {k_h}\) is known, then for each possible value of \(\mathbf {y}\) (derived below), Eq. (6) can be converted into another linear equation system (which depends on the guessed value of \(\mathbf {k_h}\)) with \(12\) equations in the \(4\) unknowns representing \(\mathbf {k_l}\).

According to the properties of the \(\mathcal {H}\)-component, \(Z^{(N)}_{c+8}\) yields 16 possible values of \(\mathbf {y}=(\mathbf {y_1},\mathbf {y_2})\). Thus, in total, 16 linear equation systems in \(\mathbf {k_l}\) are obtained.

If the guessed value of \(\mathbf {k_h}\) is correct, at least one of these 16 linear systems has a solution, which can be found by Gaussian elimination. If instead \(\mathbf {k_h}\) is a random (wrong) guess, then by the theory of the Linear Consistency Test [14], the probability that a given linear equation system has a solution is no more than

$$ \frac{1}{2^{12-4}}\times \left( 1+\frac{1}{2^{12+1}}\right) ^4\thickapprox 2^{-8}. $$

Thus, the probability that any of the 16 linear equation systems has a solution is upper bounded by \(16\times 2^{-8}=2^{-4}\); that is, a wrong guess of \(\mathbf {k_h}\) is rejected with probability at least \(1-2^{-4}\), and the number of candidates for \(\mathbf {k_l}\) is small.\(\quad \square \)

Rule 2. Let \(K_c=(\mathbf {k_h},\mathbf {k_l})\), assume \(c\) is odd (even), and given a guessed value for \(\mathbf {k_h}\) (\(\mathbf {k_l}\)), if \(\tau _1(\alpha )\ne c\), we further guess the value of \(\mathbf {k_l}\) (\(\mathbf {k_h}\)), in this situation, we have a guessed value for \(K_c\), and then \(K_{\tau _1(\alpha )}\) can be determined by \(Z_{c+8}^{(N)}\).

Proof

Since \(K_c=(\mathbf {k_h},\mathbf {k_l})\) is known by guessing, \(\mathbf {x_2}=\mathbf {k_h}\oplus \mathbf {k_l}\oplus \mathbf {u}\) is known, and by Observation 1, \(\mathbf {y_2}\) can be calculated. By Observation 4, the row indices of the two S-boxes are then known, so from \(Z_{c+8}^{(N)}\) the value of \(\mathbf {y_1}\), which gives the column indices of the two S-boxes, can be uniquely determined. By Observation 2, the value of \(K_{\tau _1(\alpha )}\) can then be obtained.\(\quad \square \)
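The chain of deductions in this proof can be written out as code. The sketch below is our own illustration, not the authors' implementation. It assumes that Tables 2 and 3 are the standard DES \(\mathbb {S}_2\) and \(\mathbb {S}_6\) stored as table[row][column] and that the first S-box output in Eq. (4) is the high nibble; it takes the rotation amount \(\tau _2(\tau _1(\alpha ))\) as a parameter, since Table 1 is not reproduced here; and it includes a forward function, forward_byte, only to test the round trip. The function rule2 follows the proof step by step: \(\mathbf {x_2}\) gives \(\mathbf {y_2}\) (the rows), the rows and \(Z_{c+8}^{(N)}\) give the columns \(\mathbf {y_1}\), and \(\mathbf {y_1}\) gives \(\mathbf {x_1}\) and finally \(K_{\tau _1(\alpha )}\).

```python
# DES S-boxes as table[row][column] (assumed layout of Tables 2 and 3).
S2 = [[15, 1, 8, 14, 6, 11, 3, 4, 9, 7, 2, 13, 12, 0, 5, 10],
      [3, 13, 4, 7, 15, 2, 8, 14, 12, 0, 1, 10, 6, 9, 11, 5],
      [0, 14, 7, 11, 10, 4, 13, 1, 5, 8, 12, 6, 9, 3, 2, 15],
      [13, 8, 10, 1, 3, 15, 4, 2, 11, 6, 7, 12, 0, 5, 14, 9]]
S6 = [[12, 1, 10, 15, 9, 2, 6, 8, 0, 13, 3, 4, 14, 7, 5, 11],
      [10, 15, 4, 2, 7, 12, 9, 5, 6, 1, 13, 14, 0, 11, 3, 8],
      [9, 14, 15, 5, 2, 8, 12, 3, 7, 0, 4, 10, 1, 13, 11, 6],
      [4, 3, 2, 12, 9, 5, 15, 10, 11, 14, 1, 7, 6, 0, 8, 13]]
A = [[1, 0, 0, 1], [1, 1, 0, 1], [1, 0, 0, 0], [0, 0, 1, 0]]
A_INV = [[0, 0, 1, 0], [1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 1, 0]]   # A^{-1} over GF(2)
B = [[1, 0, 1, 1], [1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 0, 1]]


def matvec(M, x):
    """Matrix-vector product over GF(2)."""
    return [sum(m & xi for m, xi in zip(row, x)) % 2 for row in M]


def bits(v, n):
    """n-bit MSB-first bit list of v."""
    return [(v >> i) & 1 for i in reversed(range(n))]


def pack(bs):
    v = 0
    for b in bs:
        v = (v << 1) | b
    return v


def v1_of(s0):
    s = bits(s0, 8)[::-1]                                   # s[i] = S_{0,i}
    return [s[5], s[7], s[4], s[6], s[1], s[3], s[0], s[2]]


def rule2(kc, c, p, s0, z, rot):
    """Recover K_{tau1(alpha)} from a fully guessed K_c and z = Z_{c+8}; rot = tau2(tau1(alpha))."""
    kp = kc ^ p
    y2 = matvec(B, bits(((kp >> 4) ^ kp) & 0xF, 4))         # x2 = k_h ^ k_l ^ u, y2 = B x2
    row0, row1 = pack(y2[:2]), pack(y2[2:])                  # rows of the S-boxes fed by O_0', O_1'
    hi, lo = z >> 4, z & 0xF
    if c % 2 == 0:                                           # Z = (S2(O_1'), S6(O_0'))
        col0, col1 = S6[row0].index(lo), S2[row1].index(hi)
    else:                                                    # Z = (S2(O_0'), S6(O_1'))
        col0, col1 = S2[row0].index(hi), S6[row1].index(lo)
    t = [a ^ b for a, b in zip(bits(col0, 4) + bits(col1, 4), v1_of(s0))]   # y1 xor v1
    x1 = pack(matvec(A_INV, t[:4]) + matvec(A_INV, t[4:]))
    return ((x1 << rot) | (x1 >> (8 - rot))) & 0xFF          # undo the right rotation


def forward_byte(kc, kj, c, p, s0, rot):
    """Keystream byte Z_{c+8} from K_c and K_{tau1(alpha)} (used only for testing)."""
    kp = kc ^ p
    x1 = bits(((kj >> rot) | (kj << (8 - rot))) & 0xFF, 8)
    y1 = [a ^ b for a, b in zip(matvec(A, x1[:4]) + matvec(A, x1[4:]), v1_of(s0))]
    y2 = matvec(B, bits(((kp >> 4) ^ kp) & 0xF, 4))
    o0p, o1p = pack(y1[:4] + y2[:2]), pack(y1[4:] + y2[2:])
    look = lambda tab, v: tab[v & 3][v >> 2]                 # row = low 2 bits, column = high 4
    if c % 2 == 0:
        return look(S2, o1p) << 4 | look(S6, o0p)
    return look(S2, o0p) << 4 | look(S6, o1p)


z = forward_byte(0x3C, 0xB7, 1, 0x5E, 0xA1, 3)
print(hex(rule2(0x3C, 1, 0x5E, 0xA1, z, 3)))   # 0xb7
```

Since each S-box row is a permutation, every keystream byte corresponds to exactly one \(K_{\tau _1(\alpha )}\) once \(K_c\) is fixed, which is why Rule 2 determines that byte uniquely.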

Rule 3. Let \(K_c=(\mathbf {k_h},\mathbf {k_l})\), assume \(c\) is odd, and let a guessed value for \(\mathbf {k_h}\) be given. If \(K_{\tau _1(\alpha )}\) has already been guessed or determined, then \(\mathbf {k_l}\) can be determined from \(Z_{c+8}^{(N)}\). Similarly, assume \(c\) is even and let a guessed value for \(\mathbf {k_l}\) be given. If \(K_{\tau _1(\alpha )}\) has already been guessed or determined, then \(\mathbf {k_h}\) can be determined from \(Z_{c+8}^{(N)}\).

Proof

Since \(K_{\tau _1(\alpha )}\) is known (and \(\alpha \) is fixed by the guessed half of \(K_c\)), \(\mathbf {x_1}\) is known, so by Observation 1 the value of \(\mathbf {y_1}\) can be obtained. Since \(\mathbf {y_1}\) gives the column indices of the S-boxes, \(\mathbf {y_2}\), which gives the row indices, can be obtained from \(Z_{c+8}^{(N)}\). By Observation 3, \(\mathbf {k_l}\) (\(\mathbf {k_h}\)) can then be calculated from the known \(\mathbf {k_h}\) (\(\mathbf {k_l}\)).\(\quad \square \)

Remark 1

We remind the reader that, as explained in Sect. 3.3, \(\mathbf {y_2}\) cannot always be uniquely deduced from \(\mathbf {y_1}\) and \(Z_{c+8}^{(N)}\), so we will sometimes obtain two candidates for \(\mathbf {y_2}\).

Rule 4. Assume that the values for \(K_c\) and \(K_{\tau _1(\alpha )}\) had already been guessed or determined previously, then we can judge whether those guessed or determined values are wrong by \(Z^{(N)}_{c+8}\).

Proof

Since \(K_c\) and \(K_{\tau _1(\alpha )}\) are known, they can be passed through the three components to generate the keystream byte at the \((c+8)\)-th clock, which we then compare with \(Z^{(N)}_{c+8}\). If the two do not match, the guessed or determined values for \(K_c\) and \(K_{\tau _1(\alpha )}\) are wrong.\(\quad \square \)

Remark 2

When applying the dynamic guess-and-determine attack to the GMR-2 cipher in the next section, at each step we adopt Rules 1–3 to guess or determine some parts of \(K\), or Rule 4 to verify whether the guessed or determined values are wrong. If Rule 4 indicates an inconsistency at the current clock, then the most recent guess is wrong; in this situation, we backtrack to the clock at which that guess was made and try another guessed value.

5 Low Data Complexity Attack on the GMR-2 Cipher

As discussed in the introduction, the general guess-and-determine attack assumes that both the guessed part and the corresponding determined part of the internal state are known to the adversary before the attack is mounted. However, given the mechanism of the GMR-2 cipher, the general guess-and-determine attack cannot be applied to it directly. We therefore introduce a new strategy for guess-and-determine attacks, which we call Dynamic Guess-and-Determine. Its main feature is that we cannot decide in advance which parts must be guessed and which parts can be determined; instead, we dynamically decide which part of the internal state to guess next. The idea can be further described as follows.

First, we guess some part of the internal state of the target cipher and, according to the guessed value, determine some other parts of the internal state from the intercepted keystream. Next, we guess a new part of the internal state, but this time the guessed part depends on both the previously guessed and the previously determined parts. This process is repeated until all parts of the internal state are deduced. Hence the candidates for \(K\) have to be built dynamically, with backtracking.

We now apply this strategy to obtain a low data complexity attack on the GMR-2 cipher. Our attack needs only one frame (15 bytes) of keystream; without loss of generality, we assume \(N=0\). The attack consists of the following two major steps (Footnote 3):

  • In the first step, from the known keystream \(Z^{(0)}_0\sim Z^{(0)}_{14}\), we apply the dynamic guess-and-determine method to the cipher at the \((c+8)\)-th clock, \(0\le c\le 6\), which reduces the number of candidates for the 64-bit encryption-key \(K\) from \(2^{64}\) to no more than \(2^{32}\).

  • In the second step, we test the candidates for \(K\) from the first step by comparing the keystream generated from each candidate with the known keystream \(Z^{(0)}_0\sim Z^{(0)}_7\), which yields the unique value of \(K\).

Since the second step of our attack is simply an exhaustive search over the candidate set, only the first step is discussed in detail in the following subsection.

5.1 The Attack Procedure

As explained before, to guarantee that the values of \(p\) and \(S_0^{(l)}\) are known to us at the \(l\)-th clock, we analyze the cipher at the \((c+8)\)-th clock with \(0\le c\le 6\).

Before introducing the proposed attack, we first define an index set

$$ \varGamma \subseteq \{0,1,\cdots ,7\} $$

to record the indices of the bytes of the encryption key \(K\) that have already been guessed or determined before the \((c+8)\)-th clock. \(\varGamma \) is initialized to \(\varnothing \) at the 8th clock and is updated during the attack.

Now let’s analyze the GMR-2 cipher at the \((c+8)\)-th clock with \(0\le c\le 6\). At each clock, we calculate the following values:

$$ c,\ t,\ p=Z^{(0)}_{c+7},\ S_0^{(c+8)}=Z^{(0)}_{c},\ \text {and}\ \varGamma , $$

and check whether \(c\in \varGamma \):

  • If \(c\in \varGamma \), then \(K_c\) is already known, so we can calculate \(\alpha \) and check whether \(\tau _1(\alpha )\in \varGamma \):

    • If \(\tau _1(\alpha )\in \varGamma \), then \(K_{\tau _1(\alpha )}\) is already known, and we adopt Rule 4 to check whether \(K_c\) and \(K_{\tau _1(\alpha )}\) are wrong. If they are (i.e., some guessed or determined values are wrong), we trace back to the nearest clock at which a guess was made and re-analyze the cipher from there.

    • If \(\tau _1(\alpha )\not \in \varGamma \), then \(K_{\tau _1(\alpha )}\) is not yet known; we adopt Rule 2 to obtain \(K_{\tau _1(\alpha )}\) and set \(\varGamma \leftarrow \varGamma \cup \{\tau _1(\alpha )\}\).

  • If \(c\not \in \varGamma \), then \(K_c=(\mathbf {k_h},\mathbf {k_l})\) is not yet known; we guess \(\mathbf {k_l}\) if \(c\) is even and \(\mathbf {k_h}\) if \(c\) is odd. Next, we calculate \(\alpha \) and check whether \(\tau _1(\alpha )\in \varGamma \):

    • If \(\tau _1(\alpha )\in \varGamma \), then \(K_{\tau _1(\alpha )}\) is already known, and we adopt Rule 3 to get \(\mathbf {k_h}\) if \(c\) is even and \(\mathbf {k_l}\) if \(c\) is odd, and set \(\varGamma \leftarrow \varGamma \cup \{c\}\).

    • If \(\tau _1(\alpha )\not \in \varGamma \), then \(K_{\tau _1(\alpha )}\) is not yet known. We further check whether \(c=\tau _1(\alpha )\):

      • If \(c=\tau _1(\alpha )\), then we adopt Rule 1 to either obtain the remaining bits of \(K_c\) and set \(\varGamma \leftarrow \varGamma \cup \{c\}\), or deduce that the guessed value of \(\mathbf {k_l}\) (\(\mathbf {k_h}\)) is wrong if \(c\) is even (odd), in which case we guess another value for \(\mathbf {k_l}\) (\(\mathbf {k_h}\)).

      • If \(c\ne \tau _1(\alpha )\), then we guess the other four bits of \(K_c\), and we can adopt Rule 2 to get \(K_{\tau _1(\alpha )}\), and meanwhile set \(\varGamma \leftarrow \varGamma \cup \{c,\tau _1(\alpha )\}\).

The above process is executed sequentially from the \(8\)th clock to the \(14\)th clock. When it finishes, we obtain a candidate for the 64-bit key \(K\) and test whether it is the right key using \(Z^{(0)}_0\sim Z^{(0)}_7\). If not, we discard this candidate and modify the guessed values (by backtracking) to obtain another candidate. This process is repeated until the right key is found.
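The procedure above is a depth-first search, and a toy version fits in a few dozen lines of Python. The sketch rests on several assumptions that should be kept in mind. \(\tau _1\) and \(\tau _2\) are hypothetical placeholders, since Table 1 is not reproduced here. Tables 2 and 3 are taken to be the standard DES S-boxes stored as table[row][column], with the first S-box output in Eq. (4) as the high nibble. Each "determine" step is carried out by exhaustively filtering the undetermined nibble or byte against \(Z^{(0)}_{c+8}\) rather than by the linear-algebra inversions of Rules 1–3; this yields the same surviving candidates, only more slowly. Finally, the frame is synthesized only from the relations of clocks 8 to 14, without the real initialization. To keep the run short, the example seeds \(K_2,\ldots ,K_7\) as already known (i.e., \(\varGamma =\{2,\ldots ,7\}\) at the start).

```python
# DES S-boxes as table[row][column] (assumed layout of Tables 2 and 3).
S2 = [[15, 1, 8, 14, 6, 11, 3, 4, 9, 7, 2, 13, 12, 0, 5, 10],
      [3, 13, 4, 7, 15, 2, 8, 14, 12, 0, 1, 10, 6, 9, 11, 5],
      [0, 14, 7, 11, 10, 4, 13, 1, 5, 8, 12, 6, 9, 3, 2, 15],
      [13, 8, 10, 1, 3, 15, 4, 2, 11, 6, 7, 12, 0, 5, 14, 9]]
S6 = [[12, 1, 10, 15, 9, 2, 6, 8, 0, 13, 3, 4, 14, 7, 5, 11],
      [10, 15, 4, 2, 7, 12, 9, 5, 6, 1, 13, 14, 0, 11, 3, 8],
      [9, 14, 15, 5, 2, 8, 12, 3, 7, 0, 4, 10, 1, 13, 11, 6],
      [4, 3, 2, 12, 9, 5, 15, 10, 11, 14, 1, 7, 6, 0, 8, 13]]
TAU1 = [a % 8 for a in range(16)]     # hypothetical placeholder, NOT Table 1
TAU2 = [4, 1, 2, 3, 4, 5, 6, 7]       # hypothetical placeholder, NOT Table 1


def pack(*bs):
    """Pack MSB-first bits into an integer."""
    v = 0
    for x in bs:
        v = (v << 1) | x
    return v


def alpha(kc, p, c):
    return ((kc ^ p) >> (4 * (c % 2))) & 0xF           # Eq. (2)


def keystream_byte(kc, kj, j, c, p, s0, tau2):
    """One clock of F, G (Eq. (3)) and H (Eq. (4)) for K_c and K_j, j = tau1(alpha)."""
    kp, r = kc ^ p, tau2[j]
    o0 = ((kj >> r) | (kj << (8 - r))) & 0xFF          # O_0 = K_j >>> tau2(j)
    o1 = ((kp >> 4) ^ kp) & 0xF                         # O_1
    b = lambda x, i: (x >> i) & 1
    o0p = pack(b(o0, 7) ^ b(o0, 4) ^ b(s0, 5), b(o0, 7) ^ b(o0, 6) ^ b(o0, 4) ^ b(s0, 7),
               b(o0, 7) ^ b(s0, 4), b(o0, 5) ^ b(s0, 6),
               b(o1, 3) ^ b(o1, 1) ^ b(o1, 0), b(o1, 3) ^ b(o1, 0))
    o1p = pack(b(o0, 3) ^ b(o0, 0) ^ b(s0, 1), b(o0, 3) ^ b(o0, 2) ^ b(o0, 0) ^ b(s0, 3),
               b(o0, 3) ^ b(s0, 0), b(o0, 1) ^ b(s0, 2), b(o1, 2), b(o1, 0))
    look = lambda tab, v: tab[v & 3][v >> 2]            # row = low 2 bits, column = high 4
    if c % 2 == 0:
        return look(S2, o1p) << 4 | look(S6, o0p)
    return look(S2, o0p) << 4 | look(S6, o1p)


def dynamic_gd(Z, tau1, tau2, key, c=0):
    """Yield key assignments consistent with Z[8..14]; key[i] is None when i is not in Gamma."""
    if c == 7:
        yield list(key)
        return
    p, s0, z = Z[c + 7], Z[c], Z[c + 8]
    shift = 4 * (c % 2)                                 # nibble of K_c that feeds alpha
    if key[c] is not None:                              # c in Gamma
        kc_options = [key[c]]
    else:                                               # guess that nibble; the other one is
        kc_options = [g << shift | o << (4 - shift)     # filtered below (stands in for Rules 1-3)
                      for g in range(16) for o in range(16)]
    for kc in kc_options:
        j = tau1[alpha(kc, p, c)]
        if j == c:
            kj_options = [kc]                           # Rule 1 situation
        elif key[j] is not None:
            kj_options = [key[j]]                       # Rule 3 / Rule 4 situation
        else:
            kj_options = range(256)                     # Rule 2 situation
        for kj in kj_options:
            if keystream_byte(kc, kj, j, c, p, s0, tau2) == z:   # consistency check
                nxt = list(key)
                nxt[c], nxt[j] = kc, kj
                yield from dynamic_gd(Z, tau1, tau2, nxt, c + 1)


def toy_frame(K, head, tau1, tau2):
    """Extend 8 arbitrary bytes Z_0..Z_7 to Z_0..Z_14 using only clocks 8..14 (toy model)."""
    Z = list(head)
    for c in range(7):
        p, s0 = Z[c + 7], Z[c]
        j = tau1[alpha(K[c], p, c)]
        Z.append(keystream_byte(K[c], K[j], j, c, p, s0, tau2))
    return Z


K = [0x3A, 0xC5, 0x17, 0x9E, 0x60, 0xD2, 0x4B, 0xF8]
Z = toy_frame(K, [0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88], TAU1, TAU2)
found = list(dynamic_gd(Z, TAU1, TAU2, [None, None] + K[2:]))
print(any(f[:2] == K[:2] for f in found))   # True
```

Every yielded assignment reproduces \(Z^{(0)}_8\sim Z^{(0)}_{14}\); in the real attack these candidates would then be tested against \(Z^{(0)}_0\sim Z^{(0)}_7\) (the second step), which needs the initialization procedure and is omitted here. In this sketch, any key byte that no clock touches would remain None and would have to be enumerated in that second step.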

5.2 Complexity Analysis and Experimental Results

From the attack procedure, and in particular from Rules 1–3, it can be seen that guessing \(8\) bits determines another \(8\) bits (Rule 2), while guessing \(4\) bits determines another \(4\) bits (Rules 1 and 3). Furthermore, Rule 4 filters out wrong guessed values. We thus conclude that for the 64-bit key \(K\), we need to guess at most \(32\) bits, and the other 32 bits can be determined. This estimate is rough, and it seems difficult, if not impossible, to calculate the exact time complexity of our attack theoretically. We therefore ran experiments for different frames and random keys. The experimental results confirm our analysis, and the number of candidates is slightly smaller than estimated: about \(2^{28}\) on average.

More specifically, we ran a non-optimized implementation (Footnote 4) of the above attack 1000 times on a 3.2 GHz PC. The 64-bit encryption-key was recovered in about 700 seconds on average, of which 580 seconds were spent deducing the \(2^{28}\) candidates and 120 seconds on exhaustively searching them. Figure 6 shows the frequency distribution of the exhaustive bits (the logarithm of the number of candidates) over the 1000 experiments.

The data complexity of the attack is just one frame, i.e., 15 bytes of keystream. The dynamic guess-and-determine phase only analyzes the \(8\)th to \(14\)th clocks, because \(S_7,S_6,\ldots ,S_0\) must be known in this phase, whereas in the exhaustive search phase \(Z^{(0)}_{0}\sim Z^{(0)}_7\) are used to distinguish the right key among the \(2^{28}\) candidates.

Fig. 6. The frequency distribution of exhaustive bits from 1000 experimental results

6 Conclusion

The GMR-2 cipher is widely used in satellite phone communications, so analyzing its security is of particular significance. The design methodology of the GMR-2 cipher seems new and more complex, yet an efficient low data complexity attack based on the dynamic guess-and-determine strategy can be mounted against it. This attack needs only \(1\) frame (15 bytes) of keystream, and it recovers the 64-bit session key by testing about \(2^{28}\) candidates on average. Table 4 compares the previously known cryptanalytic result with ours. Our attack can also be run on a single PC, which again demonstrates that the design methodology of the GMR-2 cipher is far from the “state of the art” in stream ciphers.

Table 4. Cryptanalytic results on the GMR-2 cipher