1 Introduction

The \(\textsf{PV}\) Knapsack problem, previously called the partial Fourier recovery problem, was introduced in [11] as a new lattice-based assumption for post-quantum cryptography. The efficiency and rich algebraic properties underlying the \(\textsf{PV}\) Knapsack problem make it an attractive choice. As a result, the problem has been used as a building block for various primitives, such as encryption schemes [5, 12], signatures [11, 14], and aggregatable signatures [9].

Let \(\mathcal {R}_q:=\mathbb {Z}_q[x]/(\boldsymbol{g}(\boldsymbol{x}))\) be a quotient polynomial ring, where \(\boldsymbol{g}\) splits linearly over \(\mathbb {Z}_q\) for some prime q. In the literature, \(\boldsymbol{g}\) is commonly either \(\boldsymbol{x}^{n}-1\) with prime n or \(\boldsymbol{x}^{n}+1\) with power-of-two n. In the rest of the paper, we assume that \(\boldsymbol{g}\) corresponds to one of these choices. Denote by \(\varOmega \) the set of all the primitive roots of \(\boldsymbol{g}\) over \(\mathbb {Z}_q\). Consider \(\varOmega _t\), a uniformly selected random subset of \(\varOmega \) of size t. The \(\textsf{PV}\) Knapsack problem (informally) states the following:

It is hard to recover a uniform ternary \(\boldsymbol{f}(\boldsymbol{x})\in \mathcal {R}_q\), given only the evaluations of \(\boldsymbol{f}(\boldsymbol{x})\) at \(\omega \in \varOmega _t\) when \(t \approx \lfloor n/2 \rfloor \).

The \(\textsf{PV}\) Knapsack problem also has a decisional version, which asks to distinguish the evaluations at \(\omega \in \varOmega _t\) of a uniformly random \(\boldsymbol{f}(\boldsymbol{x})\) from those of a uniform ternary \(\boldsymbol{f}(\boldsymbol{x})\).

The main approach to solving the problem has been lattice reduction [11]. Recently, the authors of [4] proposed an algebraic method that reduces the cost of the lattice reduction for solving the decisional problem.

The distinguishing attack of [4] does not lead to a key recovery attack in general. In Sect. 5 of [4], key recovery is only obtained for a small number of worst-case keys. Furthermore, their paper states: “We note however that this does not fully invalidate the claim made in [14], since the 128 bit-security is claimed against search attackers, and not distinguishing attackers.”

This quote is the starting motivation for this work. Indeed, it might be worthwhile – from an attacker’s viewpoint – to find a search attack against the \(\textsf{PV}\) Knapsack. As far as we know, there are some lattice-based assumptions where the search problem remains intractable, even though the decision problem is easy; one example is the \(\textsf{FFI}\) problem [8].

2 Preliminaries

2.1 Notations

For any integer \(N>1\), we write \(\mathbb {Z}_N\) to denote the ring of integers modulo N and \(\mathbb {Z}_N^{*}\) to denote the multiplicative subgroup of its units. In particular, when q is prime, \(\mathbb {Z}_q\) is the finite field with q elements. We assume that q is odd and, in that case, we represent elements of \(\mathbb {Z}_q\) by the unique representative belonging to the interval \([-(q-1)/2,(q-1)/2]\).

We let \(\mathcal {R}_q=\mathbb {Z}_q[x]/(\boldsymbol{g})\) denote the quotient polynomial ring of \(\mathbb {Z}_q[x]\) by \(\boldsymbol{g}(\boldsymbol{x})\), where \(\boldsymbol{g}(\boldsymbol{x})\) is either \(\boldsymbol{x}^{n}-1\) with n a prime or \(\boldsymbol{x}^{n}+1\) with n a power of two. We insist that \(\boldsymbol{g}\) splits into linear factors over \(\mathbb {Z}_q\). We denote by \(\varOmega \) the set of all primitive roots of \(\boldsymbol{g}\) in \(\mathbb {Z}_q\). When n is a prime, \(\varOmega \) contains all roots of \(\boldsymbol{g}\) except 1; when n is a power of two, \(\varOmega \) contains all roots of \(\boldsymbol{g}\). In both cases, for any \(\omega \in \varOmega \) and any root \(\omega '\) of \(\boldsymbol{g}\) in \(\mathbb {Z}_q\), \(\omega '\) can be written as a power of \(\omega \), say \(\omega ^{i_{\omega '}}.\) In particular, when n is prime \(1=\omega ^0\) in \(\mathbb {Z}_q\). Note that for a prime value of n, the exponent \(i_{\omega '}\) can take all values in \(\mathbb {Z}_n\). When n is a power of two, the exponent \(i_{\omega '}\) takes all odd values in \(\mathbb {Z}_{2n}\). As a consequence, if we exclude the non-primitive root 1, the exponents \(i_{\omega '}\) belong to \(\mathbb {Z}_n^{*}\) when n is prime and \(\mathbb {Z}_{2n}^{*}\) when n is a power of two. To lighten notations, we use U(n) as a shorthand for \(\mathbb {Z}_n^{*}\) when n is prime and \(\mathbb {Z}_{2n}^{*}\) when n is a power of two.

As a consequence, it is convenient to choose an arbitrary primitive root \(\omega _1\) of \(\boldsymbol{g}\) in \(\mathbb {Z}_q\) and write:

$$ \varOmega =\{ \omega _i=\omega _1^{i}\ |\ i \in U(n)\}. $$

Remark 1

The condition that \(\boldsymbol{g}\) splits in \(\mathbb {Z}_q\) implies that \(q= 1 \bmod {n}\) when n is prime and that \(q= 1 \bmod {2n}\) when n is a power of two.
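To make this notation concrete, the following minimal Python sketch (with illustrative toy parameters n = 7 and q = 29, not taken from any proposed scheme; the helper names are ours) enumerates \(\varOmega \) and checks the description \(\varOmega =\{\omega _1^{i}\ |\ i \in U(n)\}\) given above.

```python
import math

def element_order(w, q, bound):
    """Smallest divisor d of `bound` with w^d = 1 mod q (the multiplicative
    order of w, provided that this order divides `bound`); None otherwise."""
    for d in range(1, bound + 1):
        if bound % d == 0 and pow(w, d, q) == 1:
            return d
    return None

def primitive_roots(n, q, power_of_two=False):
    """All primitive roots of g over Z_q, for g = x^n - 1 (n prime)
    or g = x^n + 1 (n a power of two)."""
    order = 2 * n if power_of_two else n
    assert (q - 1) % order == 0            # Remark 1: q = 1 mod n (resp. 2n)
    return [w for w in range(1, q) if element_order(w, q, order) == order]

# Toy example: n = 7 (prime), q = 29 = 1 mod 7.
n, q = 7, 29
Omega = primitive_roots(n, q)
w1 = Omega[0]
U_n = [i for i in range(1, n) if math.gcd(i, n) == 1]      # U(n) = Z_n^* for prime n
assert set(Omega) == {pow(w1, i, q) for i in U_n}          # Omega = { w1^i : i in U(n) }
```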

To represent polynomials in \(\mathcal {R}_q\), we use the polynomial basis \(\{1,\boldsymbol{x},\dots ,\boldsymbol{x}^{n-1}\}.\) Since we are working modulo \(\boldsymbol{g}\), for any polynomial \(\boldsymbol{f}\) in \(\mathcal {R}_q\), we can interpret \(\boldsymbol{f}(1/\boldsymbol{x})\) as a polynomial. More precisely, if \(\boldsymbol{f}(\boldsymbol{x})=f_0+f_1\boldsymbol{x}+\dots +f_{n-1} \boldsymbol{x}^{n-1},\) we define:

$$\begin{aligned} \boldsymbol{f}(1/\boldsymbol{x})= {\left\{ \begin{array}{ll} f_0+f_1\boldsymbol{x}^{n-1}+\dots +f_{n-1} \boldsymbol{x}\quad \text{ when } \text{ n } \text{ is } \text{ prime } \\ f_0-f_1\boldsymbol{x}^{n-1}-\dots -f_{n-1} \boldsymbol{x}\quad \text{ when } \text{ n } \text{ is } \text{ a } \text{ power } \text{ of } \text{ two } \end{array}\right. } \end{aligned}$$

It is easy to verify that for any root \(\omega \) of \(\boldsymbol{g}\), the evaluation of \(\boldsymbol{f}(\boldsymbol{x})\) at \(\omega ^{-1}\) coincides with the evaluation of \(\boldsymbol{f}(1/\boldsymbol{x})\) at \(\omega \), thus justifying our definition.
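As an illustration, the short sketch below (toy values n = 7, q = 29, \(\omega =16\), chosen only for the example) implements the coefficient permutation defining \(\boldsymbol{f}(1/\boldsymbol{x})\) and checks the evaluation identity just stated.

```python
def f_inverse_x(f_coeffs, power_of_two=False):
    """Coefficients of f(1/x) following the definition above.
    f_coeffs = [f_0, ..., f_{n-1}] in the polynomial basis of R_q."""
    n = len(f_coeffs)
    out = [0] * n
    out[0] = f_coeffs[0]
    sign = -1 if power_of_two else 1   # 1/x = x^{n-1} (prime n) or -x^{n-1} (power of two)
    for i in range(1, n):
        out[n - i] = sign * f_coeffs[i]
    return out

def evaluate(coeffs, w, q):
    """Evaluate a polynomial (coefficient list) at w modulo q."""
    return sum(c * pow(w, i, q) for i, c in enumerate(coeffs)) % q

# Toy check: n = 7, q = 29, and w = 16 is a primitive 7th root of unity mod 29.
n, q, w = 7, 29, 16
f = [1, -1, 0, 1, 0, 0, -1]
w_inv = pow(w, -1, q)                  # modular inverse (Python 3.8+)
assert evaluate(f_inverse_x(f), w, q) == evaluate(f, w_inv, q)
```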

Since we have specified a basis for polynomials in \(\mathcal {R}_q\), we can identify a polynomial with the vector of its coefficients in this basis. We use this identification extensively in the descriptions of the various attacks. For any vector \(\boldsymbol{v}\) (or polynomial using the vector identification), we write \(\Vert \boldsymbol{v}\Vert \) (resp. \(\Vert \boldsymbol{v}\Vert _\infty \)) to denote the \(\ell _2\) norm (resp. \(\ell _\infty \) norm) of \(\boldsymbol{v}\). We also write \(\boldsymbol{A}=(\boldsymbol{A}_1|\boldsymbol{A}_2)\) to denote the concatenation of two matrices \(\boldsymbol{A}_1\) and \(\boldsymbol{A}_2\), with the same number of rows.

2.2 The \(\textsf{PV}\) Knapsack Problem

Let \(\varOmega _t\) be a subset of \(t \le \lfloor n/2 \rfloor \) distinct random elements from \(\varOmega \). Let \(\boldsymbol{f}(\boldsymbol{x})\) be a polynomial in \(\mathcal {R}_q\) whose coefficients are sampled uniformly at random from the set \(\{{-1,0,1\}}\).

Definition 1

(\(\textsf{PV}\) Knapsack problem). Given \(\mathcal {R}_q\) and \(\{(\omega ,\boldsymbol{f}(\omega ))\ |\ \omega \in \varOmega _t \},\) recover \(\boldsymbol{f}(\boldsymbol{x})\).

Instead of identifying a \(\textsf{PV}\) Knapsack instance by \(\varOmega _t\), it is often simpler to identify it by the corresponding index set \(S_t\subset U(n)\). The \(\textsf{PV}\) Knapsack instance then becomes \(\{{(i,\boldsymbol{f}(\omega _1^i))\ | \ i \in S_t\}}\) for some arbitrary primitive root \(\omega _1\) of \(\boldsymbol{g}\).
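For concreteness, here is a minimal sketch of instance generation under Definition 1; the toy parameters and the helper names (pv_knapsack_instance, etc.) are purely illustrative.

```python
import random

def evaluate(coeffs, w, q):
    """Evaluate a polynomial (coefficient list) at w modulo q."""
    return sum(c * pow(w, i, q) for i, c in enumerate(coeffs)) % q

def pv_knapsack_instance(n, q, Omega, t, seed=0):
    """Toy PV Knapsack instance: a uniform ternary secret f together with
    its evaluations at a uniformly chosen t-subset Omega_t of Omega."""
    rng = random.Random(seed)
    f = [rng.choice([-1, 0, 1]) for _ in range(n)]
    Omega_t = rng.sample(Omega, t)
    evals = {w: evaluate(f, w, q) for w in Omega_t}
    return f, Omega_t, evals

# Toy parameters: n = 7, q = 29, Omega = nontrivial powers of the 7th root of unity 16 mod 29.
n, q = 7, 29
Omega = [pow(16, i, q) for i in range(1, n)]
f, Omega_t, evals = pv_knapsack_instance(n, q, Omega, t=n // 2)
```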

Remark 2

When n is prime, we have to assume the evaluation of \(\boldsymbol{f}\) at 1 is never included, as this provides a simple distinguishing attack on the \(\textsf{PV}\) Knapsack problem. This explains why we choose \(\varOmega \) to only contain primitive roots.

2.3 Lattice Reduction

Any (full rank) matrix \(\boldsymbol{B} \in \mathbb {Z}^{n \times n}\) generates a lattice \(\mathcal {L}\) of dimension n, which is the set \(\mathcal {L}(\boldsymbol{B})=\{{ \boldsymbol{Bz}: \boldsymbol{z} \in \mathbb {Z}^{n}\}}\). A lattice is called q-ary when it contains \(q\mathbb {Z}^n\) as a sublattice. The volume of the lattice \(\mathcal {L}(\boldsymbol{B})\) is defined as \(\textsf {Vol}=|\det (\boldsymbol{B})|\).

The key computational problem involving lattices is to find the shortest non-zero vector (\(\textsf{SVP}\)) in the lattice \(\mathcal {L}\). Minkowski’s theorem yields the following upper bounds on the norms of the shortest non-zero vector \(\boldsymbol{v}\) of any lattice of dimension n and volume \(\textsf {Vol}\):

$$ \Vert \boldsymbol{v}\Vert _\infty \le \textsf {Vol}^{1/n} \quad \text{ and }\quad \Vert \boldsymbol{v}\Vert \le \sqrt{n}\,\textsf {Vol}^{1/n}. $$

Definition 2

(q -ary Kernel lattice). Let \(\boldsymbol{A}\in \mathbb {Z}^{t \times n}_q\) be any (full rank) matrix with \(n >t\). We define the q-ary Kernel lattice of \(\boldsymbol{A}\) as

$$ \mathcal {L}^{\perp }_{\boldsymbol{A},q}=\{{\boldsymbol{v} \in \mathbb {Z}^{ n}: \boldsymbol{A}\boldsymbol{v}=0 \bmod q\}}. $$

If we write \(\boldsymbol{A}=(\boldsymbol{A}_1|\boldsymbol{A}_2)\), where \(\boldsymbol{A}_1 \in \mathbb {Z}^{t \times t}_q\) and \(\boldsymbol{A}_2 \in \mathbb {Z}^{t \times (n-t)}_q\), then assuming that \(\boldsymbol{A}_1\) is invertible, \(\mathcal {L}^{\perp }_{\boldsymbol{A},q}\) has a basis

$$ \begin{pmatrix} q\boldsymbol{I}_t & -\boldsymbol{A}^{-1}_1\boldsymbol{A}_2\\ \boldsymbol{0} & \boldsymbol{I}_{n-t} \end{pmatrix}. $$

If \(\boldsymbol{A}_1\) is not invertible, we can simply re-order the columns to make \(\boldsymbol{A}\) start with a \(t\times t\) invertible matrix. The lattice \(\mathcal {L}^{\perp }_{\boldsymbol{A},q}\) has dimension n and volume \(q^t\). Finding a short vector in this lattice, i.e., a short element in the kernel of \(\boldsymbol{A}\), is usually referred to as the short integer solution (\(\textsf{SIS}\)) problem.
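A minimal sketch of this basis construction follows (row-vector convention, so each row below is a lattice vector; a prime q is assumed and the leading \(t\times t\) block is assumed invertible). The function names are ours.

```python
def modinv_matrix(A, q):
    """Inverse of a square matrix modulo a prime q (Gauss-Jordan elimination)."""
    t = len(A)
    M = [list(row) + [int(i == j) for j in range(t)] for i, row in enumerate(A)]
    for col in range(t):
        piv = next(r for r in range(col, t) if M[r][col] % q != 0)
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], -1, q)
        M[col] = [x * inv % q for x in M[col]]
        for r in range(t):
            if r != col and M[r][col] % q:
                fac = M[r][col]
                M[r] = [(x - fac * y) % q for x, y in zip(M[r], M[col])]
    return [row[t:] for row in M]

def kernel_lattice_basis(A, q):
    """Row basis of the q-ary kernel lattice of A = (A1 | A2): rows (q e_i | 0)
    and (-A1^{-1} A2 e_j mod q | e_j), matching the basis displayed above."""
    t, n = len(A), len(A[0])
    A1inv = modinv_matrix([row[:t] for row in A], q)
    A2 = [row[t:] for row in A]
    B = [[sum(A1inv[i][k] * A2[k][j] for k in range(t)) % q
          for j in range(n - t)] for i in range(t)]        # B = A1^{-1} A2 mod q
    basis = [[q * int(j == i) for j in range(t)] + [0] * (n - t) for i in range(t)]
    basis += [[-B[i][j] % q for i in range(t)] + [int(k == j) for k in range(n - t)]
              for j in range(n - t)]
    return basis
```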

Let \(\lambda _i\) denote the smallest radius of a closed ball containing at least i linearly independent vectors of the lattice \(\mathcal {L}\). If \(\lambda _2 >\gamma \lambda _1\) for some \(\gamma \ge 1\), then the lattice contains a \(\gamma \)-unique \(\textsf{SVP}\) \((\textsf{uSVP})\) solution.

Definition 3

(\(\textsf{uSVP}_\gamma \) problem). Given a lattice \(\mathcal {L}\), with the promise that \(\lambda _2 >\gamma \lambda _1\) for \(\gamma \ge 1\), the \(\textsf{uSVP}_{\gamma }\) problem asks to find \(\boldsymbol{v}\) such that \(\Vert \boldsymbol{v}\Vert =\lambda _1\). The value \(\gamma \) is referred to as the uniqueness gap of the \(\textsf{uSVP}\) problem.

In order to find short vectors in a lattice, we rely on lattice reduction algorithms. LLL [13] runs in polynomial time but only achieves an exponential approximation factor. For cryptanalysis, better approximations are often required, which are obtained using stronger (and therefore slower) lattice reduction algorithms. For our purpose, we use the implementation of the Blockwise Korkine-Zolotarev (BKZ) algorithm [16] provided in the fplll software [17].

According to the analysis of [10], the \(\textsf{uSVP}\) problem, with uniqueness gap \(\gamma \) in dimension n, can be solved using a lattice reduction algorithm that achieves a root Hermite factor close to \(\delta =\gamma ^{1/n}\). In particular, when \(\gamma \) is large enough, the value of \(\delta \) becomes achievable with practical lattice reduction [1, 3]. With high enough values of \(\delta \), the \(\textsf{uSVP}\) problem becomes efficiently solvable.

3 Previous Attacks

In this section, we briefly describe the attacks that were considered in prior works. Based on the approach of the attacks, we can characterise them as primal and dual attacks in the context of the \(\textsf{PV}\) Knapsack problem.

3.1 Direct Primal Attack [11]

The problem can be expressed as a structured variant of the low-density inhomogeneous \(\textsf{SIS}\) (or \(\textsf{LWE}\)) problem by expressing the evaluation \(\boldsymbol{f}(\omega )\) in terms of powers of \(\omega \), which is stated below.

Given a (partial) Vandermonde matrix \(\boldsymbol{V}\in \mathbb {Z}^{t \times n}_q\) (with rows generated by powers of \(\omega \) for \(\omega \in \varOmega _t\)) and \(\boldsymbol{b}\in \mathbb {Z}^{t}_q \) (with elements \(\boldsymbol{f}(\omega )\)), find \(\boldsymbol{f}\) with \(\Vert \boldsymbol{f}\Vert _\infty \le 1\) such that

$$\begin{aligned} \boldsymbol{V}\boldsymbol{f}=\boldsymbol{b}\bmod q. \end{aligned}$$
(1)

The authors proposed the strategy of finding the \(\textsf{uSVP}\) solution (following Kannan’s embedding [15]) on the kernel lattice

$$ \mathcal {L}^{\perp }_{\boldsymbol{V}',q}=\{{\boldsymbol{v} \in \mathbb {Z}^{ n+1}:\boldsymbol{V}' \boldsymbol{v}=\boldsymbol{0} \bmod q\}} $$

where \(\boldsymbol{V}'=(\boldsymbol{V}|\boldsymbol{b})\). Note that \((\boldsymbol{f}|-1)^T\) is a vector in the lattice \(\mathcal {L}^{\perp }_{\boldsymbol{V}',q}\), and it is a solution to the \(\textsf{uSVP}\) problem (Footnote 1). In practice, this direct attack is used as a baseline to choose the parameters, with the understanding that they should be selected to ensure that finding the \(\textsf{uSVP}\) solution remains intractable on both classical and quantum computers.
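A sketch of this embedding is given below; it reuses evaluate, pv_knapsack_instance and kernel_lattice_basis from the earlier toy snippets, and a real attack would then run BKZ (e.g., through fplll) on the resulting basis.

```python
def vandermonde_rows(Omega_t, n, q):
    """Partial Vandermonde matrix V: one row (1, w, w^2, ..., w^{n-1}) per w in Omega_t."""
    return [[pow(w, j, q) for j in range(n)] for w in Omega_t]

def primal_embedding(n, q, Omega_t, evals):
    """Kernel-lattice basis for V' = (V | b); the target (f | -1) is a short vector in it."""
    V = vandermonde_rows(Omega_t, n, q)
    b = [evals[w] for w in Omega_t]
    V_prime = [row + [bw] for row, bw in zip(V, b)]   # t x (n+1)
    return kernel_lattice_basis(V_prime, q)           # dimension n+1, volume q^t
```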

3.2 Dual Attack [4]

Here, we give a simplified version of the attack proposed in [4]. For this purpose, we need to restate the \(\textsf{PV}\) Knapsack problem as an instance of the Bounded Distance Decoding (\(\textsf{BDD}\)) problem in the following manner. Let \(\boldsymbol{z}\in \mathbb {Z}^n\) be any solution to the system of linear equations \(\boldsymbol{V}\boldsymbol{z}=\boldsymbol{b}\bmod q\). Then the \(\textsf{PV}\) Knapsack problem asks to find \(\boldsymbol{u}\in \mathcal {L}^{\perp }_{\boldsymbol{V},q}\) (i.e., \(\boldsymbol{V}\boldsymbol{u}=\boldsymbol{0} \bmod q\)) such that \(\Vert \boldsymbol{z}-\boldsymbol{u}\Vert _\infty \le 1\), i.e., the vector \(\boldsymbol{z}-\boldsymbol{u}=\boldsymbol{f}\). Algebraically, the element \(\boldsymbol{u}(\boldsymbol{x})\) belongs to the ideal \(I_{\varOmega _t}\) of \(\mathcal {R}_q\) generated by \(\prod _{\omega \in \varOmega _t} (\boldsymbol{x}-\omega )\). So, the \(\textsf{PV}\) Knapsack problem can be considered a \(\textsf{BDD}\) problem in the ideal \(I_{\varOmega _t}\).

Let \(\boldsymbol{u}'(x)\) be an element in the ideal \(I_{\varOmega \setminus \varOmega _t}\) generated by \(\prod _{\omega ' \in \varOmega \setminus \varOmega _t} (\boldsymbol{x}-\omega ')\) with “somewhat” small norm (Footnote 2). Then the product \(\boldsymbol{u}'(\boldsymbol{x})\boldsymbol{z}(\boldsymbol{x})=\boldsymbol{u}'(\boldsymbol{x}) \boldsymbol{f}(\boldsymbol{x})\) in \(\mathcal {R}_q\) is expected to be small (Footnote 3), with coefficients \(< q/4\), for a \(\textsf{PV}\) Knapsack instance – a highly unlikely event for a uniform instance. This gives a key distinguishing attack. The cost of the dual attack depends on finding the small \(\boldsymbol{u}'(x)\), which can be improved using algebraic methods.

Let \(\varOmega _{2t_1}\) be the largest subset of \(\varOmega _t\) that remains invariant under the computation of inverses. In other words, \(\varOmega _{2t_1}\) contains \(t_1\) pairs \((\omega , \omega ^{-1})\) with both \(\omega \) and \(\omega ^{-1}\) in \(\varOmega _t\). This set \(\varOmega _{2t_1}\) is easily constructed by removing any element of \(\varOmega _t\) whose inverse is not in \(\varOmega _t\). Thus, \(2t_1\le t\) and \(t_1 \le \lfloor t/2 \rfloor \). By our choice of \(\boldsymbol{g}\), all the roots in \(\varOmega \) can be paired with their inverse. As a consequence, the complement set \(\varOmega \setminus \varOmega _{2t_1}\) is also made of such pairs.
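Computing \(t_1\) from a public instance is immediate; a sketch using the index-set view (indices i with \(\omega _i=\omega _1^i\), so inversion corresponds to \(i \mapsto -i\)) could be the following. The function name and the toy example are ours.

```python
def inverse_closed_pairs(S_t, modulus):
    """Pairs {i, -i} with both members in the index set S_t.
    modulus = n for prime n, 2n for a power-of-two n; returns (t1, pairs)."""
    S = set(S_t)
    pairs = sorted({tuple(sorted((i, -i % modulus))) for i in S if -i % modulus in S})
    return len(pairs), pairs

# Example: the index set {1, 3, 4, 5} modulo 7 contains the single pair {3, 4}
# (since -3 = 4 mod 7), so t1 = 1.
t1, pairs = inverse_closed_pairs({1, 3, 4, 5}, 7)
```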

This symmetry can be leveraged to find a small element in the ideal \(I_{\varOmega \setminus \varOmega _{2t_1}}\), by looking for a small polynomial \(\boldsymbol{u}'(\boldsymbol{x}) \in I_{\varOmega \setminus \varOmega _{2t_1}}\) with the extra requirement that:

$$ \text{ for } \text{ all }\ \omega \in \varOmega \ :\quad \boldsymbol{u}'(\omega )=\boldsymbol{u}'(\omega ^{-1}). $$

This is easily achieved by creating \(\boldsymbol{u}'(\boldsymbol{x})\) using a basis of halved dimension obtained from the symmetrisation of \(\{{1,\boldsymbol{x},\dots ,\boldsymbol{x}^{\lfloor n/2 \rfloor }}\}\). For such a polynomial, when \(\omega \) is a root, so is \(\omega ^{-1}\). Thus we can guarantee that \(\boldsymbol{u}'\) vanishes on \(\varOmega _{2t_1}\) using only \(t_1\) linear conditions. As a consequence, a small \(\boldsymbol{u}'(\boldsymbol{x})\) can be found using lattice reduction in a (kernel) lattice of reduced dimension.

Furthermore, if \(t_1\) is not too small, the \(\textsf{PV}\) Knapsack problem still reduces to a \(\textsf{BDD}\) problem in the ideal \(I_{\varOmega _{2t_1}}\). The condition that \(t_1\) is not too small comes from considering the volume of the lattice, which shrinks as \(t_1\) decreases and needs to be large enough for the reduction to \(\textsf{BDD}\) to work. When \(t_1\) is sufficiently large, considering the product \(\boldsymbol{u}'(\boldsymbol{x})\boldsymbol{z}(\boldsymbol{x})\) again gives a distinguishing attack. The authors of [4] (experimentally) show that this occurs with non-negligible probability and thus improves the cost of solving the decisional \(\textsf{PV}\) Knapsack problem.

As an extension of this attack, and for some choices of n, one can also aim at exploiting higher-order symmetries to reduce to a lattice problem of even smaller dimension. Unfortunately, in general, this reduces the number of evaluations at the roots after symmetrisation too much, so the reduction to \(\textsf{BDD}\) no longer works. However, if \(\varOmega _t\) can be adversarially chosen, we obtain a degraded version of the \(\textsf{PV}\) Knapsack problem. This is called a worst-case \(\varOmega _t\) in [4]. In this worst case, \(\varOmega _t \) contains a large subset \(\varOmega _{rt_0}\), which remains invariant under a symmetry of order r (instead of 2) (Footnote 4). With such a forced symmetry inside, the \(\textsf{PV}\) Knapsack problem remains a \(\textsf{BDD}\) instance in the ideal \(I_{\varOmega _{rt_0}}\).

Like before, the set \(\varOmega \setminus \varOmega _{rt_0}\) also remains invariant under the transformation, so the problem of finding a short solution \(\boldsymbol{u}'\) in the ideal \(I_{\varOmega \setminus \varOmega _{rt_0}}\) can be reduced to a lattice of (very) small dimension. Since an \(\textsf{SVP}\) solution can be found in such a small dimension, e.g., using the LLL lattice reduction algorithm, the product \(\boldsymbol{u}'(\boldsymbol{x})\boldsymbol{z}(\boldsymbol{x}) \in \mathcal {R}_q\) (hopefully) has all coefficients \(<q/2\) in absolute value (i.e., no wrap-around modulo q happens for the product polynomial), which also gives a key recovery attack.

Because of this worst case, it appears that the uniformly random choice of \(\varOmega _t\) makes more sense in the definition of the \(\textsf{PV}\) Knapsack problem. This approach is used in [5, 14], while [11] does not explicitly mention the choice of \(\varOmega _t\). In the rest of the paper, we concentrate on the key recovery attack for a uniformly random \(\varOmega _t\).

4 Our Contribution

Our main goal is to find an alternative dimension reduction strategy working with the primal attack instead of the dual attack. Indeed, the primal attack corresponds to a \(\textsf{uSVP}\) instance, which is believed to be comparatively easier to solve than an \(\textsf{SVP}\) instance, both in theory [15], and in practice [1, 3, 10].

We achieve this goal by proposing a new dimension reduction primal attack on the \(\textsf{PV}\) Knapsack problem. For this, we exploit the symmetries of the ring \(\mathcal {R}_q\) in a new way. This allows us to solve several \(\textsf{PV}\) Knapsack instances from the literature in a reasonable time, faster than what was previously thought to be possible.

As in [4], we consider the largest subset \(\varOmega _{2t_1}\) of \(\varOmega _t\) that remains invariant under the computation of inverses. For any \(\omega \) in \(\varOmega _{2t_1}\), we know the evaluation of \(\boldsymbol{f}\) both at \(\omega \) and \(\omega ^{-1}\). Hence we can compute \(\boldsymbol{f}(\omega )\pm \boldsymbol{f}(\omega ^{-1})\). This gives \(t_1\) distinct evaluations of the two polynomials \(\boldsymbol{f}(\boldsymbol{x})\pm \boldsymbol{f}(1/\boldsymbol{x})\) at \(\omega \in \varOmega _{2t_1}\). We aim to recover \(\boldsymbol{f}(\boldsymbol{x}) \pm \boldsymbol{f}(1/\boldsymbol{x})\) as \(\textsf{uSVP}\) solutions from lattices of smaller dimensions and do the linear algebra to recover the secret \(\boldsymbol{f}(\boldsymbol{x})\).

Let \(\boldsymbol{\psi }_{+}(\boldsymbol{x})=\boldsymbol{f}(\boldsymbol{x})+ \boldsymbol{f}(1/\boldsymbol{x})\) and \(\boldsymbol{\psi }_{-}(\boldsymbol{x})=\boldsymbol{f}(\boldsymbol{x})- \boldsymbol{f}(1/\boldsymbol{x})\). The polynomials \(\boldsymbol{\psi }_{+}(\boldsymbol{x})\), and \(\boldsymbol{\psi }_{-}(\boldsymbol{x})\) can be generated by a basis of order \(n_{+}=\lceil n/2 \rceil \) and \(n_{-}=\lfloor n/2 \rfloor \), respectively. These bases are easy to compute from the polynomial basis. Also, if \(\boldsymbol{f}(\boldsymbol{x})\) has coefficients in the set \(\{{-1,0,1\}}\), \(\boldsymbol{\psi }_{\pm }(\boldsymbol{x})\) has coefficients in the set \(\{{-2,-1,0,1,2\}}\). Then the \(\textsf{PV}\) Knapsack problem reduces to two independent problems of finding \(\boldsymbol{\psi }_{\pm }(\boldsymbol{x})\) from \(t_1\) evaluations. This can be achieved by recovering \(\textsf{uSVP}\) solutions in lattices of dimensions \(n_{\pm }\).
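The derivation of the \(t_1\) evaluations of \(\boldsymbol{\psi }_{\pm }\) from a \(\textsf{PV}\) Knapsack instance is mechanical; a hedged sketch follows (the function name is ours, and the evals dictionary is the one produced by the earlier toy instance generator).

```python
def psi_pm_evaluations(evals, q):
    """From the known evaluations f(w) (dict w -> f(w) mod q), compute the
    evaluations of psi_+ = f(x) + f(1/x) and psi_- = f(x) - f(1/x) at one
    representative w of each inverse-closed pair (w, w^{-1})."""
    S = set(evals)
    reps, b_plus, b_minus = [], [], []
    seen = set()
    for w in evals:
        w_inv = pow(w, -1, q)
        if w_inv in S and w not in seen and w_inv not in seen and w != w_inv:
            seen.update({w, w_inv})
            reps.append(w)
            b_plus.append((evals[w] + evals[w_inv]) % q)
            b_minus.append((evals[w] - evals[w_inv]) % q)
    return reps, b_plus, b_minus
```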

There are a few important observations from the above attack.

  1. The cost of recovering \(\boldsymbol{\psi }_{\pm }\) as a \(\textsf{uSVP}\) solution (using a lattice reduction algorithm) depends on the volume of the lattice in reduced dimension. This volume, equal to \(q^{t_1}\), increases with the number of distinct evaluations \(t_1\), which makes the problem easier as \(t_1\) increases. When \(\varOmega _t\) is randomly chosen, the value of \(t_1\) is randomised. If the system is used by many users, each one with its own set \(\varOmega _t\), some of them will pick weak keys, i.e., weak sets \(\varOmega _t\), which are easier to attack because of their larger value of \(t_1\). To analyse our attack, two ingredients are needed: an attack that works when \(t_1\) is large enough and a probability analysis of this weak-key event.

  2. Note that, when a \(\textsf{PV}\) Knapsack instance is given, an adversary can compute \(t_1\) easily. This only requires reading \(\varOmega _t\) to detect pairs of the form \((\omega , \omega ^{-1})\). As a consequence, the adversary can focus on the keys that are easy enough to attack with lattice techniques. This can be done, for example, by using \(\textsf{LWE}\) estimators [2, 7] before starting the attack.

  3. Since the two \(\textsf{uSVP}\) problems for finding \(\boldsymbol{\psi }_{\pm }\) are independent, the corresponding lattice reductions can be performed in parallel. Hence, the running time of the attack is directly obtained by estimating the cost of the larger of the two \(\textsf{uSVP}\) instances.

  4. We also study symmetries of order \(>2\) and their application to a direct attack to solve the \(\textsf{PV}\) Knapsack problem. Unfortunately, for random choices of \(\varOmega _t\), it turns out that the symmetry of order 2 is optimal for the parameters proposed in the literature.

In Sect. 5, we formally describe the attack sketched above. In Sect. 6, we provide experimental results that indicate that several proposed instances of the \(\textsf{PV}\) Knapsack problem can be solved in practice. In Sect. 7, we give a generalized version of the attack using symmetries of higher order. We hope that despite their inefficiency for the random case, their analysis can be of independent interest.

5 Proposed Attack

In this section, we propose a new key recovery attack on the \(\textsf{PV}\) Knapsack problem. The key idea is to use symmetry in a new way, thanks to the following lemma.

Lemma 1

Let \(\boldsymbol{f}(\boldsymbol{x})\) be any polynomial in \(\mathcal {R}_q\), then \(\boldsymbol{\psi }_{\pm }(\boldsymbol{x})=\boldsymbol{f}(\boldsymbol{x})\pm \boldsymbol{f}(1/\boldsymbol{x})\) can be generated by a basis of order \(n_\pm \), where \(n_{+}=\lceil n/2 \rceil \) and \(n_{-}=\lfloor n/2 \rfloor \). Moreover, if the coefficients of \(\boldsymbol{f}(\boldsymbol{x})\) are sampled uniformly at random from the set \(\{{-1,0,1\}}\), then the expected squared-norm of \(\boldsymbol{\psi }_{\pm }(\boldsymbol{x})\) is upper-bounded by \(4n_\pm /3\) in the new basis representation.

Proof

The mapping

$$ \boldsymbol{x}^i \rightarrow \boldsymbol{x}^i+1/\boldsymbol{x}^i \text { for } 0 \le i \le \lfloor n/2 \rfloor $$

is well-defined. Hence, by linearity, the polynomial \(\boldsymbol{\psi }_{+}(\boldsymbol{x})=\boldsymbol{f}(\boldsymbol{x})+\boldsymbol{f}(1/\boldsymbol{x})\) can be generated by a basis of order \(n_{+}\), as required. In particular, for prime n, since \(1/\boldsymbol{x}=\boldsymbol{x}^{n-1}\), \(\boldsymbol{\psi }_{+}(\boldsymbol{x})\) is generated by the basis

$$\{{2,(\boldsymbol{x}+\boldsymbol{x}^{n-1}),\dots ,(\boldsymbol{x}^{\lfloor n/2 \rfloor }+\boldsymbol{x}^{{\lfloor n/2 \rfloor }+1})\}}$$

For power of two n, since \(1/\boldsymbol{x}=-\boldsymbol{x}^{n-1}\), \(\boldsymbol{\psi }_{+}(\boldsymbol{x})\) is generated by the basis

$$\{{2,(\boldsymbol{x}-\boldsymbol{x}^{n-1}),\dots ,(\boldsymbol{x}^{n/2-1}-\boldsymbol{x}^{n/2+1})\}}$$

Similarly, the mapping

$$ \boldsymbol{x}^i \rightarrow \boldsymbol{x}^i-1/\boldsymbol{x}^i \text { for } 1 \le i \le \lfloor n/2 \rfloor $$

is well-defined. Hence, by linearity, the polynomial \(\boldsymbol{\psi }_{-}(\boldsymbol{x})=\boldsymbol{f}(\boldsymbol{x})-\boldsymbol{f}(1/\boldsymbol{x})\) can be generated by a basis of order \(n_{-}\), as required. In particular, for prime n, \(\boldsymbol{\psi }_{-}(\boldsymbol{x})\) is generated by the basis

$$\{{(\boldsymbol{x}-\boldsymbol{x}^{n-1}),\dots ,(\boldsymbol{x}^{\lfloor n/2 \rfloor }-\boldsymbol{x}^{{\lfloor n/2 \rfloor }+1})\}}$$

For power of two n, \(\boldsymbol{\psi }_{-}(\boldsymbol{x})\) is generated by the basis

$$\{{(\boldsymbol{x}+\boldsymbol{x}^{n-1}),\dots ,(\boldsymbol{x}^{ n/2 -1}+\boldsymbol{x}^{ n/2 +1}),2 \boldsymbol{x}^{n/2}\}}$$

If individual coefficients of \(\boldsymbol{f}\) are uniformly sampled from \(\{{-1,0,1\}}\), then sums of symmetric coefficients \(f_i+f_{n-i}\) are in \(\{{-2,-1,0,1,2\}}\) and follow the probability distribution given in Table 1.

Table 1. Probability distribution of \(f_i+f_{n-i}\)

Now, if \(\boldsymbol{f}(\boldsymbol{x})\) is sampled uniformly with ternary coefficients, most coefficients of \(\boldsymbol{\psi }_\pm (\boldsymbol{x})\) follow the distribution of \(f_i+f_{n-i}\). The exceptions are the special coefficients associated with the basis elements 2 and \(2 \boldsymbol{x}^{n/2}\), which follow the initial uniform distribution in \(\{{-1,0,1\}}\) and have a lower expected square. Hence, by linearity of expectations, the expected squared-norm of \(\boldsymbol{\psi }_\pm \) in the new basis representation is upper-bounded by \({4n_{\pm }/3}\).

This allows us to design a new low-density inhomogeneous \(\textsf{SIS}\) problem corresponding to the evaluation of \(\boldsymbol{\psi }_\pm \) at \(t_1\) values. In order to do this, let us create a matrix \(\boldsymbol{W}_{\pm }\) with \(t_1\) rows and \(n_{\pm }\) columns, whose entries are the evaluations of each of the \(n_{\pm }\) monomials at an arbitrary choice of \(t_1\) representatives for the pairs \((\omega ,\omega ^{-1})\) that occur in \(\varOmega _{2t_1}.\) We also create a vector \(\boldsymbol{b}_{\pm }\) whose coefficients are the known evaluations of \(\boldsymbol{\psi }_\pm \) at each of the representatives. With these notations, we look for a solution of:

$$\begin{aligned} \boldsymbol{W}_{\pm } \boldsymbol{\psi }_\pm =\boldsymbol{b}_{\pm } \bmod q. \end{aligned}$$
(2)

Following the same strategy as in the direct primal attack, we search for a short vector in the kernel lattice:

$$ \mathcal {L}^{\perp }_{\boldsymbol{W}_{\pm }',q}=\{{\boldsymbol{v} \in \mathbb {Z}^{n_{\pm }+1}:\boldsymbol{W}_{\pm }' \boldsymbol{v}=\boldsymbol{0} \bmod q\}} $$

where \(\boldsymbol{W}_{\pm }'=(\boldsymbol{W}_{\pm }|\boldsymbol{b}_{\pm })\). As before, \((\boldsymbol{\psi }_\pm |-1)^T\) is a very short vector in the lattice and we expect that it yields a \(\textsf{uSVP}\) solution.
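A sketch of the construction of \(\boldsymbol{W}_{\pm }'\) and its kernel lattice follows, reusing kernel_lattice_basis and psi_pm_evaluations from the earlier snippets (the leading block is assumed invertible, as before). It relies on the fact that the i-th symmetrised basis element of Lemma 1 evaluates to \(\omega ^i+\omega ^{-i}\) for \(\boldsymbol{\psi }_{+}\) and to \(\omega ^i-\omega ^{-i}\) for \(\boldsymbol{\psi }_{-}\), in both the prime and power-of-two cases.

```python
def symmetrised_embedding(reps, b_vals, n, q, sign):
    """Kernel-lattice basis for W'_{+/-} = (W_{+/-} | b_{+/-}), where the columns
    of W_{+/-} are the evaluations w^i + w^{-i} (sign=+1) or w^i - w^{-i} (sign=-1)
    of the symmetrised basis of Lemma 1 at the pair representatives."""
    n_dim = (n + 1) // 2 if sign > 0 else n // 2          # n_+ or n_-
    W = []
    for w in reps:
        w_inv = pow(w, -1, q)
        if sign > 0:   # basis {2, x + 1/x, ..., x^{floor(n/2)} + 1/x^{floor(n/2)}}
            row = [2] + [(pow(w, i, q) + pow(w_inv, i, q)) % q for i in range(1, n_dim)]
        else:          # basis {x - 1/x, ..., x^{floor(n/2)} - 1/x^{floor(n/2)}}
            row = [(pow(w, i, q) - pow(w_inv, i, q)) % q for i in range(1, n_dim + 1)]
        W.append(row)
    W_prime = [row + [b % q] for row, b in zip(W, b_vals)]
    return kernel_lattice_basis(W_prime, q)               # dimension n_dim + 1, volume q^{t1}
```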

5.1 Analysis of the New Attack

As we already mentioned, to analyse the attack, we need two ingredients. First, given \(t_1\), the number of pairs in \(\varOmega _t\), we need to estimate the cost of successfully conducting the \(\textsf{uSVP}\) computation. Second, for a random set \(\varOmega _t\), we need to compute the probability of occurrence of a given value of \(t_1\). Since from a public key, \(t_1\) can be computed extremely efficiently, this probability directly corresponds to the fraction of users that can be attacked with the corresponding \(\textsf{uSVP}\) problem.

Cost of the \(\boldsymbol{\textsf{uSVP}}\) resolution. Thanks to the analysis of [10], we know that the cost of solving a \(\textsf{uSVP}\) problem mostly depends on the root Hermite factor that can be computed from the uniqueness gap \(\gamma \). We recall that this factor is \(\delta =\gamma ^{1/n}\).

In our attack, we do not really have a promise problem. However, since the lattices we consider come from a cryptographic problem, we can follow a standard heuristic approach and assume that they behave as randomly as possible. More precisely, both for the direct primal attack of [11] and for our new attack with dimension reduction, we consider a lattice in which a vector of short length is guaranteed. The heuristic we use is to consider that other (linearly independent) vectors in the lattice have a length which can be estimated from Minkowski’s bound. In other words, given its volume V and dimension d, we estimate the value of \(\lambda _2\) to be \(\sqrt{d}\,V^{1/d}\). To estimate \(\lambda _1\), we use the square-root of the expected squared-norm. Putting the two estimations together, it just remains to compute \(\gamma =\lambda _2/\lambda _1\) and take its d-th root to obtain the corresponding \(\delta \).

Recall that in the (full) primal attack, the \(\textsf{PV}\) Knapsack problem gives a \(\textsf{uSVP}\) instance of dimension \(n+1\) and volume \(q^{t}\). We also have a short vector of expected squared-norm \(2n/3+1\). As a consequence, the corresponding root Hermite factor can be estimated by:

$$ \delta _{\text{ full }}= \left( \frac{\sqrt{n+1}\,q^{t/(n+1)}}{\sqrt{2n/3+1}} \right) ^{1/(n+1)}. $$

Similarly, in our attack, we get two lattices of dimensions \(n_{\pm }+1\) and volume \(q^{t_1}\). In that case, the expected squared-norm of the shortest vector is \(4n_{\pm }/3+1.\) Thus, we get an estimation of:

$$ \delta _{\pm \text{ new }}=\left( \frac{\sqrt{n_{\pm }+1}\,q^{t_1/(n_{\pm }+1)}}{\sqrt{4n_{\pm }/3+1}} \right) ^{1/(n_{\pm }+1)}. $$

Following [10], we need to compare \(\delta _{\text{ full }}\) and \(\delta _{\pm \text{ new }}\) to know when the new attack beats the full primal attack. To do the comparison, we slightly simplify the expression, replacing \(n_{\pm }\) by n/2 and any instance of \(n+1\) by n or \(n_{\pm }+1\) by \(n_{\pm }\). After the simplification, we expect the new attack to become faster as soon as:

$$ \left( \sqrt{3/2} q^{t/n}\right) ^{1/n}< \left( \sqrt{3/4}q^{2t_1/n}\right) ^{2/n}. $$

Ignoring the small constants, this happens when \(4 t_1>t\).

This estimation is somewhat pessimistic. Indeed, the dimension of the lattice also counts when using lattice reduction, so even when the root Hermite factors are equal, the newer lattice should be easier to reduce due to its smaller dimension.
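These estimates are easy to tabulate. The small script below (with illustrative values of n, q and t, not tied to a specific proposal) compares \(\delta _{\text{ full }}\) and \(\delta _{\pm \text{ new }}\) for candidate values of \(t_1\); recall that a larger estimated \(\delta \) means that a weaker (cheaper) lattice reduction suffices.

```python
def delta_full(n, q, t):
    """Estimated root Hermite factor required by the direct primal attack."""
    return ((n + 1) ** 0.5 * q ** (t / (n + 1)) / (2 * n / 3 + 1) ** 0.5) ** (1 / (n + 1))

def delta_new(n_pm, q, t1):
    """Estimated root Hermite factor required by the reduced-dimension attack."""
    return ((n_pm + 1) ** 0.5 * q ** (t1 / (n_pm + 1)) / (4 * n_pm / 3 + 1) ** 0.5) ** (1 / (n_pm + 1))

# Illustrative values only: n = 512, q = 12289, t = 256.
n, q, t = 512, 12289, 256
print("direct:", round(delta_full(n, q, t), 5))
for t1 in (60, 64, 70, 80, 90):
    print("t1 =", t1, "new:", round(delta_new(n // 2, q, t1), 5))
```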

Distribution of \(\boldsymbol{t}_{\boldsymbol{1}}\). To study the probability distribution of \(t_1\), we perform a standard combinatorial analysis. The total number of sets \(\varOmega _t\) of t elements chosen from the primitive roots is:

$$\begin{aligned} {\left( {\begin{array}{c} 2\, \lfloor n/2 \rfloor \\ t\end{array}}\right) }. \end{aligned}$$

When \(t_1\) is a fixed integer in \([0,\lfloor t/2 \rfloor ]\), to choose a set of size t with exactly \(t_1\) pairs, we need to take \(t_1\) pairs from the \(\lfloor n/2 \rfloor \) available pairs, followed by \(t-2t_1\) unpaired elements from the remaining pairs; each unpaired element can be picked in one of two ways within its pair, which accounts for the factor \(2^{t-2t_1}\). Thus, the total number of possibilities is:

$$\left( {\begin{array}{c}\lfloor n/2 \rfloor \\ t_1\end{array}}\right) \left( {\begin{array}{c}\lfloor n/2 \rfloor -t_1\\ t-2t_1\end{array}}\right) 2^{t-2t_1}.$$

As a consequence, the probability of getting \(t_1\) for a random \(\varOmega _t\) is:

$$ \pi _1(t_1)=\frac{\left( {\begin{array}{c}\lfloor n/2 \rfloor \\ t_1\end{array}}\right) \left( {\begin{array}{c}\lfloor n/2 \rfloor -t_1\\ t-2t_1\end{array}}\right) 2^{t-2t_1} }{\left( {\begin{array}{c} 2\, \lfloor n/2 \rfloor \\ t\end{array}}\right) }. $$

When \(t=n/2\), the distribution of the values of \(t_1\) is strongly concentrated around t/4, which is precisely the tipping point between the direct primal attack and our new attack. This is illustrated by Fig. 1. However, we see that for this typical case, \(t_1\) can deviate from t/4. This explains the existence of weak instances vulnerable to our attack.

Fig. 1. \(\pi _1(t_1)\) for \(n=512,t=256\)
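A direct way to reproduce such a plot, or to compute the weak-key probability mass above a threshold, is the following sketch (illustrative parameters n = 512, t = 256; the threshold 80 is only an example).

```python
from math import comb

def pi_1(n, t, t1):
    """Probability that a uniform t-subset of the 2*floor(n/2) primitive roots
    contains exactly t1 inverse-closed pairs."""
    half = n // 2
    return comb(half, t1) * comb(half - t1, t - 2 * t1) * 2 ** (t - 2 * t1) / comb(2 * half, t)

n, t = 512, 256
assert abs(sum(pi_1(n, t, t1) for t1 in range(t // 2 + 1)) - 1) < 1e-9   # sanity check
weak_mass = sum(pi_1(n, t, t1) for t1 in range(80, t // 2 + 1))          # keys with t1 >= 80
```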

6 Experimental Results

In this section, we analyse the effect of our attack on the concrete hardness of the problem instances used in the literature. We ran all our experiments on an Intel Xeon CPU E5-2683 v4 @ 2.10GHz 1200 MHz processor. The attack only depends on the value of \(t_1\), and not on the specific choice of elements in \(\varOmega _{2t_1}\). So, to perform experiments on our attack, we first fix a value of \(t_1\). Then we sample a uniform primitive root \(\omega _1\) of \(\boldsymbol{g}\) and characterise \(\varOmega _{2t_1}\) by a random index set \(S_{t_1} \subset U(n)\) of size \(t_1\) (with indices distinct up to negation). The lattice reductions are performed in parallel using the fplll software [17].

The running time of a BKZ lattice reduction algorithm is exponential in the blocksize. In [6], the authors experimentally observed that most of the progress is made in the initial rounds of the BKZ reduction for (relatively) large blocksizes. In our experiments, the running time is the time taken by the lattice reduction algorithm to discover the secret.

Since it is not feasible to run lattice reductions for every parameter, following the common practice, we use \(\textsf{LWE}\) estimators [2, 7] to predict the running time of several instances. The \(\textsf{LWE}\) estimators heuristically predict the lattice reduction strength (which is characterised by the block size of the BKZ algorithm) required to find the secret in the primal attack.

Table 2. Parameters: \(\mathsf {PASS_{RS}}\) [11]

6.1 \(\mathsf {PASS_{RS}}\) Signature from [11]

In [11], the authors proposed the \(\mathsf {PASS_{RS}}\) signature scheme, which follows the Fiat-Shamir with aborts strategy and relies on the hardness of the \(\textsf{PV}\) Knapsack problem. The scheme is defined for the prime n case of the problem. The proposed parameters are given in Table 2.

The value \(\lambda \) in the table is the bit security claimed in the proposal. The value \(\lambda ^*\) is the bit security re-evaluated against the direct primal attack using the \(\textsf{LWE}\) estimator [2] (Footnote 5), except for HPSSW1.

Table 3. Experimental results of our attack on the weak keys of HPSSW2.
Table 4. Predicted cost of our attack on the weak keys of HPSSW3 using the \(\textsf{LWE}\) estimator [2].

For HPSSW1, the bit security is evaluated experimentally: we recovered the secret within 25 h (about \(2^{47}\) bit operations) using BKZ block size 55. For this reason, we have excluded it from our attack analysis.

Table 5. Predicted cost of our attack on the weak keys of HPSSW4 using the \(\textsf{LWE}\) estimator [2].

We ran our attack on weak keys of HPSSW2; the experimental results are given in Table 3. For the other parameters, we use the \(\textsf{LWE}\) estimator from [2]; the details are given in Table 4 and Table 5.

Table 6. Parameters: Signature scheme [14]

6.2 Signature Scheme from [14]

In [14], the authors proposed a signature scheme based on the hardness of the \(\textsf{PV}\) Knapsack problem, following the \(\mathsf {PASS_{RS}}\) design but for the power-of-two n case. The proposed parameters are given in Table 6. The \(\lambda ^*\) is computed using the \(\textsf{LWE}\) estimator [2].

Remark 3

Because of the huge difference between \(\lambda \) and \(\lambda ^*\), it is important to look for the source of the discrepancy. The best explanation we found is that the analysis in [14] apparently considers the dimension of the lattice in the direct primal attack as \(n+t+1\) (Sect. 4 [14]), instead of \(n+1\).

We ran our attack on weak keys of LZA1; the experimental results are given in Table 7. For LZA2, we use the \(\textsf{LWE}\) estimator from [2]; the details are given in Table 8.

Table 7. Experimental results of our attack on the weak keys of LZA1.
Table 8. Predicted cost of our attack on the weak keys of LZA2 using the \(\textsf{LWE}\) estimator [2].

6.3 \(\textsf{PASS Encrypt}\), \(\textsf{PV Regev Encrypt}\) Schemes from [5]

In [5], the authors proposed the \(\textsf{PASS Encrypt}\) and \(\textsf{PV Regev Encrypt}\) encryption schemes based on the hardness of the \(\textsf{PV}\) Knapsack problem. The schemes are defined for the power-of-two n case of the problem. While \(\textsf{PASS Encrypt}\) is a modified version of the encryption scheme proposed in [12], \(\textsf{PV Regev Encrypt}\) is a (partial) Vandermonde variant of the Regev-style encryption scheme. The proposed parameters are given in Table 9.

Table 9. Parameters: \(\textsf{PASS Encrypt}\), \(\textsf{PV Regev Encrypt}\) [5]

The concrete hardness of the parameters is computed using the \(\textsf{LWE}\) Leaky estimator [7]. The BKZ algorithm with block size \(\beta \) uses an \(\textsf{SVP}\) oracle in dimension \(\beta \); the running time is evaluated using the core \(\textsf{SVP}\) hardness, which is only the cost of one call to an \(\textsf{SVP}\) oracle in dimension \(\beta \). They further assumed that one \(\textsf{SVP}\) call costs \(2^{0.265 \beta }\) using a quantum algorithm. We used the same estimation model for analysing the hardness of the weak keys. The details are given in Table 10 and Table 11.

Table 10. Predicted cost of our attack on the weak keys of BSS1 using the \(\textsf{LWE}\) Leaky estimator [7].
Table 11. Predicted cost of our attack on the weak keys of BSS2 using the \(\textsf{LWE}\) Leaky estimator [7].

7 Symmetries of Higher Order

In this section, we illustrate a generalized version of our attack using symmetries of higher order, focusing on order 3; it is straightforward to adapt it to other orders. The case of order 3 arises naturally from the concrete parameters of [11]. Indeed, they use a prime n satisfying \(n-1=0\bmod 3\) to enable the fast Fourier transform.

Lemma 2

Let n be a prime satisfying \(n-1=0 \bmod 3\), and let \(\theta \) be an element of order 3 in U(n), i.e., \(\theta ^3=1\) in U(n). For any polynomial \(\boldsymbol{f}(\boldsymbol{x})\in \mathcal {R}_q=\mathbb {Z}_q[x]/(\boldsymbol{x}^n-1)\), \(\boldsymbol{\psi }_1(\boldsymbol{x})=\boldsymbol{f}(\boldsymbol{x})+\boldsymbol{f}(\boldsymbol{x}^{\theta })+\boldsymbol{f}(\boldsymbol{x}^{\theta ^2})\) can be generated by a basis of order \(n_{\theta }=\lceil n/3 \rceil \). Moreover, if the coefficients of \(\boldsymbol{f}(\boldsymbol{x})\) are sampled uniformly at random from the set \(\{{-1,0,1\}}\), then the expected squared-norm of \(\boldsymbol{\psi }_1(\boldsymbol{x})\) is upper-bounded by \({2n_{\theta }}\) in the new basis representation.

Proof

Let a be any primitive element of the group U(n) (i.e., a generator of U(n)). Note that there are \(\phi (\phi (n))\) such elements, where \(\phi (\cdot )\) is Euler's phi function; we can pick any of them. Let \(k=(n-1)/3\) and \(\theta =a^k \in U(n)\). Then \(\theta \) is an element of order 3.

Note that the mapping

$$ \boldsymbol{x}^{a^i} \rightarrow \boldsymbol{x}^{a^i}+\boldsymbol{x}^{a^i \theta }+\boldsymbol{x}^{a^i \theta ^2}\quad \text{ for }\ 0 \le i \le k-1 $$

is well-defined, since U(n) is a disjoint union of:

$$ \{a^i\ |\ 0 \le i \le k-1\},\quad \{a^i\theta \ |\ 0 \le i \le k-1\},\quad \text{ and }\ \{a^i\theta ^2\ |\ 0 \le i \le k-1\}. $$

Hence, by linearity, the polynomial \(\boldsymbol{\psi }_1(\boldsymbol{x})=\boldsymbol{f}(\boldsymbol{x})+\boldsymbol{f}(\boldsymbol{x}^{\theta })+\boldsymbol{f}(\boldsymbol{x}^{\theta ^2})\) is generated by the basis \(\{{3, \boldsymbol{x}^{a^i}+\boldsymbol{x}^{a^i \theta }+\boldsymbol{x}^{a^i \theta ^2}\}} \text { for } 0 \le i \le k-1\) of order \(n_{\theta }=1+k\), as claimed.

If individual coefficients of \(\boldsymbol{f}\) are uniformly sampled from \(\{{-1,0,1\}}\), then sums of symmetric coefficients \(f_{a^i}+f_{a^i \theta }+f_{a^i \theta ^2}\) are in \(\{{-3,-2,-1,0,1,2,3\}}\) and follow the probability distribution given in Table 12. So the coefficients of \(\boldsymbol{\psi }_1(\boldsymbol{x})\) follow the distribution of \(f_{a^i}+f_{a^i \theta }+f_{a^i \theta ^2}\), except for the special coefficient associated with the basis element 3, which has a lower expected square. Hence, by linearity of expectations, the expected squared-norm of \(\boldsymbol{\psi }_1\) in the new basis representation is upper-bounded by \({2n_{\theta }}\).

Table 12. Probability distribution of \(f_{a^i}+f_{a^i \theta }+f_{a^i \theta ^2}\)

However, this only gives us one polynomial \(\boldsymbol{\psi }_{1}\) in reduced dimension, which is essentially the equivalent of \(\boldsymbol{\psi }_{+}\) in the order 2 attack. We cannot directly construct an equivalent of \(\boldsymbol{\psi }_{-}\), so we use a different approach to get two other polynomials in reduced dimension.

Let us define \(\boldsymbol{f}_2(\boldsymbol{x})=\boldsymbol{x}\boldsymbol{f}(\boldsymbol{x})\), \(\boldsymbol{f}_3(\boldsymbol{x})=\boldsymbol{x}^2\boldsymbol{f}(\boldsymbol{x})\), and \(\boldsymbol{\psi }_2(\boldsymbol{x})=\boldsymbol{f}_2(\boldsymbol{x})+\boldsymbol{f}_2(\boldsymbol{x}^{\theta })+\boldsymbol{f}_2(\boldsymbol{x}^{\theta ^2})\), \(\boldsymbol{\psi }_3(\boldsymbol{x})=\boldsymbol{f}_3(\boldsymbol{x})+\boldsymbol{f}_3(\boldsymbol{x}^{\theta })+\boldsymbol{f}_3(\boldsymbol{x}^{\theta ^2})\). Then, if the coefficients of \(\boldsymbol{f}(\boldsymbol{x})\) are sampled uniformly at random from the ternary set, each \(\boldsymbol{\psi }_i\) has an expected squared-norm bounded by \({2n_{\theta }}\) in the new basis representation. Indeed, the choice of \(\boldsymbol{g}=\boldsymbol{x}^n-1\) makes the coefficient vectors of \(\boldsymbol{f}\), \(\boldsymbol{f}_2\), and \(\boldsymbol{f}_3\) mere cyclic shifts of one another in the polynomial basis representation. As a result, by linearity and from the distribution of the sums of symmetric coefficients, each \(\boldsymbol{\psi }_i\) has the same expected squared-norm in the new basis representation.

For the \(\textsf{PV}\) Knapsack problem, let \(\varOmega _{3t_2}\) be the largest subset of \(\varOmega _t\) that remains invariant under the map \(\omega \mapsto \omega ^{\theta }\). In other words, \(\varOmega _{3t_2}\) contains \(t_2\) triplets \((\omega , \omega ^{\theta },\omega ^{\theta ^2})\) with all of \(\omega \), \(\omega ^{\theta }\), and \(\omega ^{\theta ^2}\) in \(\varOmega _t\), where \(t_2 \le \lfloor t/3 \rfloor \). For any \(\omega \) in \(\varOmega _{3t_2}\), we know the evaluations of \(\boldsymbol{f}\) at \(\omega \), \(\omega ^{\theta }\), and \(\omega ^{\theta ^2}\). Hence we can compute \(t_2\) distinct evaluations of each of the polynomials \(\boldsymbol{\psi }_i(\boldsymbol{x})\) at \(\omega \in \varOmega _{3t_2}\). This follows by writing \(\boldsymbol{\psi }_2(\boldsymbol{x})=\boldsymbol{x}\boldsymbol{f}(\boldsymbol{x})+\boldsymbol{x}^\theta \boldsymbol{f}(\boldsymbol{x}^{\theta })+\boldsymbol{x}^{\theta ^2}\boldsymbol{f}(\boldsymbol{x}^{\theta ^2})\) and \(\boldsymbol{\psi }_3(\boldsymbol{x})=\boldsymbol{x}^2 \boldsymbol{f}(\boldsymbol{x})+\boldsymbol{x}^{2\theta } \boldsymbol{f}(\boldsymbol{x}^{\theta })+\boldsymbol{x}^{2\theta ^2}\boldsymbol{f}(\boldsymbol{x}^{\theta ^2})\).

This also allows us to design low-density inhomogeneous \(\textsf{SIS}\) problems to solve the \(\textsf{PV}\) Knapsack problem. We create a matrix \(\boldsymbol{W}_{\theta }\) with \(t_2\) rows and \(n_{\theta }\) columns, whose entries are the evaluations of each of the \(n_{\theta }\) monomials at an arbitrary choice of \(t_2\) representatives for the triplets in \(\varOmega _{3t_2}\). We create a vector \(\boldsymbol{b}_{i}\) whose coefficients are the known evaluations of \(\boldsymbol{\psi }_i\) at each representative. We look for a solution of:

$$\begin{aligned} \boldsymbol{W}_{\theta } \boldsymbol{\psi }_i=\boldsymbol{b}_{i} \bmod q. \end{aligned}$$
(3)

Like before, we search for a short vector in the Kernel lattice

$$ \mathcal {L}^{\perp }_{\boldsymbol{W}_{i}',q}=\{{\boldsymbol{v} \in \mathbb {Z}^{n_{\theta }+1}:\boldsymbol{W}_{i}' \boldsymbol{v}=\boldsymbol{0} \bmod q\}} $$

where \(\boldsymbol{W}_{i}'=(\boldsymbol{W}_{\theta }|\boldsymbol{b}_{i})\), and we expect \((\boldsymbol{\psi }_i|-1)^T\) to yield a \(\textsf{uSVP}\) solution. The knowledge of each \(\boldsymbol{\psi }_i\) gives \(n_{\theta }\) (independent) linear equations in the (unknown) coefficients of \(\boldsymbol{f}(\boldsymbol{x})\). So by doing linear algebra, we recover \(\boldsymbol{f}(\boldsymbol{x})\).
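The order-3 analogue of the earlier symmetrisation sketch (finding \(\theta \), grouping \(\varOmega _t\) into full \(\theta \)-orbits, and deriving the evaluations of \(\boldsymbol{\psi }_1,\boldsymbol{\psi }_2,\boldsymbol{\psi }_3\)) can be written as follows; as before, the function names and the brute-force search for \(\theta \) are purely illustrative.

```python
def order3_element(n):
    """Some element theta != 1 with theta^3 = 1 in U(n) = Z_n^*;
    assumes n is prime with 3 | n-1."""
    return next(a for a in range(2, n) if pow(a, 3, n) == 1)

def triplet_evaluations(evals, n, q):
    """From known evaluations f(w) (dict w -> f(w) mod q), compute evaluations of
    psi_1, psi_2, psi_3 at one representative per theta-orbit fully contained in Omega_t."""
    theta = order3_element(n)
    exps = [1, theta, theta * theta % n]
    S, seen = set(evals), set()
    out = {1: {}, 2: {}, 3: {}}
    for w in evals:
        orbit = [pow(w, e, q) for e in exps]            # (w, w^theta, w^{theta^2})
        if all(x in S for x in orbit) and not seen & set(orbit):
            seen |= set(orbit)
            for j in (1, 2, 3):                         # psi_j uses the weight x^{j-1}
                out[j][w] = sum(pow(x, j - 1, q) * evals[x] for x in orbit) % q
    return theta, out
```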

Distribution of \(\boldsymbol{t}_{\boldsymbol{2}}\). The total number of sets \(\varOmega _t\) of t elements chosen from the primitive roots is:

$$\begin{aligned} {\left( {\begin{array}{c} 3\, \lfloor n/3 \rfloor \\ t\end{array}}\right) }. \end{aligned}$$

When \(t_2\) is a fixed integer in \([0,\lfloor t/3 \rfloor ]\), to choose a set of size t with exactly \(t_2\) triplets, we need to take \(t_2\) triplets from the set of \(\lfloor n/3 \rfloor \) triplets, followed by \(t-3t_2\) elements from the remaining triplets without completing any of them. These remaining elements can be taken either in pairs (two elements from the same triplet) or as single elements, which explains the sum below. Thus, the total number of possibilities is:

$$\left( {\begin{array}{c}\lfloor n/3 \rfloor \\ t_2\end{array}}\right) \sum _{i=0}^{s} \left( {\begin{array}{c}\lfloor n/3 \rfloor -t_2\\ i\end{array}}\right) \left( {\begin{array}{c}\lfloor n/3 \rfloor -t_2-i\\ t-3t_2-2i\end{array}}\right) 3^{t-3{t_2}-i},$$

where \(s=\min \{{\lfloor (t-3t_2)/2 \rfloor , \lfloor n/3 \rfloor -t_2\}}\). So the probability of getting \(t_2\) for a random \(\varOmega _t\) is:

$$ \pi _2(t_2)=\frac{\left( {\begin{array}{c}\lfloor n/3 \rfloor \\ t_2\end{array}}\right) \sum _{i=0}^{s} \left( {\begin{array}{c}\lfloor n/3 \rfloor -t_2\\ i\end{array}}\right) \left( {\begin{array}{c}\lfloor n/3 \rfloor -t_2-i\\ t-3t_2-2i\end{array}}\right) 3^{t-3{t_2}-i}}{\left( {\begin{array}{c} 3\, \lfloor n/3 \rfloor \\ t\end{array}}\right) }. $$
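The probability \(\pi _2(t_2)\) can be evaluated directly with the same kind of script as \(\pi _1\); a sketch follows (function name ours).

```python
from math import comb

def pi_2(n, t, t2):
    """Probability that a uniform t-subset of the 3*floor(n/3) primitive roots
    contains exactly t2 complete theta-orbits (triplets)."""
    third = n // 3
    s = min((t - 3 * t2) // 2, third - t2)
    total = sum(comb(third - t2, i) * comb(third - t2 - i, t - 3 * t2 - 2 * i)
                * 3 ** (t - 3 * t2 - i) for i in range(s + 1))
    return comb(third, t2) * total / comb(3 * third, t)
```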

Comparison with Symmetries of Order \(\boldsymbol{2}\). We first note that for worst-case keys, which are fully symmetric under a symmetry of higher order, the higher-order attack clearly outperforms the order 2 symmetry attack. This is even clearer if one adversarially selects a key with symmetry of order 3 but no symmetry of order 2.

Keeping this in mind, we now aim to compare the higher-order symmetries with the order 2 symmetry for randomly selected keys. Let us start by comparing concrete examples of the attacks with symmetries of order 2 and 3.

For HPSSW2, when the value of \(t_2=42\), we have \(\pi _2(t_2)=2^{-26}\). In this case, we recovered the secret in 111 h (about \(2^{49}\) bit operations) using BKZ block size 68. With \(\pi _1(t_1)=2^{-26}\), we get the value \(t_1=92\). In this case, we recovered the secret in 6.5 h (about \(2^{45}\) bit operations) using BKZ block size 58. Unfortunately, we never recovered the secret for a smaller value of \(t_2\), even after running lattice reductions for 7 days.

Similarly, we can make the comparison for HPSSW3 and HPSSW4 using the \(\textsf{LWE}\) estimator [2]; it is shown in Fig. 2.

Fig. 2. Comparison: predicted bit operations vs \(\pi _i(t_i)^{-1}\) (in \(\log _2\) scale) for the weak keys of HPSSW3 and HPSSW4.

On these three examples, it is clear that the order 2 attack performs better than the order 3 version. To understand why, let us consider a variant of the \(\textsf{PV}\) Knapsack problem, where the number of evaluation points t is close to \(p\,n\), instead of n/2. Here p is an element in (0, 1).

In that case, the direct attack involves a lattice of dimension \(n+1\) and volume \(q^{p\,n}.\) For the order 2 attack, we estimate the average number of pairs to be \(p^2\,n/2\). As a consequence, the attack involves a lattice of dimension \(\lceil n/2\rceil +1\) and volume \(q^{p^2\,n/2}.\) For order 3, we estimate the average number of triplets to be \(p^3\,n/3\) and get an attack involving dimension \(\lceil n/3\rceil +1\) and volume \(q^{p^3\,n/3}.\) Ignoring constants, we can compare the root Hermite factors of the three attacks by looking at the three numbers:

$$\begin{aligned} p, \quad 2\,p^2, \text { and }\ 3\,p^3. \end{aligned}$$

The case \(p=1/2\) that we previously considered is the crossover point between the direct attack and the order 2 symmetry attack. Similarly, the crossover point between the direct attack and the order 3 symmetry attack is \(p=1/\sqrt{3}\approx 0.58\). Finally, the crossover point between the order 2 and order 3 attacks is \(p=2/3\approx 0.67\).
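These crossover points follow from elementary arithmetic; the tiny script below (a sanity check rather than anything deeper) recovers them numerically.

```python
import math

# direct, order-2 and order-3 exponents (each up to the same 1/n factor in log delta)
hermite_exponents = lambda p: (p, 2 * p ** 2, 3 * p ** 3)
for p in (1 / 2, 1 / math.sqrt(3), 2 / 3):
    print(round(p, 3), tuple(round(e, 3) for e in hermite_exponents(p)))
```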

As a consequence, the higher symmetries only become worthwhile for random keys when the number of evaluation points in the \(\textsf{PV}\) Knapsack problem is much larger than what appears in practical parameters. We also see that by reducing the number of evaluation points below n/2, one can circumvent the gain provided by our main attack with symmetries of order 2.

Yet, since checking for symmetries is really fast, it cannot hurt a dedicated adversary to check their existence before launching the lattice reduction part of the attack.