1 Introduction

Implementations of cryptographic algorithms are typically optimized for one or more criteria, such as latency, throughput, power consumption, or memory consumption, but also for criteria such as the cost of adding masking countermeasures to protect against side-channel attacks. It is worthwhile to spend time on this optimization, as the implementations are typically used many times. Finding an implementation that is provably minimal with respect to the chosen criteria is usually a hard problem, e.g., general circuit minimization is \(\Sigma _2^P\)-complete [10]. However, for small functions this is still possible, using, for instance, SAT solvers. Especially for building blocks that can be used in multiple cryptographic algorithms, such as S-boxes, it is useful to look at methods for finding minimal implementations with respect to some given criteria.

In Sect. 2, we first discuss the simpler problem of finding minimal implementations of linear functions. We give a brief overview of methods for finding the shortest linear straight-line program.

We then move towards S-boxes and in Sect. 3 we consider known methods [13, 20] that manage to find minimal implementations for the relevant optimization criteria of multiplicative complexity [9], bitslice gate complexity [12], and gate complexity. The definitions of these criteria are given in Sect. 3. We study how feasible the methods actually are by applying them to S-boxes that are used in recent cryptographic algorithms, such as several candidates in the CAESAR competition and lightweight block ciphers. Additionally, we provide tools that allow anyone to conveniently do the same to other small S-boxes.

Then we look at another optimization criterion: the circuit depth complexity. This is relevant in hardware implementations to decrease the delay and to be able to increase the clock frequency. We suggest a new method for encoding the circuit depth complexity decision problem in SAT and we show how feasible this method is in practice by providing efficient low-depth S-box implementations for Joltik [17], Piccolo [22], LAC [23], Prøst [18], and RECTANGLE [24] in Sect. 3.5.

Finally, in Sect. 4 we discuss how several optimization criteria can be combined, by first optimizing the S-box used by the PRIMATEs [2] for multiplicative complexity and then for gate complexity. This is done by taking the intermediate result after optimizing for multiplicative complexity, identifying its linear parts, and treating these as instances of the shortest linear straight-line program problem.

Contributions of This Paper. To summarize, the contributions of this paper are

  • implementations of the S-boxes in Ascon, ICEPOLE, Joltik/Piccolo, Keccak/Ketje/Keyak, LAC, Minalpher, Prøst, and RECTANGLE with a provably minimal number of nonlinear gates;

  • a new method for encoding the circuit depth complexity decision problem as an instance of SAT;

  • optimized and sometimes even provably minimal implementations of the S-boxes in Joltik/Piccolo, LAC, Prøst, and RECTANGLE with respect to bitslice gate complexity, gate complexity, and circuit depth complexity;

  • a method to combine multiple optimization criteria;

  • an implementation of the S-box used by the PRIMATEs that is first optimized for multiplicative complexity and then for (bitslice) gate complexity;

  • tools and documentation, placed in the public domain and available online, to optimize implementations of small nonlinear functions such as S-boxes using SAT solvers with respect to multiplicative complexity, bitslice gate complexity, gate complexity, or circuit depth complexity.

2 The Shortest Linear Straight-Line Program Problem

Before tackling the optimization of S-boxes, let us restrict ourselves to linear functions and let us consider the Shortest Linear Program (SLP) problem over \(GF(2)\). Let \({\varvec{A}}\) be an \(m \times n\) matrix of constants over \(GF(2)\) and let \({\varvec{x}}\) be a vector of \(n\) variables over \(GF(2)\). The SLP problem is to find the program with the smallest number of lines that computes \({\varvec{A}}{\varvec{x}}\), where every program line is of a certain form.

Let \(Z\) be a set of variables over \(GF(2)\), that initially contains the input variables \(\{x_0,\cdots ,x_{n-1}\}\). Let \(z_i,z_j \in Z\). Then every program line is of the form

$$ z' := z_i + z_j. $$

After executing this program line, the new variable \(z'\) is added to the set, \(Z := Z \cup \{z'\}\). The new variable \(z'\) can therefore be used in the next program line. The program is said to compute \({\varvec{A}}{\varvec{x}}\) when \(\exists (z_1,\cdots ,z_m)\in Z^m \left\{ {\varvec{A}}{\varvec{x}} = (z_1,\cdots ,z_m)^\intercal \right\} \) holds.
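As a small illustration of the definition above, the following Python sketch executes a straight-line program over \(GF(2)\) and checks whether it computes \({\varvec{A}}{\varvec{x}}\). The representation (rows of \({\varvec{A}}\) as bitmasks, program lines as index pairs) and the names `run_slp` and `computes` are assumptions made for this example; they are not taken from the tools accompanying this paper.

```python
def run_slp(n, program):
    """Execute a linear straight-line program over GF(2).

    Each variable is tracked as a bitmask of the inputs it sums:
    bit j set means x_j occurs in the sum. `program` is a list of
    (i, j) index pairs; each pair appends z' := z_i + z_j to Z.
    """
    z = [1 << j for j in range(n)]      # Z starts as {x_0, ..., x_{n-1}}
    for i, j in program:
        z.append(z[i] ^ z[j])           # addition over GF(2) is XOR
    return z


def computes(matrix_rows, n, program):
    """Return True if every row of A occurs among the variables in Z."""
    z = set(run_slp(n, program))
    return all(row in z for row in matrix_rows)


# A has rows x0+x1, x0+x1+x2 and x0+x1+x2+x3 (bit j encodes x_j).
A = [0b0011, 0b0111, 0b1111]
program = [(0, 1), (4, 2), (5, 3)]      # three XORs, reusing earlier sums
print(computes(A, 4, program))
```

In this toy instance each program line reuses the previous sum, so three XORs suffice where computing every row independently would take six.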

Being able to find the shortest straight-line linear program has obvious applications to cryptology. Solving the SLP over \(GF(2)\) is equivalent to finding the shortest circuit to compute a function using only XOR gates. Optimizing implementations of linear operations, such as MixColumns in AES and the linear transformation in certain implementations of SubBytes, can therefore be seen as instances of the SLP problem over \(GF(2)\). However, this method does not apply to nonlinear operations such as S-boxes. We show in Sect. 3 what kind of methods can be used in such cases.

Solving the SLP Problem. Boyar, Matthews, and Peralta showed in [7] that the SLP problem over \(GF(2)\) is NP-hard. Off-the-shelf SAT solvers can be used to find solutions for small instances of this problem. Fuhs and Schneider-Kamp presented a method [16] to encode the SLP problem as an instance of SAT and showed how this can be used to optimize the affine transformation of AES’s SubBytes [15, 16].

For larger instances, exact methods will quickly become infeasible. Alternatively, Boyar and Peralta published an approach to solve the SLP problem over \(GF(2)\) based on a heuristic [8]. In short, the heuristic method uses a base vector set \(S\), initialized with unit vectors for all variables in \({\varvec{x}}\), and a distance vector Dist[] that keeps track of the minimal Hamming distance to \(S\) for each row in \({\varvec{A}}\). Repeatedly, the sum of the pair of base vectors in \(S\) that minimizes the sum of Dist[] is added to \(S\) and Dist[] is updated, until Dist[] is the all-zero vector. If there is a tie between two pairs of base vectors, the pair that maximizes the Euclidean length of the new Dist[] vector is chosen. This algorithm makes it possible to find solutions to larger instances of the SLP problem.
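The greedy loop described above can be sketched as follows. This is a minimal illustration that follows only the description in the preceding paragraph; it is not the implementation of [8], which may contain further rules and optimizations. The distance is computed by brute force over subsets of \(S\), so the sketch is only suitable for tiny instances, and the names `dist` and `greedy_slp` are hypothetical.

```python
from itertools import combinations


def dist(target, base):
    """Fewest additions of base vectors needed to obtain `target`.

    Brute force over subsets of increasing size. Assumes `target` is
    nonzero and that `base` contains all unit vectors, so a solution
    always exists.
    """
    for size in range(1, len(base) + 1):
        for subset in combinations(base, size):
            acc = 0
            for vec in subset:
                acc ^= vec
            if acc == target:
                return size - 1
    raise ValueError("target is not in the span of the base")


def greedy_slp(rows, n):
    """Return the (u, v) pairs whose sums are added to the base, in order."""
    base = [1 << j for j in range(n)]           # unit vectors for x_0..x_{n-1}
    dists = [dist(r, base) for r in rows]
    program = []
    while any(dists):
        best = None
        for u, v in combinations(base, 2):
            cand = u ^ v
            if cand in base:
                continue
            new = [dist(r, base + [cand]) for r in rows]
            # Primary key: smallest sum of Dist[]. Tie-break: largest
            # Euclidean length, compared via the squared norm.
            key = (sum(new), -sum(d * d for d in new))
            if best is None or key < best[0]:
                best = (key, u, v, new)
        _, u, v, dists = best
        base.append(u ^ v)
        program.append((u, v))
    return program


# Rows x0+x1 and x0+x1+x2 over three variables (bit j encodes x_j).
prog = greedy_slp([0b011, 0b111], 3)
```

Each iteration strictly decreases the sum of Dist[], because adding the sum of two base vectors from a shortest representation of some row lowers that row's distance by one, so the loop terminates.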

3 Optimizing S-Box Implementations using SAT-Solvers

For nonlinear functions such as S-boxes, known approaches based on heuristics [8] all exploit additional algebraic structure that may be available, e.g., as for the S-box of AES. However, in general this additional structure may not exist and one may need to fall back to generic methods such as SAT solvers.

S-box implementations in both software and hardware can be optimized with SAT solvers according to several criteria. In this paper we consider the following optimization goals:

  • Multiplicative complexity. The multiplicative complexity of a function [9] is defined as the smallest number of nonlinear gates with fan-in 2 required to compute this function. If we restrict our S-box implementations to the \(\{\texttt {AND},\texttt {OR},\texttt {XOR},\texttt {NOT} \}\) operations, we only need to consider the number of ANDs and ORs. Optimizing for this goal is useful in the case of protecting against side-channel attacks using random masks, where nonlinear gates are typically more expensive to mask. There are also applications in multi-party computation and fully homomorphic encryption, where the cost of nonlinear operations is even more significant [1].

  • Bitslice gate complexity. The bitslice gate complexity of a function [12] is defined as the smallest number of operations in \(\{\texttt {AND},\texttt {OR},\texttt {XOR},\texttt {NOT} \}\) required to compute this function. This translates directly to efficient bitsliced software implementations, as on most common CPU architectures, there are no instructions for computing NAND, NOR, or XNOR immediately.

  • Gate complexity. The gate complexity of a function is defined as the smallest number of logic gates required to compute this function. Unlike for bitslice gate complexity, NAND, NOR, and XNOR gates are now also allowed. This translates to efficient hardware implementations, although the different amounts of area required by these types of gates and the different delays still need to be taken into account. Note that we only consider gates with a fan-in of at most 2.

  • Circuit depth complexity. The depth of a circuit is defined as the length of the longest path from an input gate to an output gate. When gates with unbounded fan-in are allowed, every function can be computed by a circuit with depth 2, e.g., by expressing the function in conjunctive or disjunctive normal form. However, this can lead to very wide circuits with a lot of gates, which is typically not desirable. There is somewhat of a trade-off between circuit depth and number of gates. Still, optimizing for this goal is useful in the case of hardware implementations, to be able to decrease the total delay and therefore to be able to increase the clock frequency. Again, only gates with a fan-in of at most 2 are considered.

These criteria come with corresponding decision problems. For example, given a function \(f\) and some positive integer \(k\), the multiplicative complexity decision problem is defined as:

“Is there a circuit that implements \(f\) and that uses at most \(k\) nonlinear operations?”

The decision problems for the other three optimization goals can be defined analogously. Off-the-shelf SAT solvers can be used to solve these decision problems. When a SAT solver successfully finds a circuit for some value \(k\) but outputs UNSAT for \(k-1\), it is proven that \(k\) is the minimum value. Note that when a SAT solver outputs SAT for some value \(k\), it also provides a satisfying valuation that can be used to reconstruct an implementation of \(f\).
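The search for the minimum can be sketched as a simple loop around the decision problem. The callback `is_satisfiable` below is a hypothetical placeholder for "encode the problem for \(k\), run a SAT solver, and report SAT or UNSAT"; it does not correspond to any particular solver interface. The sketch assumes that the decision problem is monotone, i.e., that a circuit with at most \(k\) operations also answers the question for \(k+1\).

```python
def find_minimum(is_satisfiable, k_start):
    """Return the minimal k, starting from an upper bound k_start.

    Returns None if even k_start is UNSAT. Otherwise k is decreased
    while the solver still answers SAT; the first UNSAT answer for
    k - 1 proves that k is minimal.
    """
    if not is_satisfiable(k_start):
        return None
    k = k_start
    while k > 0 and is_satisfiable(k - 1):
        k -= 1
    return k


# Stand-in oracle for demonstration only: pretend the true minimum is 4.
print(find_minimum(lambda k: k >= 4, 10))
```

Searching downwards from a known upper bound needs only a single UNSAT answer, which fits the observation that UNSAT instances tend to be the expensive ones.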

In order to use SAT solvers to solve these decision problems, the problems first have to be encoded in logical formulas in conjunctive normal form (CNF), because that is the input format that the SAT solver requires.

3.1 Notation

For the encoding, we use the notation of [20]. We consider systems of multivariate equations over \(GF(2)\). In these equations, let:

  • \(x_i\) be variables representing S-box inputs;

  • \(y_i\) be variables representing S-box outputs;

  • \(q_i\) be variables representing gate inputs;

  • \(t_i\) be variables representing gate outputs;

  • \(a_i\) be variables representing wiring between gates;

  • \(b_i\) be variables representing wiring ‘inside’ gates. This will become more clear when they are first used in Sect. 3.3.

In the implementations the logical connectives are used to denote the types of operations, i.e., let \(\wedge \), \(\vee \), \(\oplus \), \(\lnot \) denote AND, OR, XOR, NOT, respectively, and let \(\uparrow \), \(\downarrow \), \(\leftrightarrow \) denote NAND, NOR, XNOR, respectively.

3.2 Optimizing for Multiplicative Complexity

Courtois, Mourouzis and Hulme [13, 20] suggested a method to encode the multiplicative complexity decision problem. Let \(f : \mathbb {F} _2 ^n \rightarrow \mathbb {F} _2 ^m\) be an S-box and let \(k\) be the multiplicative complexity that we want to test for. Then first create a set of equations \(C\) in ANF consisting of:

  • \(\forall i \in \{0,\cdots ,k-1\}\): \(t_i = q_{2i} \cdot q_{2i+1}\), to encode the \(k\) AND gates.

  • \(\forall i \in \{0,\cdots ,2k-1\}\): \(q_i = a_{l} + \left( \sum _{j=0}^{n-1} a_{l + j + 1} \cdot x_j\right) + \left( \sum _{j=0}^{\left\lfloor \frac{i}{2}\right\rfloor - 1} a_{l + n + j + 1} \cdot t_j\right) \), where \(l = i(n+1) + \left\lfloor \frac{i^2-2i+1}{4}\right\rfloor \), to encode that the inputs of the AND gates can be any linear combination of S-box inputs and previous AND gate outputs. The single \(a\) represents an optional NOT gate.

  • \(\forall i \in \{0,\cdots ,m-1\}\): \(y_i = \left( \sum _{j=0}^{n-1} a_{s + j} \cdot x_j\right) + \left( \sum _{j=0}^{k-1} a_{s + n + j} \cdot t_j\right) \), where \(s = 2k(n+1) + k(k-1) + i(n+k)\), to encode that the S-box outputs can be any linear combination of S-box inputs and AND gate outputs.

For example, when \(n = m = 4\) and \(k = 3\), this leads to the following set of equations \(C\):

$$\begin{aligned} q_0&= a_0 + a_1 \cdot x_0 + a_2 \cdot x_1 + a_3 \cdot x_2 + a_4 \cdot x_3 \\ q_1&= a_5 + a_6 \cdot x_0 + a_7 \cdot x_1 + a_8 \cdot x_2 + a_9 \cdot x_3 \\ t_0&= q_0 \cdot q_1 \\ q_2&= a_{10} + a_{11} \cdot x_0 + a_{12} \cdot x_1 + a_{13} \cdot x_2 + a_{14} \cdot x_3 + a_{15} \cdot t_0 \\ q_3&= a_{16} + a_{17} \cdot x_0 + a_{18} \cdot x_1 + a_{19} \cdot x_2 + a_{20} \cdot x_3 + a_{21} \cdot t_0 \\ t_1&= q_2 \cdot q_3 \\ q_4&= a_{22} + a_{23} \cdot x_0 + a_{24} \cdot x_1 + a_{25} \cdot x_2 + a_{26} \cdot x_3 + a_{27} \cdot t_0 + a_{28} \cdot t_1 \\ q_5&= a_{29} + a_{30} \cdot x_0 + a_{31} \cdot x_1 + a_{32} \cdot x_2 + a_{33} \cdot x_3 + a_{34} \cdot t_0 + a_{35} \cdot t_1 \\ t_2&= q_4 \cdot q_5 \\ y_0&= a_{36} \cdot x_0 + a_{37} \cdot x_1 + a_{38} \cdot x_2 + a_{39} \cdot x_3 + a_{40} \cdot t_0 + a_{41} \cdot t_1 + a_{42} \cdot t_2 \\ y_1&= a_{43} \cdot x_0 + a_{44} \cdot x_1 + a_{45} \cdot x_2 + a_{46} \cdot x_3 + a_{47} \cdot t_0 + a_{48} \cdot t_1 + a_{49} \cdot t_2 \\ y_2&= a_{50} \cdot x_0 + a_{51} \cdot x_1 + a_{52} \cdot x_2 + a_{53} \cdot x_3 + a_{54} \cdot t_0 + a_{55} \cdot t_1 + a_{56} \cdot t_2 \\ y_3&= a_{57} \cdot x_0 + a_{58} \cdot x_1 + a_{59} \cdot x_2 + a_{60} \cdot x_3 + a_{61} \cdot t_0 + a_{62} \cdot t_1 + a_{63} \cdot t_2 \end{aligned}$$
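The index arithmetic for \(l\) and \(s\) can be checked mechanically. The following sketch generates the equations of \(C\) as plain strings from the three rules above; the function name `mc_equations` and the textual output format are assumptions made for this illustration and are not the format used by the accompanying tools.

```python
def mc_equations(n, m, k):
    """Return the equations of C as strings, in the order of the example."""
    eqs = []
    for i in range(2 * k):
        # Offset of the first a-variable used by q_i.
        l = i * (n + 1) + (i * i - 2 * i + 1) // 4
        terms = [f"a{l}"]                                   # optional NOT
        terms += [f"a{l + j + 1}*x{j}" for j in range(n)]
        terms += [f"a{l + n + j + 1}*t{j}" for j in range(i // 2)]
        eqs.append(f"q{i} = " + " + ".join(terms))
        if i % 2 == 1:                                      # both inputs known
            g = i // 2
            eqs.append(f"t{g} = q{2 * g}*q{2 * g + 1}")
    for i in range(m):
        # Offset of the first a-variable used by y_i.
        s = 2 * k * (n + 1) + k * (k - 1) + i * (n + k)
        terms = [f"a{s + j}*x{j}" for j in range(n)]
        terms += [f"a{s + n + j}*t{j}" for j in range(k)]
        eqs.append(f"y{i} = " + " + ".join(terms))
    return eqs


for line in mc_equations(4, 4, 3):
    print(line)
```

For \(n = m = 4\) and \(k = 3\) this yields thirteen equations using \(a_0\) through \(a_{63}\), with the same indices as in the example.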

This set of equations does not depend on \(f\) yet, but only on the values of \(n\), \(m\), and \(k\). The equations in \(C\) have to be satisfied for all possible S-box inputs. An equation set \(C'\) is created that contains \(2^n\) copies of the equations in \(C\), in which all \(x_i, y_i, q_i, t_i\) are renumbered, but in which all \(a_i, b_i\) remain the same. \(f\) is ‘bound’ to the problem description by adding its truth table as \(2^n (n+m)\) constant equations, i.e., one for every bit in both the S-box input and the S-box output, to \(C'\).
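To make the role of the shared \(a_i\) concrete, the following sketch replaces the SAT solver by exhaustive search on a toy instance: it tries every assignment of the \(a_i\) and accepts one only if the equations hold for all \(2^n\) inputs. This is an illustration of what \(C'\) expresses under the multiplicative complexity encoding, not the method used in this paper; it is only feasible for very small \(n\), \(m\), and \(k\), and the names `evaluate`, `exists_circuit`, and `toy_and` are hypothetical.

```python
from itertools import product


def evaluate(a, x, n, m, k):
    """Evaluate the circuit selected by the bits `a` on the input bits `x`."""
    t = []
    for g in range(k):
        q = []
        for i in (2 * g, 2 * g + 1):
            l = i * (n + 1) + (i * i - 2 * i + 1) // 4
            val = a[l]                                   # optional NOT
            for j in range(n):
                val ^= a[l + j + 1] & x[j]
            for j in range(g):
                val ^= a[l + n + j + 1] & t[j]
            q.append(val)
        t.append(q[0] & q[1])                            # the AND gate
    y = []
    for i in range(m):
        s = 2 * k * (n + 1) + k * (k - 1) + i * (n + k)
        val = 0
        for j in range(n):
            val ^= a[s + j] & x[j]
        for j in range(k):
            val ^= a[s + n + j] & t[j]
        y.append(val)
    return y


def exists_circuit(f, n, m, k):
    """Brute-force stand-in for the SAT call: is there a valid choice of a?"""
    num_a = 2 * k * (n + 1) + k * (k - 1) + m * (n + k)
    inputs = list(product((0, 1), repeat=n))
    for a in product((0, 1), repeat=num_a):
        # The same a must work for every input: one copy of C per input.
        if all(evaluate(a, x, n, m, k) == list(f(x)) for x in inputs):
            return True
    return False


def toy_and(x):
    """Toy function with n = 2 and m = 1: the AND of both input bits."""
    return (x[0] & x[1],)


print(exists_circuit(toy_and, 2, 1, 0), exists_circuit(toy_and, 2, 1, 1))
```

For this toy function no assignment exists for \(k = 0\), since a linear combination of the inputs cannot produce their product, while \(k = 1\) succeeds.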

\(C'\) is in ANF. The method by Bard, Courtois, and Jefferson [3] for converting sparse systems of low-degree multivariate polynomials over \(GF(2)\) is used to convert \(C'\) to CNF, such that it is understood by the SAT solver.

Results. This method makes it feasible to find the multiplicative complexity of several 4-bit and 5-bit S-boxes. Finding the multiplicative complexity comes with an actual implementation that uses this minimal number of nonlinear gates. After Courtois, Hulme, and Mourouzis applied this method to the S-boxes of PRESENT and GOST [12], we show that we can also find results for more recently introduced 4-bit and 5-bit S-boxes.

We consider the S-boxes, and if applicable, their inverses (denoted by \(^{-1}\)), in Ascon [14], ICEPOLE [19], Keccak [4]/Ketje [5]/Keyak [6], all PRIMATEs [2], Joltik [17]/Piccolo [22], LAC [23], Minalpher [21], Prøst [18], and RECTANGLE [24]. Minalpher’s and Prøst’s S-boxes are involutory, which is why their inverses are not listed separately. The inverse S-boxes in Ascon, ICEPOLE, Keccak, Ketje, and Keyak are not actually used in decryption and are therefore not considered.

For all S-boxes except the one used by the PRIMATEs we are able to prove the multiplicative complexity. The results are summarized in Table 1. The actual implementations can be found in Appendix A, but note that these should not be used by themselves as we are being very generous with XOR gates. The linear parts should be optimized separately, as we will demonstrate in Sect. 4.

Table 1. Multiplicative complexity of S-boxes

These and subsequent results are obtained using MiniSat 2.2.0 and CryptoMiniSat 2.9.10 using default parameters on a single core of an Intel Xeon E7-4870 v2 running at 2.30 GHz.

For the PRIMATEs S-box and inverse S-box, we find solutions for \(k=7\) and \(k=10\), respectively. Furthermore, we find for both S-boxes that the case for \(k=5\) yields UNSAT. We have started several attempts to find a decisive answer for \(k=6\), including

  • reducing the CNF, e.g., using NICESAT [11];

  • fine-tuning SAT solver parameters;

  • trying other SAT solvers;

  • trying parallel SAT solvers that can run on many cores, such as Plingeling and Treengeling; and

  • letting all of this run for several months on a machine with 120 cores and 3 TB of RAM.

Unfortunately, none of these attempts resulted in an answer as no solver instance has terminated yet. As these SAT solvers typically have much more difficulty with proving the UNSAT case than proving the SAT case, and as the SAT proof for \(k=7\) was found in less than 40 hours, we expect the \(k=6\) case to yield UNSAT and we therefore conjecture the multiplicative complexity of the PRIMATEs S-box to be 7. In Sect. 4 we go into more detail on optimizing the PRIMATEs S-box. For the inverse S-box, we did not manage to find solutions for \(k \in \{6,7,8,9\}\).

3.3 Optimizing for Bitslice Gate Complexity

In [13, 20], a method is also given to optimize for bitslice gate complexity. However, it is only applied to the small CTC2 toy cipher and therefore it remains unclear how practical this method is for real-world ciphers. We investigate this by applying the method to the same S-boxes as in the previous section.

The encoding scheme for the bitslice gate complexity decision problem is slightly different compared to the multiplicative complexity decision problem. Let \(f : \mathbb {F} _2 ^n \rightarrow \mathbb {F} _2 ^m\) again be an S-box and let \(k\) now be the bitslice gate complexity that we want to test for. Then our first set of equations \(C\) in ANF consists of:

  • \(\forall i \in \{0,\cdots ,k-1\}\): \(t_i = b_{3i} \cdot q_{2i} \cdot q_{2i+1} + b_{3i+1} \cdot q_{2i} + b_{3i+1} \cdot q_{2i+1} + b_{3i+2} + b_{3i+2} \cdot q_{2i}\), to encode the \(k\) AND, OR, XOR or NOT gates. The \(b_i\) determine what kind of gate this will represent, as can be seen in Table 2.

  • \(\forall i \in \{0,\cdots ,k-1\}\): \(0 = b_{3i} \cdot b_{3i+2}\) and \(0 = b_{3i+1} \cdot b_{3i+2}\), to make sure that the gate is either a unary NOT or a binary AND/OR/XOR, but not the XOR of them. This excludes NAND/NOR/XNOR gates.

  • \(\forall i \in \{0,\cdots ,2k-1\}\): \(q_i = \left( \sum _{j=0}^{n-1} a_{l + j} \cdot x_j\right) + \left( \sum _{j=0}^{\left\lfloor \frac{i}{2}\right\rfloor - 1} a_{l + n + j} \cdot t_j\right) \), where \(l = in + \left\lfloor \frac{i^2-2i+1}{4}\right\rfloor \), to encode that the inputs of the gates can be any S-box input bit or any previously computed bit.

  • \(\forall i \in \{0,\cdots ,2k-1\}\), \(\forall j \in \{l,\cdots ,l+n+\left\lfloor \frac{i}{2}\right\rfloor -2\}\),\(\forall u \in \{j+1,\cdots , l+n+\left\lfloor \frac{i}{2}\right\rfloor -1\}\): \(0 = a_j \cdot a_u\), to encode an ‘at most one’ constraint on the gate inputs.

  • \(\forall i \in \{0,\cdots ,m-1\}\): \(y_i = \left( \sum _{j=0}^{n-1} a_{s + j} \cdot x_j\right) + \left( \sum _{j=0}^{k-1} a_{s + n + j} \cdot t_j\right) \), where \(s = 2kn + k(k-1) + i(n+k)\), to encode that the S-box output bit can be any S-box input bit or any gate output.

  • \(\forall i \in \{0,\cdots ,m-1\}\), \(\forall j \in \{s,\cdots ,s+n+k-2\}\), \(\forall u \in \{j+1,\cdots ,s+n+k-1\}\): \(0 = a_j \cdot a_u\), to encode an ‘at most one’ constraint on the S-box outputs.
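The effect of the selector variables \(b_{3i}, b_{3i+1}, b_{3i+2}\) can be checked by evaluating the gate equation directly. The sketch below enumerates every selector permitted by the two constraints \(0 = b_{3i} \cdot b_{3i+2}\) and \(0 = b_{3i+1} \cdot b_{3i+2}\) and records the resulting truth table; the function names are assumptions made for this illustration.

```python
from itertools import product


def bitslice_gate(b0, b1, b2, q0, q1):
    """Gate equation for t_i in the bitslice encoding, over GF(2)."""
    return (b0 & q0 & q1) ^ (b1 & q0) ^ (b1 & q1) ^ b2 ^ (b2 & q0)


def allowed_selectors():
    """Map every permitted (b0, b1, b2) to its truth table.

    The truth table lists t for (q0, q1) = (0,0), (0,1), (1,0), (1,1).
    """
    table = {}
    for b0, b1, b2 in product((0, 1), repeat=3):
        if b0 & b2 or b1 & b2:          # violates 0 = b0*b2 or 0 = b1*b2
            continue
        table[(b0, b1, b2)] = tuple(
            bitslice_gate(b0, b1, b2, q0, q1)
            for q0, q1 in product((0, 1), repeat=2)
        )
    return table


for selector, truth_table in allowed_selectors().items():
    print(selector, truth_table)
```

Five selectors remain: (1,0,0) behaves as AND, (0,1,0) as XOR, (1,1,0) as OR, (0,0,1) as NOT of the first input, and (0,0,0) outputs the constant 0.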

Table 2. Encoding of different types of gates (bitslice gate complexity)

Converting \(C\) to \(C'\) and then to CNF is the same process as with the multiplicative complexity decision problem. Note that the ‘constraint equations’ on \(a_i\) and \(b_j\) do not have to be duplicated \(2^n\) times for \(C'\), as they are not renumbered. This saves a lot of redundant clauses.

Results. As the number of CNF clauses that is necessary to describe the bitslice gate complexity decision problem becomes much larger compared to the multiplicative complexity decision problem, it can take much more time for a SAT solver to actually solve a problem instance. Still, for some 4-bit and 5-bit S-boxes results can be obtained within minutes or within a few hours. Table 3 contains some examples. If a bitslice gate complexity is listed as \(\le k\), a solution was found for \(k\), but we were unable to prove that this is the minimum because the SAT solver did not terminate within a reasonable amount of time for \(k-1\). The actual implementations with the given number of operations can be found in Appendix A.

Table 3. Bitslice gate complexity of S-boxes

For Prøst and the (forward) S-box of RECTANGLE, it is interesting to note that the SAT solvers are able to find the same implementations as the corresponding authors already suggested. We have proven that their bitsliced implementations are indeed minimal.

3.4 Optimizing for Gate Complexity

A method to encode the gate complexity decision problem was also provided in [13, 20], but again, actual results were only given for the CTC2 toy cipher. We show that it is feasible to compute the gate complexity for real-world 4-bit S-boxes as well.

The encoding is very similar to the bitslice gate complexity decision problem. The first set of equations \(C\) in ANF only differs in two places:

  • Instead of the previous rule for \(t_i\), the gates are encoded differently: \(\forall i \in \{0,\cdots ,k-1\}\): \(t_i = b_{3i} \cdot q_{2i} \cdot q_{2i+1} + b_{3i+1} \cdot q_{2i} + b_{3i+1} \cdot q_{2i+1} + b_{3i+2}\), to encode the \(k\) gates. The \(b_i\) determine what kind of gate this will represent, as can be seen in Table 4.

  • The additional constraints on the \(b_i\) are completely omitted.
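With the constraints on the \(b_i\) removed, all eight selector values are permitted. The following sketch evaluates the gate equation for gate complexity on every selector and names the resulting two-input function by its truth table; the `NAMES` lookup and the function names are assumptions made for this illustration.

```python
from itertools import product

# Truth tables for (q0, q1) = (0,0), (0,1), (1,0), (1,1).
NAMES = {
    (0, 0, 0, 0): "constant 0", (1, 1, 1, 1): "constant 1",
    (0, 0, 0, 1): "AND",        (1, 1, 1, 0): "NAND",
    (0, 1, 1, 1): "OR",         (1, 0, 0, 0): "NOR",
    (0, 1, 1, 0): "XOR",        (1, 0, 0, 1): "XNOR",
}


def gate(b0, b1, b2, q0, q1):
    """Gate equation for t_i in the gate complexity encoding, over GF(2)."""
    return (b0 & q0 & q1) ^ (b1 & q0) ^ (b1 & q1) ^ b2


def gate_types():
    """Map each selector (b0, b1, b2) to the name of the gate it encodes."""
    result = {}
    for b in product((0, 1), repeat=3):
        tt = tuple(gate(*b, q0, q1) for q0, q1 in product((0, 1), repeat=2))
        result[b] = NAMES[tt]
    return result


for selector, name in gate_types().items():
    print(selector, name)
```

Setting the third selector bit complements the output, which is how NAND, NOR, and XNOR arise from AND, OR, and XOR.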

Table 4. Encoding of different types of gates (gate complexity)

Converting \(C\) to \(C'\) and then to CNF is similar to the previous optimization goals.

Results. Our results on real-world 4-bit S-boxes are summarized in Table 5. The full implementations can be found in Appendix A. For our 5-bit S-boxes we did not manage to retrieve results. Note that all types of logic gates are considered equally expensive. No type of gate is preferred over the others, because information such as differences in area consumption or time delay is not taken into account. The implementations found by the SAT solver should therefore not be used directly for hardware implementations. However, they serve as an optimal starting point from where to swap ‘expensive’ gates for cheaper ones, depending on the specific technology that is to be used. For example, the designers of Piccolo suggested a hardware implementation [22] of their S-box that may or may not be more efficient than the implementation given here, depending on the specific technology.

Table 5. Gate complexity of S-boxes

3.5 Optimizing for Depth Complexity

There are many situations in high-speed hardware implementations where the implementer wants to keep the depth of the circuit as low as possible, in order to be able to increase the clock frequency, without having to use significantly more gates. We provide a novel method to find low-depth implementations of small functions such as S-boxes using SAT solvers. This method is inspired by the encoding of the gate complexity decision problem, but modified in some important ways.

In the encoding of the gate complexity decision problem, we expressed that every gate can use the S-box input and the outputs of previous gates as its input. The key idea here is to divide the circuit into depth layers and to encode the notion that a gate can only use the S-box input and the output of gates in the previous layers as its input. This is made more precise later.

First we note that it is necessary to limit the potential increase of the number of gates when reducing the depth of a circuit. We introduce a fixed maximum layer width \(w\) to address this, so we allow at most \(w\) gates to be executed in parallel. For some function \(f\), we want to be able to answer questions such as: “is there a circuit implementing \(f\) with depth \(k\) and with at most \(w\) gates on each depth layer?”.

Using this fixed maximum layer width, we make our encoding method more precise by once more creating a set \(C\) of multivariate equations over \(GF(2)\) in ANF that consists of:

  • \(\forall i \in \{0,\cdots ,kw-1\}\): \(t_i = b_{3i} \cdot q_{2i} \cdot q_{2i+1} + b_{3i+1} \cdot q_{2i} + b_{3i+1} \cdot q_{2i+1} + b_{3i+2}\), to encode the \(kw\) gates. The \(b_i\) determine what kind of gate this will represent, as can be seen in Table 4.

  • \(\forall i \in \{0,\cdots ,2kw-1\}\): \(q_i = \left( \sum _{j=0}^{n-1} a_{l + j} \cdot x_j\right) + \left( \sum _{j=0}^{v-1} a_{l + n + j} \cdot t_j\right) \), where \(v = \left\lfloor \frac{i}{2w}\right\rfloor w\) and \(l = in + v\left( i-v-w\right) \), to encode that the inputs of the gates can be any S-box input bit or any previously computed bit.

  • \(\forall i \in \{0,\cdots ,2kw-1\}\), \(\forall j \in \{l,\cdots ,l+n+v-2\}\), \(\forall u \in \{j+1,\cdots ,l+n+v-1\}\): \(0 = a_j \cdot a_u\), to encode an ‘at most one’ constraint on the gate inputs.

  • \(\forall i \in \{0,\cdots ,m-1\}\): \(y_i = \left( \sum _{j=0}^{n-1} a_{s + j} \cdot x_j\right) + \left( \sum _{j=0}^{kw-1} a_{s + n + j} \cdot t_j\right) \), where \(s = kw(2n + kw - w) + i(n+kw)\), to encode that the S-box output bit can be any S-box input bit or any gate output.

  • \(\forall i \in \{0,\cdots ,m-1\}\), \(\forall j \in \{s,\cdots ,s+n+kw-2\}\), \(\forall u \in \{j+1,\cdots ,s+n+kw-1\}\): \(0 = a_j \cdot a_u\), to encode an ‘at most one’ constraint on the S-box outputs.
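The offsets \(v\), \(l\), and \(s\) determine which \(a\)-variables belong to each gate input and each S-box output. As a sanity check, the sketch below computes these index ranges and verifies that they tile the \(a\)-variables without gaps or overlaps. The function names and the use of Python `range` objects are assumptions made for this illustration.

```python
def depth_offsets(n, m, k, w):
    """Return the a-index ranges of all gate inputs and all S-box outputs."""
    gate_inputs = []
    for i in range(2 * k * w):
        v = (i // (2 * w)) * w              # number of gates in earlier layers
        l = i * n + v * (i - v - w)
        gate_inputs.append(range(l, l + n + v))
    outputs = []
    for i in range(m):
        s = k * w * (2 * n + k * w - w) + i * (n + k * w)
        outputs.append(range(s, s + n + k * w))
    return gate_inputs, outputs


def is_contiguous(ranges):
    """True if the ranges cover 0, 1, 2, ... without gaps or overlaps."""
    expected = 0
    for r in ranges:
        if r.start != expected:
            return False
        expected = r.stop
    return True


# Example: a 4-bit S-box, depth k = 3, at most w = 2 gates per layer.
gate_inputs, outputs = depth_offsets(4, 4, 3, 2)
print(is_contiguous(gate_inputs + outputs))
```

A gate input in the first layer can only select one of the \(n\) S-box inputs, whereas a gate input in layer \(d\) (counting from 0) can additionally select any of the \(dw\) gate outputs from earlier layers, which is what restricts the depth.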

Converting \(C\) to \(C'\) and subsequently expressing this in CNF is again the same process as before.

Results. Using our method, we are able to find low-depth implementations for our 4-bit S-boxes. The results are summarized in Table 6 and the corresponding implementations can be found in Appendix A. The last column in Table 6 lists scenarios that yield UNSAT, to show boundaries on what is possible. The trade-off between circuit depth and the number of gates is made here in such a way that reducing the depth by 1 would require the implementation to have at least twice as many gates as is required by the gate complexity.

Table 6. Depth complexity of S-boxes

4 Combining Criteria: Optimizing the PRIMATEs S-Box

So far, we have seen how to optimize for one specific goal. However, a result that is optimized for multiplicative complexity may contain more XOR gates than is desired, and a result that is optimized for gate complexity may contain more nonlinear gates than is desired for a masked implementation. Here we show how multiple optimization goals can be combined by looking at the 5-bit PRIMATEs S-box. We first optimize for multiplicative complexity to have a minimal number of nonlinear gates, and subsequently we minimize the number of linear gates. The result is an implementation that has 4 AND, 3 OR, 31 XOR, and 5 NOT gates.

The PRIMATEs S-box is an almost bent permutation with a maximum linear and differential probability of \(2^{-4}\). It is chosen because of its low area consumption in hardware implementations.

When the optimization method for multiplicative complexity is applied, we find a solution with multiplicative complexity 7 as follows:

figure a

It is not hard to see that there are a lot of redundant XOR operations in this implementation. We distinguish between XOR operations before the nonlinear gates (on \(x_i\)) and XOR operations after the nonlinear gates (on \(t_i\)). It is possible to see them as two straight-line linear programs, where the first describes the linear part of the S-box approached from the input and the second describes the linear part approached from the S-box output.

The shortest linear straight-line program problem \({\varvec{A}}_\mathbf{1}{\varvec{x}}_\mathbf{1}\) can be given by

figure b

The shortest linear straight-line program problem \({\varvec{A}}_\mathbf{2}{\varvec{x}}_\mathbf{2}\) can be given by

figure c

We are able to find a minimal straight-line program computing \({\varvec{A}}_\mathbf{2}{\varvec{x}}_\mathbf{2}\) using SAT solvers. We use the method suggested by Fuhs and Schneider-Kamp [16] to encode the SLP problem as a SAT instance in CNF. This yields a result that is incorporated in our implementation of the PRIMATEs S-box. Finding a minimal straight-line program computing \({\varvec{A}}_\mathbf{1}{\varvec{x}}_\mathbf{1}\) turned out to be infeasible using SAT solvers within a reasonable amount of time. Therefore, we apply the heuristic approach as suggested by Boyar and Peralta [8]. This does provide us with a short straight-line program. We combine both results and amend the original PRIMATEs S-box implementation to get the more efficient implementation below, where \(z_i\) represent helper variables.

figure d

We are able to decrease the previous result of 58 XOR gates to only 31 XOR gates.

Tools. We provide tools to generate \(C'\) in ANF for all discussed optimization goals and to convert a SAT solver solution back to an S-box implementation. We place those tools into the public domain. They and additional documentation are available online at https://github.com/Ko-/sboxoptimization.

5 Conclusion

SAT solvers can be used to find minimal implementations for small functions such as S-boxes with respect to criteria such as multiplicative complexity, bitslice gate complexity, gate complexity, and circuit depth complexity. We have shown how this can be done and how multiple criteria can be combined. However, for 8-bit S-boxes and larger functions these methods quickly become infeasible. One will then have to resort to approaches based on heuristics.