
Limitations of the Use of Neural Networks in Black Box Cryptanalysis

  • Conference paper
  • In: Innovative Security Solutions for Information Technology and Communications (SecITC 2021)

Abstract

In this work, we first abstract a block cipher to a set of parallel Boolean functions. Then, we establish the conditions that allow a multilayer perceptron (MLP) neural network to correctly emulate a Boolean function, and we extend these conditions to the case of any block cipher. The modeling of the block cipher is performed in a black box scenario with a set of random samples, resulting in a single secret-key chosen plaintext/ciphertext attack. Based on our findings, we explain the reasons behind the success and failure of relevant related cases in the literature. Finally, we conclude by estimating the resources required to fully emulate 2 rounds of AES-128, a task that has never been achieved by means of neural networks. Despite the presence of original results and observations, we remark on the systematization-of-knowledge nature of this work, whose main point is to explain the reasons behind the inefficacy of neural networks for black box cryptanalysis.


Notes

  1. Notice that S-DES uses 10-bit keys, 8-bit messages, 4-to-2 S-boxes, and 2 rounds. These parameters are very far from those of the real DES.

References

  1. Rivest, R.L.: Cryptography and machine learning. In: Imai, H., Rivest, R.L., Matsumoto, T. (eds.) ASIACRYPT 1991. LNCS, vol. 739, pp. 427–439. Springer, Heidelberg (1993). https://doi.org/10.1007/3-540-57332-1_36

  2. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. The MIT Press, Cambridge (2017)

  3. Géron, A.: Hands-on Machine Learning with Scikit-Learn, Keras and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. O’Reilly Media, Sebastopol (2019)

  4. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Nature 323(6088), 533–536 (1986)

  5. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)

  6. Cybenko, G.: Approximation by superpositions of a sigmoidal function. Math. Control Signals Syst. 2(4), 303–314 (1989)

  7. Hornik, K., Stinchcombe, M., White, H.: Multilayer feedforward networks are universal approximators. Neural Netw. (1989)

  8. Hornik, K.: Approximation capabilities of multilayer feedforward networks. Neural Netw. (1991)

  9. Anthony, M.: Connections between neural networks and Boolean functions. Boolean Methods Models 20 (2005)

  10. Livni, R., Shalev-Shwartz, S., Shamir, O.: On the computational efficiency of symmetric neural networks. Adv. Neural Inf. Process. Syst. 27, 855–863 (2014)

  11. Kearns, M.J.: The Computational Complexity of Machine Learning. MIT Press, Cambridge (1990)

  12. Goldreich, O., Goldwasser, S., Micali, S.: How to construct random functions. In: Providing Sound Foundations for Cryptography: On the Work of Shafi Goldwasser and Silvio Micali, pp. 241–264. ACM (2019)

  13. Steinbach, B., Kohut, R.: Neural networks - a model of Boolean functions. In: Boolean Problems, Proceedings of the 5th International Workshop on Boolean Problems, pp. 223–240 (2002)

  14. Livni, R., Shalev-Shwartz, S., Shamir, O.: On the computational efficiency of training neural networks. arXiv preprint arXiv:1410.1141 (2014)

  15. Malach, E., Shalev-Shwartz, S.: Learning Boolean circuits with neural networks. arXiv preprint arXiv:1910.11923 (2019)

  16. Dileep, A.D., Sekhar, C.C.: Identification of block ciphers using support vector machines. In: The 2006 IEEE International Joint Conference on Neural Network Proceedings, pp. 2696–2701. IEEE (2006)

  17. Swapna, S., Dileep, A.D., Sekhar, C.C., Kant, S.: Block cipher identification using support vector classification and regression. J. Discret. Math. Sci. Cryptogr. 13(4), 305–318 (2010)

  18. Chou, J.W., Lin, S.D., Cheng, C.M.: On the effectiveness of using state-of-the-art machine learning techniques to launch cryptographic distinguishing attacks. In: Proceedings of the 5th ACM Workshop on Security and Artificial Intelligence, pp. 105–110 (2012)

  19. de Mello, F.L., Xexeo, J.A.: Identifying encryption algorithms in ECB and CBC modes using computational intelligence. J. UCS 24(1), 25–42 (2018)

  20. Lagerhjelm, L.: Extracting information from encrypted data using deep neural networks. Master’s thesis (2018)

  21. Alani, M.M.: Neuro-cryptanalysis of DES. In: World Congress on Internet Security (WorldCIS-2012), pp. 23–27. IEEE (2012)

  22. Alani, M.M.: Neuro-cryptanalysis of DES and triple-DES. In: Huang, T., Zeng, Z., Li, C., Leung, C.S. (eds.) ICONIP 2012. LNCS, vol. 7667, pp. 637–646. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-34500-5_75

  23. Xiao, Y., Hao, Q., Yao, D.D.: Neural cryptanalysis: metrics, methodology, and applications in CPS ciphers. In: 2019 IEEE Conference on Dependable and Secure Computing (DSC), pp. 1–8. IEEE (2019)

  24. Alallayah, K.M., Alhamami, A.H., AbdElwahed, W., Amin, M.: Attack of against simplified data encryption standard cipher system using neural networks. J. Comput. Sci. 6(1), 29 (2010)

  25. Alallayah, K.M., Alhamami, A.H., AbdElwahed, W., Amin, M.: Applying neural networks for simplified data encryption standard (SDES) cipher system cryptanalysis. Int. Arab J. Inf. Technol. 9(2), 163–169 (2012)

  26. Danziger, M., Henriques, M.A.A.: Improved cryptanalysis combining differential and artificial neural network schemes. In: 2014 International Telecommunications Symposium (ITS), pp. 1–5. IEEE (2014)

  27. So, J.: Deep learning-based cryptanalysis of lightweight block ciphers. Secur. Commun. Netw. 2020 (2020)

  28. Pareek, M., Mishra, G., Kohli, V.: Deep learning based analysis of key scheduling algorithm of PRESENT cipher. Cryptology ePrint Archive, Report 2020/981 (2020). http://eprint.iacr.org/2020/981

  29. Flajolet, P., Gardy, D., Thimonier, L.: Birthday paradox, coupon collectors, caching algorithms and self-organizing search (1992)

  30. Daemen, J., Rijmen, V.: The Design of Rijndael: AES - the Advanced Encryption Standard. Information Security and Cryptography. Springer, Heidelberg (2002). https://doi.org/10.1007/978-3-662-60769-5

  31. Cid, C., Murphy, S., Robshaw, M.J.B.: Small scale variants of the AES. In: Gilbert, H., Handschuh, H. (eds.) FSE 2005. LNCS, vol. 3557, pp. 145–162. Springer, Heidelberg (2005). https://doi.org/10.1007/11502760_10

  32. Phan, R.C.-W.: Mini advanced encryption standard (mini-AES): a testbed for cryptanalysis students. Cryptologia 26(4), 283–306 (2002)

  33. Carlet, C.: Boolean functions for cryptography and error correcting codes. In: Boolean Models and Methods in Mathematics, Computer Science, and Engineering, pp. 257–397 (2010)

  34. MacWilliams, F.J., Sloane, N.J.A.: The Theory of Error-Correcting Codes. North-Holland Mathematical Library, vol. 16. North-Holland Publishing Co., Amsterdam (1977)

  35. O’Neil, S., Courtois, N.T.: Reverse-engineered Philips/NXP Hitag2 Cipher (2008). http://fse2008rump.cr.yp.to/00564f75b2f39604dc204d838da01e7a.pdf

  36. Plötz, H., Nohl, K.: Breaking Hitag2. HAR2009 (2009)

  37. Courtois, N.T., O’Neil, S., Quisquater, J.-J.: Practical algebraic attacks on the Hitag2 stream cipher. In: Samarati, P., Yung, M., Martinelli, F., Ardagna, C.A. (eds.) ISC 2009. LNCS, vol. 5735, pp. 167–176. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-04474-8_14

  38. Štembera, P., Novotny, M.: Breaking Hitag2 with reconfigurable hardware. In: 2011 14th Euromicro Conference on Digital System Design, pp. 558–563. IEEE (2011)

  39. Immler, V.: Breaking Hitag 2 revisited. In: Bogdanov, A., Sanadhya, S. (eds.) SPACE 2012. LNCS, pp. 126–143. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-34416-9_9

  40. Schaefer, E.F.: A simplified data encryption standard algorithm. Cryptologia 20(1), 77–84 (1996)


Author information

Correspondence to Matteo Rossi.


Appendices

Appendix A Preliminaries on Boolean functions

We introduce here, for completeness, the relevant notions concerning Boolean functions. For a complete overview of the topic see [33] or [34].

We denote by \(\mathbb {F}_2\) the binary field with two elements. The set \(\mathbb {F}_2^n\) is the set of all binary vectors of length n, viewed as an \(\mathbb {F}_2\)-vector space. A Boolean function is a function \(f:\mathbb {F}_2^n\rightarrow \mathbb {F}_2\). The set of all Boolean functions from \(\mathbb {F}_2^n\) to \(\mathbb {F}_2\) will be denoted by \({\mathcal B}_n\).

We implicitly assume that \(\mathbb {F}_2^n\) has been ordered, so that \(\mathbb {F}_2^n=\{x_1,\ldots ,x_{2^n}\}\). A Boolean function f can be specified by a truth table (or evaluation vector), which gives the evaluation of f at all the \(x_i\)’s. Once the order on \(\mathbb {F}_2^n\) is chosen, i.e. the \(x_i\)’s are fixed, the truth table uniquely identifies f.

A Boolean function \(f \in {\mathcal B}_n\) can be expressed in another way, namely as a unique square-free polynomial in \(\mathbb {F}_2[X]=\mathbb {F}_2[x_1,\ldots ,x_n]\), more precisely \( f = \sum _{(v_1,\ldots , v_n) \in \mathbb {F}_2^n}b_{(v_1,\ldots , v_n)} x_1^{v_1}\cdots x_n^{v_n} \). This representation is called the Algebraic Normal Form (ANF).

There exists a simple divide-and-conquer butterfly algorithm ([33], p. 10) to compute the ANF from the truth table (or vice versa) of a Boolean function, which requires \(O(n2^{n})\) bit sums, while \(O(2^n)\) bits must be stored. This algorithm is known as the fast Möbius transform.
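The following minimal sketch (not taken from the paper) illustrates the butterfly structure of the fast Möbius transform on a toy truth table, together with how the algebraic degree can be read off the resulting ANF coefficient vector.

```python
def mobius_transform(table):
    """Butterfly pass over a copy of a truth table of length 2**n (entries 0/1).
    Returns the ANF coefficient vector; applying it twice gives back the input."""
    t = list(table)
    step = 1
    while step < len(t):
        for block in range(0, len(t), 2 * step):
            for i in range(block, block + step):
                t[i + step] ^= t[i]          # XOR the lower half onto the upper half
        step *= 2
    return t

def algebraic_degree(anf):
    """Largest number of variables in a monomial with a nonzero ANF coefficient."""
    return max((bin(i).count("1") for i, b in enumerate(anf) if b), default=0)

# Example: f(x1, x2) = x1*x2 (AND), truth table indexed by the integer (x2 x1)_2.
anf = mobius_transform([0, 0, 0, 1])
print(anf, algebraic_degree(anf))        # [0, 0, 0, 1] 2  -> ANF is the monomial x1*x2
print(mobius_transform(anf))             # [0, 0, 0, 1]    -> the transform is an involution
```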

We now define a set of properties of Boolean functions that are useful in cryptography. In Appendix D we study the relation of these properties to the learnability of a Boolean function. We refer to [33] for more details.

The degree of the ANF of a Boolean function f is called the algebraic degree of f, denoted by \(\deg f\), and it is equal to the maximum of the degrees of the monomials appearing in the ANF. The correlation immunity of a Boolean function is a measure of the degree to which its outputs are uncorrelated with subsets of its inputs. More formally, a Boolean function is correlation-immune of order m if every subset of at most m variables in \(\{ x_{1},\ldots ,x_{n}\}\) is statistically independent of the value of \(f(x_{1},\ldots ,x_{n})\). The parameter of a Boolean function quantifying its resistance to algebraic attacks is called algebraic immunity. More precisely, this is the minimum degree of a nonzero function g such that g is an annihilator of f (i.e. \(gf = 0\)) or of \(f \oplus 1\).

The nonlinearity of a Boolean function is its distance to the set of affine functions, i.e. the minimum number of outputs that need to be flipped to obtain the output of an affine function.

Finally, a Boolean function is said to be resilient of order m if it is balanced (the output is 1 or 0 the same number of times) and correlation immune of order m. The resiliency order is the maximum value m such that the function is resilient of order m.
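As a self-contained illustration (not part of the paper), the sketch below computes some of these properties for a toy function from its truth table via the Walsh-Hadamard transform; the algebraic degree can be obtained from the ANF produced by the Möbius transform shown earlier. The bit ordering of the truth table index is an arbitrary assumption of the example.

```python
def walsh_spectrum(table):
    """Fast Walsh-Hadamard transform of (-1)^f; W[a] = sum_x (-1)^(f(x) + a.x)."""
    w = [1 - 2 * b for b in table]                   # 0/1 -> +1/-1
    step = 1
    while step < len(w):
        for block in range(0, len(w), 2 * step):
            for i in range(block, block + step):
                w[i], w[i + step] = w[i] + w[i + step], w[i] - w[i + step]
        step *= 2
    return w

def nonlinearity(table):
    """Distance to the closest affine function: 2^(n-1) - max|W(a)| / 2."""
    w = walsh_spectrum(table)
    return len(table) // 2 - max(abs(v) for v in w) // 2

def is_balanced(table):
    return 2 * sum(table) == len(table)

def correlation_immunity(table):
    """Largest m with W(a) = 0 for every a of Hamming weight between 1 and m."""
    w = walsh_spectrum(table)
    n = len(table).bit_length() - 1
    m = 0
    while m < n and all(w[a] == 0 for a in range(1, len(w))
                        if bin(a).count("1") == m + 1):
        m += 1
    return m

# Example: f(x1, x2, x3) = x1*x2 + x3; the truth table index i encodes (x3 x2 x1)_2.
f = [((i & 1) * ((i >> 1) & 1)) ^ ((i >> 2) & 1) for i in range(8)]
print(nonlinearity(f), is_balanced(f), correlation_immunity(f))   # 2 True 0
```

A function that is balanced and correlation-immune of order m is, by the definition above, resilient of order m.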

Appendix B Neural networks in black box cryptanalysis: previous results

1.1 B.1 Cipher identification

Neural networks can be used to distinguish the output of a cipher from random bit strings or from the output of another cipher, by training the network with plaintext-ciphertext pairs obtained from a single secret key (single secret-key distinguisher) or from multiple keys (multiple secret-key distinguisher). Variations of these attacks might exist in the related-key scenario, but we are not aware of any work in this direction related to neural networks. The general architecture of neural networks used for distinguisher attacks is shown in Fig. 5a.

Fig. 5.

(a) Generic multilayer perceptron (MLP) architecture to perform a distinguisher attack in known plaintext scenario. The MLP receives n-bit plaintext \(p_1,\ldots , p_n\) and ciphertext \(c_1,\ldots , c_n\) as input. Each bit serves as input to one neuron, therefore the input layer consists of 2n neurons. The output layer consists of a single neuron with two possible outputs, depending on the outcome of the distinguishing attack. (b) Generic multilayer perceptron architecture to perform ciphertext emulation in a known plaintext scenario. (c) Generic multilayer perceptron architecture to map a key recovery attack in the known plaintext scenario. Given plaintext \(p_1,\ldots ,p_n\)/ciphertext \(c_1,\ldots ,c_n\) pairs as input, each neuron in the output layer predicts one bit of the key \(k_1,\ldots ,k_m\).
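As a concrete illustration of the architecture of Fig. 5a, the following minimal tf.keras sketch builds a generic distinguisher MLP; the block size, layer widths, and the random placeholder data are assumptions of the example, not the networks used in the works cited below.

```python
import numpy as np
import tensorflow as tf

n = 16                                                  # toy block size in bits

model = tf.keras.Sequential([
    tf.keras.Input(shape=(2 * n,)),                     # p_1..p_n, c_1..c_n
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),     # distinguishing decision
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["binary_accuracy"])

# Placeholder data: label 1 for genuine (plaintext, ciphertext) pairs and 0 for
# random pairs; a real experiment would generate these with the target cipher.
X = np.random.randint(0, 2, size=(1024, 2 * n)).astype("float32")
y = np.random.randint(0, 2, size=(1024, 1)).astype("float32")
model.fit(X, y, epochs=2, batch_size=64, verbose=0)
```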

A direct application of ML to distinguishing the output produced by modern ciphers operating in a reasonably secure mode such as cipher block chaining (CBC) was explored in [18]. The ML distinguisher had no prior information on the cipher structure, and the authors conclude that their technique was not successful at extracting useful information from the ciphertexts when CBC mode was used, nor even at distinguishing them from random data. Better results were obtained in electronic codebook (ECB) mode, as one may easily expect, due to the lack of semantic security (non-randomization) of the mode. The main tools used in the experiment are linear classifiers and support vector machines with a Gaussian kernel. To solve the problem of cipher identification, the authors focused on the bag-of-words model for feature extraction and the common classification framework previously used in [16, 17], where the extracted features of the input samples are mostly related to the variation in word length. In [18], the considered features are the entropy of the ciphertext, the number of symbols appearing in the ciphertext, 16-bit histograms with 65536 dimensions, and the varying-length words proposed in [16].

Experiments similar to those of [18] have also been presented, with essentially similar results. For example, in [19], the authors consider 8 different plaintext languages, 6 block ciphers (DES, Blowfish, ARC4, Rijndael, Serpent and Twofish) in ECB and CBC mode and a “CBC”-like variation of RSA, and perform the identification on a higher-performance machine (40 computational nodes, each with a 16-core Opteron 6276 CPU, an NVIDIA Tesla K20 GPU and 32 GB of central memory) compared to [18], by means of different classical machine learning classifiers: C4.5, PART, FT, Complement Naive Bayes, MLP and WiSARD. The NIST test suite was applied to the ciphertexts to guarantee the quality of the encryption. The authors conclude that the language in which the plaintexts were written is not relevant for identifying the encryption algorithm. Also, the proposed procedures obtained full identification for almost all of the selected cryptographic algorithms in ECB mode. The most surprising result reported by the authors is the identification of algorithms in CBC mode, which showed lower rates than the ECB case; according to the authors, however, the lower rate is “not insignificant”, because the quality of identification in CBC mode is still “greater than the probabilistic bid”. Moreover, the authors point out that the rates increased monotonically, and thus can be increased by intensive computation. The most efficient classifier was Complement Naive Bayes, not only with regard to successful identification, but also in time consumption.

Another recent work is the master thesis of Lagerhjelm [20], from 2018. In this work, long short-term memory networks are used to (unsuccessfully) decipher encrypted text, and convolutional neural networks are used to perform classification tasks on encrypted MNIST images, again with success when distinguishing the ECB mode, and with no success in the CBC case.

1.2 B.2 Cipher emulation

Neural networks can be used to emulate the behaviour of a cipher, by training the network with pairs of plaintext and ciphertext generated from the same key. The general architecture of such networks is shown in Fig. 5b. Without knowing the secret key, one could either aim at predicting the ciphertext given a plaintext (encryption emulation), as done, for example, by Xiao et al. in [23], or at predicting a plaintext given a ciphertext (decryption emulation), as done, for example, by Alani in [21, 22].

In 2012, Alani [21, 22] implements a known-plaintext attack based on neural networks, by training a neural network to retrieve plaintext from ciphertext without retrieving the key used in encryption, or, in other words, finding a functionally equivalent decryption function. The author claims to be able to use an average of \(2^{11}\) plaintext-ciphertext pairs to perform cryptanalysis of DES in an average duration of 51 min, and an average of only \(2^{12}\) plaintext-ciphertext pairs for Triple-DES in an average duration of 72 min. His results, though, could not be reproduced by, for example, Xiao et al. [23], and no source code is provided to reproduce the attack. The adopted network layouts were 4- or 5-layer perceptrons, with different configurations: 128-256-256-128, 128-256-512-256, 128-512-256-256, 128-256-512-128, 128-512-512-128, 64-128-256-512-1024 (Triple-DES), and similar. The average size of the data sets used was about \(2^{20}\) plaintext-ciphertext pairs. The training algorithm was scaled conjugate-gradient. The experiment, implemented in MATLAB, was run on a single computer with an AMD Athlon X2 processor with 1.9 GHz clock frequency and 4 GB of memory.

In 2019, Xiao et al. [23] try to predict the output of a cipher, treating it as a black box with an unknown key. The prediction is performed by training a neural network with plaintext/ciphertext pairs. The error function chosen to correct the weights during training was the mean-squared error. Weights were initialized randomly. The maximum number of training cycles (epochs) was set to \(10^4\). The strength of a cipher is then measured by three metrics: cipher match rate, training data, and time complexity. They perform their experiment on reduced-round DES and on Hitag2 [35], a stream cipher with a 48-bit key and a 48-bit state, developed and introduced in the late 90’s by Philips Semiconductors (currently NXP), primarily used in Radio Frequency Identification (RFID) applications, such as car immobilizers. Note that Hitag2 has been attacked several times with algebraic attacks using SAT solvers (e.g. [36, 37]) or by exhaustive search (e.g. [38, 39]).

Xiao et al. test three different networks: a deep-and-thin fully connected network (MLP with 4 layers of 128 neurons each), a shallow-and-fat network (MLP with 1 layer of 1000 neurons), and a cascade network (4 layers with 128, 256, 256, 128 neurons). All three networks end with a softmax binary classifier. Their experiments show that the neural network able to perform the most powerful attack varies from cipher to cipher. While a shallow-and-fat fully connected network is the best at attacking round-reduced DES (up to 2 rounds), a deep-and-thin fully connected network works best on Hitag2. Three common activation functions, sigmoid, tanh and ReLU, are tested, however only for the shallow-and-fat network. The authors conclude that the sigmoid function allows a faster training, though all functions eventually reach the same accuracy. Training and testing are performed on a personal laptop (no details provided), so the network used cannot be too large. The training has been performed with up to \(2^{30}\) samples.

1.3 B.3 Key recovery attacks

Neural networks can be used to predict the key of a cipher, by training the network with triples of plaintext, ciphertext and key (with keys different from the one that needs to be found). The general architecture of such networks is shown in Fig. 5c.
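A minimal tf.keras sketch of the generic key-recovery MLP of Fig. 5c is given below; the S-DES-like sizes, layer widths, and random placeholder arrays are assumptions of the example and do not reproduce any of the models in the works discussed next.

```python
import numpy as np
import tensorflow as tf

block_bits, key_bits = 8, 10                            # S-DES-like toy sizes

model = tf.keras.Sequential([
    tf.keras.Input(shape=(2 * block_bits,)),            # p_1..p_n, c_1..c_n
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(key_bits, activation="sigmoid"),  # k_1..k_m predictions
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["binary_accuracy"])

# Placeholder training data; a real experiment would encrypt random plaintexts
# under random keys with the target cipher to fill these arrays.
PC = np.random.randint(0, 2, size=(4096, 2 * block_bits)).astype("float32")
K = np.random.randint(0, 2, size=(4096, key_bits)).astype("float32")
model.fit(PC, K, epochs=2, batch_size=128, verbose=0)
```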

In 2014, Danziger and Henriques [26] successfully mapped the input/output behaviour of the Simplified Data Encryption Standard (S-DES) [40] (see Note 1), with the use of a single-hidden-layer perceptron neural network (see Fig. 5c). They also showed that the effectiveness of the MLP network depends on the nonlinearity of the internal S-boxes of S-DES. Indeed, the main goal of the authors was to understand the relation between the differential cryptanalysis results and the ones obtained with the neural network. In their experiment, given the plaintext P and ciphertext C, the output layer of the neural network is used to predict the key K. Thus, for the training of the weights and biases in the neural network, training data of the form (P, C, K) is needed. After training has finished, the neural network is expected to predict a new value of K (not appearing in the training phase) given a new (P, C) pair as input.

Prior works on S-DES include [24, 25], where Alallayah et al. propose the use of the Levenberg-Marquardt algorithm rather than gradient descent to speed up the training. Besides key recovery, they also use a single-layer perceptron network to emulate the behaviour of S-DES, modelling the network with the plaintext as input and the ciphertext as output. Their results are positive due to the small size of the cipher, but a thorough analysis of the techniques used is lacking.

In 2020, So [27] proposed the use of 3- to 7-layer MLPs (see Fig. 5c) to perform a known-plaintext key recovery attack on S-DES (8-bit block, 10-bit key, 2 rounds), Simon32/64 (32-bit block, 64-bit key, 32 rounds), and Speck32/64 (32-bit block, 64-bit key, 22 rounds). Besides considering random keys, So additionally restricts keys to be made of ASCII characters. In this second case, the MLP is able to recover keys for all the non-reduced ciphers. It is important to notice that the largest cipher analyzed by So has a key space of \(2^{64}\) keys, which is reduced to \(2^{48} = 64^{8}\) keys when only ASCII keys are considered. The number of hidden layers adopted in this work is 3, 5, or 7, while the number of neurons per layer is 128, 256, or 512. In the training phase, So uses 5000 epochs and the Adam adaptive moment estimation algorithm as the optimizer for the MLP. In comparison to regular gradient descent, Adam is a more sophisticated optimizer which adapts the learning rate and momentum. The training and testing are run on a GPU-based server with an Nvidia GeForce RTX 2080 Ti GPU and an Intel Core i9-9900K CPU.

1.4 B.4 Key-Schedule inversion

As for the emulation of cipher decryption described in Subsect. B.2, one might try to invert the behaviour of the key schedule routine, as done, for example, by Pareek et al. [28] in 2020. In their work, they considered the key schedule of PRESENT and tried to retrieve the 80-bit key from the last 64-bit round key, using an MLP network with 3 hidden layers of 32, 16, and 8 neurons. Unfortunately, the authors concluded that, using this type of network, the accuracy of predicting the key bits did not significantly deviate from 0.5.

Appendix C A tiny example

We consider here two parallel Boolean functions \(f_1(x_1,x_2)\) and \(f_2(x_1,x_2)\), forming the 2-bit block map \(F = (f_1, f_2)\), and suppose we know how two inputs are mapped, i.e. \(F(00) = 00\) and \(F(01) = 11\). To evaluate the accuracy of an algorithm guessing the outputs of 10 and 11, one might increase a counter every time

  1. the output of the full 2-bit block is guessed correctly. To compute the accuracy, divide the counter by the total number of 2-bit outputs that have been guessed.

  2. the output of the full 2-bit block is guessed correctly for at least 1 bit. To compute the accuracy, divide the counter by the total number of 2-bit outputs that have been guessed.

  3. a single bit is guessed correctly (over all guessed outputs). To compute the accuracy, divide the counter by the product of the number of bits per block (2) and the total number of 2-bit outputs that have been guessed.

As an example, let us suppose that the missing inputs are mapped to \(F(10) = 01\) and \(F(11) = 11\). Let us also suppose that an algorithm \(\mathcal {A}\) made the following guesses: \(10 \mapsto 00\), \(11 \mapsto 10\). According to the first metric, the accuracy of \(\mathcal {A}\) is 0, since neither block is guessed entirely. According to the second metric, the accuracy of \(\mathcal {A}\) is 1, since at least one bit is correct in each block. According to the third metric, the accuracy of \(\mathcal {A}\) is 2/4 = 1/2, since exactly one bit is correct in each of the two guessed blocks.
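The following short sketch (not part of the paper) recomputes the three metrics for this toy example directly from the definitions above.

```python
# True outputs for inputs 10 and 11, and the guesses made by algorithm A.
true_out  = ["01", "11"]
guess_out = ["00", "10"]

# Metric 1: whole 2-bit block correct.
acc1 = sum(t == g for t, g in zip(true_out, guess_out)) / len(true_out)

# Metric 2: at least one bit of the block correct.
acc2 = sum(any(tb == gb for tb, gb in zip(t, g))
           for t, g in zip(true_out, guess_out)) / len(true_out)

# Metric 3: fraction of correct bits over all guessed bits.
acc3 = sum(tb == gb for t, g in zip(true_out, guess_out)
           for tb, gb in zip(t, g)) / (2 * len(true_out))

print(acc1, acc2, acc3)   # 0.0 1.0 0.5
```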

Note that if we have to guess two parallel Boolean functions on 2-bit inputs mapping \(00 \mapsto 00\), \(01 \mapsto 11\), \(10 \mapsto 01\), then we can correctly guess where the value 11 will be mapped to with probability 1/4. On the other hand, if we know that the two Boolean functions have to form a permutation over the set \(\{00,01,10,11\}\), then we only have the option \(11 \mapsto 10\). In general, if there are r missing values for a set of \(m'\) Boolean functions on m-bit inputs, and we know they have to form a permutation (\(m'=m\)), we can guess correctly with probability 1/r!. If the \(m'\) Boolean functions do not necessarily form a permutation, then we can guess correctly with probability \(1/(2^{rm'})\), which is much lower than 1/r!. In the case of a block cipher, we also know that not all permutations are possible, but only the ones indexed by the n-bit keys, which are \(2^n\).
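A quick brute-force check (illustrative only) of the permutation constraint in the 2-bit example: among all permutations of \(\{00,01,10,11\}\), exactly one is consistent with the three known mappings, forcing \(11 \mapsto 10\); without the permutation constraint there would be 4 equally likely choices.

```python
from itertools import permutations

values = ["00", "01", "10", "11"]
known = {"00": "00", "01": "11", "10": "01"}     # the mappings we already know

# Keep only the permutations of the output set consistent with the known mappings.
consistent = [p for p in permutations(values)
              if all(p[values.index(x)] == y for x, y in known.items())]
print(consistent)   # [('00', '11', '01', '10')]  ->  11 must map to 10
```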

Appendix D Emulating Boolean functions with different cryptographic properties

In this section, we want to determine whether there exists a correlation between the learnability of a Boolean function and some of its most relevant cryptographic properties, namely: algebraic degree, algebraic immunity, correlation immunity, nonlinearity and resiliency order (see Appendix A or [33] for definitions).

We randomly picked ten Boolean functions in \(m=10\) variables for each algebraic degree from 1 to 9 (i.e. 90 Boolean functions in total). A neural network was trained to predict the output of these functions. Figure 6a shows how the neural network parameters affect the accuracy of the predictions (for the case of the algebraic degree property), while Fig. 6b shows the network performance during the training. In both graphs we take, for each value of the algebraic degree, the average of the accuracy and the loss over the ten Boolean functions considered.
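The tf.keras sketch below conveys the spirit of this experiment; it uses a random truth table instead of functions of prescribed algebraic degree, a reduced number of epochs, and otherwise hypothetical sizes loosely modelled on the 1024-neuron configuration of Fig. 6.

```python
import numpy as np
import tensorflow as tf

m = 10
X = np.array([[(i >> j) & 1 for j in range(m)] for i in range(2 ** m)],
             dtype="float32")                                    # all 2^m inputs
y = np.random.randint(0, 2, size=(2 ** m, 1)).astype("float32")  # random truth table

model = tf.keras.Sequential([
    tf.keras.Input(shape=(m,)),
    tf.keras.layers.Dense(1024, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["binary_accuracy"])
model.fit(X, y, epochs=50, batch_size=64, verbose=0)   # the experiment uses up to 1024 epochs
print(model.evaluate(X, y, verbose=0))                 # [loss, accuracy] on the full table
```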

In particular, we notice two facts. The first one is that we need the full dataset in order to be able to predict the outcome of the Boolean functions. The second one is the similarity of the training progress for all algebraic degrees (with a slight irregularity for linear functions) in Fig. 6b, which indicates that the algebraic degree does not cause major differences in the learnability of the Boolean functions.

The panels in Fig. 6c show the training progress for the algebraic immunity, the correlation immunity, the nonlinearity and the resiliency order. While for the algebraic immunity and the nonlinearity no major differences in the training progress are visible, for the correlation immunity and the resiliency order there are some differences. The results on correlation immunity are in line with the work of Malach and Shalev-Shwartz [15], but a detailed investigation is beyond the scope of this work and is left for future research.

Fig. 6.

Binary accuracy (blue) and binary crossentropy loss (red) of an MLP learning Boolean functions of varying algebraic degree. The left-hand figure (a) shows the final accuracy and loss values obtained on the validation dataset for different configurations \(e=1,\ldots ,10\); in detail, the number of neurons in the hidden layer of the MLP was varied (\(2^e = 2^1,\ldots ,2^{10}\)), as well as the number of samples (\(2^e\)) and the number of training epochs (\(2^e\)). The right-hand figure (b) shows the training progress of a neural network with 1024 neurons, 1024 samples and 1024 epochs. Figure (c) in the lower panel shows the training progress of a neural network with 1024 neurons, 1024 samples and 1024 epochs for various other considered properties of Boolean functions. (Color figure online)


Copyright information

© 2022 Springer Nature Switzerland AG

About this paper


Cite this paper

Bellini, E., Hambitzer, A., Protopapa, M., Rossi, M. (2022). Limitations of the Use of Neural Networks in Black Box Cryptanalysis. In: Ryan, P.Y., Toma, C. (eds) Innovative Security Solutions for Information Technology and Communications. SecITC 2021. Lecture Notes in Computer Science, vol 13195. Springer, Cham. https://doi.org/10.1007/978-3-031-17510-7_8


  • DOI: https://doi.org/10.1007/978-3-031-17510-7_8


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-17509-1

  • Online ISBN: 978-3-031-17510-7

