
Breaking Cryptographic Implementations Using Deep Learning Techniques

Conference paper in: Security, Privacy, and Applied Cryptography Engineering (SPACE 2016)

Part of the book series: Lecture Notes in Computer Science, vol. 10076

Abstract

The template attack is the most common and powerful profiled side channel attack. It relies on a realistic assumption regarding the noise of the device under attack: the probability density function of the data is a multivariate Gaussian distribution. To relax this assumption, a recent line of research has investigated new profiling approaches, mainly by applying machine learning techniques. The results obtained are comparable to, and in some particular cases better than, those of the template attack. In this work, we continue this recent line of research by applying more sophisticated profiling techniques based on deep learning. Our experimental results confirm the overwhelming advantages of the resulting new attacks when targeting both unprotected and protected cryptographic implementations.

T. Portigliatti: work done while the author was at SAFRAN Identity and Security.


Notes

  1. The training data set is composed of pairs of known (input, output) values.

  2. In a supervised learning problem, the loss (aka cost or error) function quantifies the compatibility between a prediction and the ground-truth label (output). It is typically defined as the negative log-likelihood or the mean squared error.

  3. Introducing a value that is independent of the input shifts the boundary away from the origin.

  4. In the case of the perceptron, the activation function is commonly a Heaviside function. In more complex models (e.g. the multilayer perceptron that we will describe in the next section), it can be chosen to be a sigmoid-shaped function such as tanh.

  5. E.g. for the Euclidean distance.

  6. Perceptrons are also called “units”, “nodes” or “neurons” in this model.

  7. The SNR is defined as the ratio of the signal power to the noise power.

  8. The goal is to control the size of the output.

  9. As for the MLP weight estimation, the filter parameters are learned using the back-propagation algorithm.

  10. This is also known as a restricted Boltzmann machine [46].

  11. We refer the interested reader to another type of auto-encoder, the denoising auto-encoder [56, 58], which aims at removing the noise when fed with a noisy input.

  12. This is not mandatory; some empirical results have shown that it may sometimes be better to have more neurons on the first hidden layer than on the output, as a “pre-learning” step.

  13. The purpose is to reduce the number of parameters to be learned.

  14. Namely, that the distribution of the leakage when the algorithm inputs are fixed is well estimated by a Gaussian law.

  15. This set of traces is typically acquired on an open copy of the targeted device.

  16. The pair (\(\mu _z,\varSigma _z\)) represents the template of the value z; a standard formalisation of the underlying Gaussian model is sketched after these notes.

  17. The parameters for each attack are detailed in Appendix A.

  18. In our attack experiments, we do not report the results of the SVM-based attack, since it achieves results comparable to those obtained for the RF-based attack; the same observation was made in [33].

  19. The product combining function maps the leakages of the masked data \((Z \oplus M)\) and the mask (M) into a univariate sample depending on the sensitive data Z.
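
As a complement to notes 14 and 16, the Gaussian model underlying the template attack can be made explicit. The following is the standard formulation (the notation \(d\) and \(\vec{l}\) is introduced here for illustration): writing \(d\) for the number of samples in a trace \(\vec{l}\), the leakage observed when the sensitive value equals \(z\) is modelled by the multivariate normal density

\[ p(\vec{l} \mid Z = z) = \frac{1}{\sqrt{(2\pi)^{d} \det \varSigma_z}} \exp\!\Big(-\frac{1}{2}(\vec{l}-\mu_z)^{\mathsf{T}} \varSigma_z^{-1} (\vec{l}-\mu_z)\Big), \]

where the template \((\mu_z, \varSigma_z)\) is estimated from the profiling traces acquired with \(Z = z\).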

References

  1. Deep learning website. http://deeplearning.net/tutorial/

  2. Keras library. https://keras.io/

  3. Scikit-learn library. http://scikit-learn.org/stable/

  4. Archambeau, C., Peeters, E., Standaert, F.-X., Quisquater, J.-J.: Template attacks in principal subspaces. In: Goubin, L., Matsui, M. (eds.) CHES 2006. LNCS, vol. 4249, pp. 1–14. Springer, Heidelberg (2006). doi:10.1007/11894063_1

  5. Bartkewitz, T., Lemke-Rust, K.: Efficient template attacks based on probabilistic multi-class support vector machines. In: Mangard, S. (ed.) CARDIS 2012. LNCS, vol. 7771, pp. 263–276. Springer, Heidelberg (2013). doi:10.1007/978-3-642-37288-9_18

  6. Bengio, Y.: Learning deep architectures for AI. Found. Trends Mach. Learn. 2(1), 1–127 (2009)

  7. Bengio, Y., Simard, P., Frasconi, P.: Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 5(2), 157–166 (1994)

  8. Bishop, C.M.: Neural Networks for Pattern Recognition. Oxford University Press, New York (1995)

  9. Brier, E., Clavier, C., Olivier, F.: Correlation power analysis with a leakage model. In: Joye, M., Quisquater, J.-J. (eds.) CHES 2004. LNCS, vol. 3156, pp. 16–29. Springer, Heidelberg (2004). doi:10.1007/978-3-540-28632-5_2

  10. Chari, S., Rao, J.R., Rohatgi, P.: Template attacks. In: Kaliski, B.S., Koç, K., Paar, C. (eds.) CHES 2002. LNCS, vol. 2523, pp. 13–28. Springer, Heidelberg (2003). doi:10.1007/3-540-36400-5_3

  11. Chen, Z., Zhou, Y.: Dual-rail random switching logic: a countermeasure to reduce side channel leakage. In: Goubin, L., Matsui, M. (eds.) CHES 2006. LNCS, vol. 4249, pp. 242–254. Springer, Heidelberg (2006). doi:10.1007/11894063_20

  12. Choudary, O., Kuhn, M.G.: Efficient template attacks. Cryptology ePrint Archive, Report 2013/770 (2013). http://eprint.iacr.org/2013/770

  13. Coron, J.-S.: Higher order masking of look-up tables. In: Nguyen, P.Q., Oswald, E. (eds.) EUROCRYPT 2014. LNCS, vol. 8441, pp. 441–458. Springer, Heidelberg (2014). doi:10.1007/978-3-642-55220-5_25

  14. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)

  15. Deng, L., Yu, D.: Deep learning: methods and applications. Found. Trends Signal Process. 7(3–4), 197–387 (2014)

  16. Doget, J., Prouff, E., Rivain, M., Standaert, F.-X.: Univariate side channel attacks and leakage modeling. J. Cryptographic Eng. 1(2), 123–144 (2011)

  17. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. Wiley-Interscience, New York (2000)

  18. Eiben, A.E., Smith, J.E.: Introduction to Evolutionary Computing. Springer, Heidelberg (2003)

  19. Genelle, L., Prouff, E., Quisquater, M.: Thwarting higher-order side channel analysis with additive and multiplicative maskings. In: Preneel, B., Takagi, T. (eds.) CHES 2011. LNCS, vol. 6917, pp. 240–255. Springer, Heidelberg (2011). doi:10.1007/978-3-642-23951-9_16

  20. Gierlichs, B., Batina, L., Tuyls, P., Preneel, B.: Mutual information analysis. In: Oswald, E., Rohatgi, P. (eds.) CHES 2008. LNCS, vol. 5154, pp. 426–442. Springer, Heidelberg (2008). doi:10.1007/978-3-540-85053-3_27

  21. Gilmore, R., Hanley, N., O’Neill, M.: Neural network based attack on a masked implementation of AES. In: 2015 IEEE International Symposium on Hardware Oriented Security and Trust (HOST), pp. 106–111, May 2015

  22. Hermans, M., Schrauwen, B.: Training and analysing deep recurrent neural networks. In: Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 26, pp. 190–198. Curran Associates Inc. (2013)

  23. Heuser, A., Zohner, M.: Intelligent machine homicide - breaking cryptographic devices using support vector machines. In: Schindler, W., Huss, S.A. (eds.) COSADE 2012. LNCS, vol. 7275, pp. 249–264. Springer, Heidelberg (2012). doi:10.1007/978-3-642-29912-4_18

  24. Heyszl, J., Ibing, A., Mangard, S., Santis, F.D., Sigl, G.: Clustering algorithms for non-profiled single-execution attacks on exponentiations. IACR Cryptology ePrint Archive 2013, 438 (2013)

  25. Hochreiter, S.: The vanishing gradient problem during learning recurrent neural nets and problem solutions. Int. J. Uncertain. Fuzziness Knowl. Based Syst. 6(2), 107–116 (1998)

  26. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)

  27. Hoogvorst, P., Danger, J.-L., Duc, G.: Software implementation of dual-rail representation. In: COSADE, Darmstadt, Germany, 24–25 February 2011

  28. Hospodar, G., Gierlichs, B., De Mulder, E., Verbauwhede, I., Vandewalle, J.: Machine learning in side-channel analysis: a first study. J. Cryptographic Eng. 1(4), 293–302 (2011)

  29. Jarrett, K., Kavukcuoglu, K., Ranzato, M., LeCun, Y.: What is the best multi-stage architecture for object recognition? In: ICCV, pp. 2146–2153. IEEE (2009)

  30. Kocher, P., Jaffe, J., Jun, B.: Differential power analysis. In: Wiener, M. (ed.) CRYPTO 1999. LNCS, vol. 1666, pp. 388–397. Springer, Heidelberg (1999). doi:10.1007/3-540-48405-1_25

  31. LeCun, Y., Bengio, Y.: Convolutional networks for images, speech, and time series. In: The Handbook of Brain Theory and Neural Networks, pp. 255–258. MIT Press, Cambridge (1998)

  32. Lerman, L., Bontempi, G., Markowitch, O.: Power analysis attack: an approach based on machine learning. Int. J. Appl. Cryptography 3(2), 97–115 (2014)

  33. Lerman, L., Medeiros, S.F., Bontempi, G., Markowitch, O.: A machine learning approach against a masked AES. In: Francillon, A., Rohatgi, P. (eds.) CARDIS 2013. LNCS, vol. 8419, pp. 61–75. Springer, Heidelberg (2014). doi:10.1007/978-3-319-08302-5_5

  34. Lerman, L., Poussier, R., Bontempi, G., Markowitch, O., Standaert, F.-X.: Template attacks vs. machine learning revisited (and the curse of dimensionality in side-channel analysis). In: Mangard, S., Poschmann, A.Y. (eds.) COSADE 2015. LNCS, vol. 9064, pp. 20–33. Springer, Heidelberg (2015). doi:10.1007/978-3-319-21476-4_2

  35. Lomné, V., Prouff, E., Rivain, M., Roche, T., Thillard, A.: How to estimate the success rate of higher-order side-channel attacks. In: Batina, L., Robshaw, M. (eds.) CHES 2014. LNCS, vol. 8731, pp. 35–54. Springer, Heidelberg (2014)

  36. Mangard, S., Oswald, E., Popp, T.: Power Analysis Attacks: Revealing the Secrets of Smart Cards. Springer (2006). ISBN 0-387-30857-1. http://www.dpabook.org/

  37. Masci, J., Meier, U., Cireşan, D., Schmidhuber, J.: Stacked convolutional auto-encoders for hierarchical feature extraction. In: Honkela, T., Duch, W., Girolami, M., Kaski, S. (eds.) ICANN 2011. LNCS, vol. 6791, pp. 52–59. Springer, Heidelberg (2011). doi:10.1007/978-3-642-21735-7_7

  38. Mitchell, M.: An Introduction to Genetic Algorithms. MIT Press, Cambridge (1998)

  39. O’Flynn, C., Chen, Z.D.: ChipWhisperer: an open-source platform for hardware embedded security research. Cryptology ePrint Archive, Report 2014/204 (2014). http://eprint.iacr.org/2014/204

  40. O’Shea, K., Nash, R.: An introduction to convolutional neural networks. CoRR, abs/1511.08458 (2015)

  41. Oswald, E., Mangard, S.: Template attacks on masking—resistance is futile. In: Abe, M. (ed.) CT-RSA 2007. LNCS, vol. 4377, pp. 243–256. Springer, Heidelberg (2006). doi:10.1007/11967668_16

  42. Prouff, E., Rivain, M., Bevan, R.: Statistical analysis of second order differential power analysis. IEEE Trans. Comput. 58(6), 799–811 (2009)

  43. Rivain, M.: On the exact success rate of side channel analysis in the Gaussian model. In: Avanzi, R.M., Keliher, L., Sica, F. (eds.) SAC 2008. LNCS, vol. 5381, pp. 165–183. Springer, Heidelberg (2009). doi:10.1007/978-3-642-04159-4_11

  44. Rivain, M., Prouff, E.: Provably secure higher-order masking of AES. In: Mangard, S., Standaert, F.-X. (eds.) CHES 2010. LNCS, vol. 6225, pp. 413–427. Springer, Heidelberg (2010). doi:10.1007/978-3-642-15031-9_28

  45. Rokach, L., Maimon, O.: Data Mining with Decision Trees: Theory and Applications. World Scientific Publishing, River Edge (2008)

  46. Salakhutdinov, R., Mnih, A., Hinton, G.: Restricted Boltzmann machines for collaborative filtering. In: Proceedings of the 24th International Conference on Machine Learning, ICML 2007, pp. 791–798. ACM, New York (2007)

  47. Schindler, W.: Advanced stochastic methods in side channel analysis on block ciphers in the presence of masking. J. Math. Cryptology 2(3), 291–310 (2008). doi:10.1515/JMC.2008.013

  48. Schindler, W., Lemke, K., Paar, C.: A stochastic model for differential side channel cryptanalysis. In: Rao, J.R., Sunar, B. (eds.) CHES 2005. LNCS, vol. 3659, pp. 30–46. Springer, Heidelberg (2005). doi:10.1007/11545262_3

  49. Schölkopf, B., Smola, A.J.: Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge (2001)

  50. Servant, V., Debande, N., Maghrebi, H., Bringer, J.: Study of a novel software constant weight implementation. In: Joye, M., Moradi, A. (eds.) CARDIS 2014. LNCS, vol. 8968, pp. 35–48. Springer, Heidelberg (2015). doi:10.1007/978-3-319-16763-3_3

  51. Silva, T.C., Zhao, L.: Machine Learning in Complex Networks. Springer, Switzerland (2016)

  52. Souissi, Y., Nassar, M., Guilley, S., Danger, J.-L., Flament, F.: First principal components analysis: a new side channel distinguisher. In: Rhee, K.-H., Nyang, D.H. (eds.) ICISC 2010. LNCS, vol. 6829, pp. 407–419. Springer, Heidelberg (2011). doi:10.1007/978-3-642-24209-0_27

  53. Standaert, F.-X., Malkin, T.G., Yung, M.: A unified framework for the analysis of side-channel key recovery attacks. In: Joux, A. (ed.) EUROCRYPT 2009. LNCS, vol. 5479, pp. 443–461. Springer, Heidelberg (2009). doi:10.1007/978-3-642-01001-9_26

  54. TELECOM ParisTech SEN research group: DPA Contest, 2nd edn. (2009–2010). http://www.DPAcontest.org/v2/

  55. TELECOM ParisTech SEN research group: DPA Contest, 4th edn. (2013–2014). http://www.DPAcontest.org/v4/

  56. Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.-A.: Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th International Conference on Machine Learning, ICML 2008, pp. 1096–1103. ACM, New York (2008)

  57. Weston, J., Watkins, C.: Multi-class support vector machines. Technical report, Royal Holloway, University of London (1998)

  58. Xie, J., Xu, L., Chen, E.: Image denoising and inpainting with deep neural networks. In: Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 25, pp. 341–349. Curran Associates Inc. (2012)


Author information

Correspondence to Emmanuel Prouff.


A Attack Settings

Our proposed deep learning attacks are based on the Keras library [2]. We provide hereafter the architecture and the parameters used for each of our deep learning networks; a code sketch of these architectures is given after the list.

  • Multilayer Perceptron:

    • Dense input layer: number of neurons = number of samples in the processed trace

    • Dense hidden layer: 20 neurons

    • Dense output layer: 256 neurons

  • Stacked Auto-Encoder:

    • Dense input layer: number of neurons = number of samples in the processed trace

    • Dense hidden layer: 100 neurons

    • Dense hidden layer: 50 neurons

    • Dense hidden layer: 20 neurons

    • Dense output layer: 256 neurons

  • Convolutional Neural Network:

    • Convolution layer:

      • Number of filters: 8

      • Filter length: 16

      • Activation function: Rectified Linear Unit

    • Dropout

    • Max pooling layer with pooling size 2

    • Convolution layer:

      • Number of filters: 8

      • Filter length: 8

      • Activation function: tanh(x)

    • Dropout

    • Dense output layer: 256 neurons

  • Long Short-Term Memory:

    • LSTM layer: 26 units

    • LSTM layer: 26 units

    • Dense output layer: 256 neurons

  • Random Forest: for this machine learning based attack, we used the scikit-learn Python library [3].

    • Number of trees: 300
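
For concreteness, the listing below sketches how these architectures could be instantiated with the Keras [2] and scikit-learn [3] libraries. It is a minimal sketch rather than our exact code: the layer names follow the current Keras API, and the activation functions of the MLP and auto-encoder layers, the dropout rates, the optimizer and the trace length n_samples are illustrative assumptions that the list above does not fix.

    # Sketch of the attack architectures; assumed hyper-parameters are marked.
    from keras.models import Sequential
    from keras.layers import Dense, Dropout, Conv1D, MaxPooling1D, Flatten, LSTM
    from sklearn.ensemble import RandomForestClassifier

    n_samples = 1000   # samples per processed trace (acquisition-dependent, assumed)
    n_classes = 256    # one class per value of the targeted key-dependent byte

    # Multilayer perceptron: one hidden layer of 20 neurons.
    mlp = Sequential([
        Dense(20, activation='relu', input_shape=(n_samples,)),  # activation assumed
        Dense(n_classes, activation='softmax'),
    ])

    # Stacked auto-encoder used as a classifier: 100 -> 50 -> 20 hidden neurons.
    sae = Sequential([
        Dense(100, activation='relu', input_shape=(n_samples,)),
        Dense(50, activation='relu'),
        Dense(20, activation='relu'),
        Dense(n_classes, activation='softmax'),
    ])

    # Convolutional neural network: two 1-D convolution blocks with dropout.
    cnn = Sequential([
        Conv1D(8, 16, activation='relu', input_shape=(n_samples, 1)),
        Dropout(0.5),                     # dropout rate assumed
        MaxPooling1D(pool_size=2),
        Conv1D(8, 8, activation='tanh'),
        Dropout(0.5),
        Flatten(),
        Dense(n_classes, activation='softmax'),
    ])

    # Long short-term memory: two stacked LSTM layers of 26 units each.
    lstm = Sequential([
        LSTM(26, return_sequences=True, input_shape=(n_samples, 1)),
        LSTM(26),
        Dense(n_classes, activation='softmax'),
    ])

    # Random forest with 300 trees.
    rf = RandomForestClassifier(n_estimators=300)

    for model in (mlp, sae, cnn, lstm):
        model.compile(optimizer='adam', loss='categorical_crossentropy')

Each network outputs a score vector over the 256 possible values of the targeted byte, which is then accumulated over the attack traces to rank the key candidates.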

In several published works [23, 28], the authors have noticed the influence of the parameters chosen for the SVM and RF classifiers on the attack results. We have observed the same effect when dealing with deep learning techniques. The method we used to find a suitable parameter setup for our practical attacks is detailed in the following section.

A.1 How to Choose the Optimal Parameters?

When dealing with artificial neural networks, several meta-parameters have to be tuned (e.g. the number of layers, the number of neurons on each layer, the activation function). One common technique to find the optimal parameters is to use evolutionary algorithms [18], and more precisely the so-called genetic algorithm [38].

At the beginning of the algorithm, a population (a set of individuals with different genes) is randomly initialized. In our case, an individual is a list of the parameters we want to estimate (e.g. the number of layers, the number of neurons on each layer, the activation function) and the genes are the corresponding values. Then, the performance of each individual is evaluated using a so-called fitness function. In our context, the fitness function is the guessing entropy output by the attack. Said differently, for each set of parameters we perform the attack and record the guessing entropy obtained. Only the individuals that achieve good guessing entropy scores are kept. Their genes are mutated and mixed to generate a better population. This process is repeated until a satisfying fitness is achieved (i.e. a guessing entropy equal to one).
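
The sketch below illustrates this procedure under stated assumptions: the gene space, population size, mutation rate and selection scheme are illustrative choices, and guessing_entropy() is a hypothetical oracle that trains a network with the candidate parameters, mounts the attack, and returns the average rank of the correct key (lower is better, 1 is a full success).

    # Genetic search over network meta-parameters (illustrative sketch).
    import random

    GENE_SPACE = {                       # assumed gene encoding
        'n_hidden_layers': [1, 2, 3],
        'n_neurons':       [10, 20, 50, 100],
        'activation':      ['relu', 'tanh', 'sigmoid'],
    }

    def random_individual():
        return {gene: random.choice(values) for gene, values in GENE_SPACE.items()}

    def crossover(a, b):
        # Mix genes from two well-performing parents.
        return {gene: random.choice([a[gene], b[gene]]) for gene in GENE_SPACE}

    def mutate(individual, rate=0.2):
        # Randomly resample some genes (mutation rate assumed).
        return {gene: random.choice(GENE_SPACE[gene]) if random.random() < rate
                else value for gene, value in individual.items()}

    def evolve(guessing_entropy, pop_size=20, generations=50):
        population = [random_individual() for _ in range(pop_size)]
        for _ in range(generations):
            # Fitness = guessing entropy of the resulting attack; lower is better.
            population.sort(key=guessing_entropy)
            if guessing_entropy(population[0]) == 1:   # correct key ranked first
                return population[0]
            survivors = population[:pop_size // 2]     # keep the best half
            children = [mutate(crossover(random.choice(survivors),
                                         random.choice(survivors)))
                        for _ in range(pop_size - len(survivors))]
            population = survivors + children
        return min(population, key=guessing_entropy)

In practice, each fitness evaluation is expensive since it requires training the network and running the attack, so the population size and the number of generations would be kept small.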


Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Maghrebi, H., Portigliatti, T., Prouff, E. (2016). Breaking Cryptographic Implementations Using Deep Learning Techniques. In: Carlet, C., Hasan, M., Saraswat, V. (eds.) Security, Privacy, and Applied Cryptography Engineering. SPACE 2016. Lecture Notes in Computer Science, vol. 10076. Springer, Cham. https://doi.org/10.1007/978-3-319-49445-6_1


  • DOI: https://doi.org/10.1007/978-3-319-49445-6_1


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49444-9

  • Online ISBN: 978-3-319-49445-6

  • eBook Packages: Computer Science; Computer Science (R0)
