Abstract
The template attack is the most common and powerful profiled side-channel attack. It relies on a realistic assumption regarding the noise of the device under attack: that the probability density function of the data is a multivariate Gaussian distribution. To relax this assumption, a recent line of research has investigated new profiling approaches, mainly by applying machine learning techniques. The results obtained are commensurate with, and in some particular cases better than, those of the template attack. In this work, we continue this line of research by applying more sophisticated profiling techniques based on deep learning. Our experimental results confirm the overwhelming advantages of the resulting new attacks when targeting both unprotected and protected cryptographic implementations.
T. Portigliatti—Work done when the author was at SAFRAN Identity and Security.
Notes
- 1.
The training data set is composed of pairs of known (input, output) values.
- 2.
In a supervised learning problem, the loss (aka cost or error) function quantifies the compatibility between a prediction and the ground-truth label (output). The loss function is typically defined as the negative log-likelihood or the mean squared error.
- 3.
Introducing a value that is independent of the input shifts the boundary away from the origin.
- 4.
In the case of the perceptron, the activation function is commonly a Heaviside function. In more complex models (e.g. the multilayer perceptron that we will describe in the next section), this function can be chosen to be a sigmoid function (e.g. tanh).
- 5.
E.g. for the Euclidean distance.
- 6.
Perceptrons are also called “units”, “nodes” or neurons in this model.
- 7.
The SNR is defined as the ratio of the signal power to the noise power.
- 8.
The goal is to control the size of the output.
- 9.
As for the estimation of the MLP weights, the filter parameters are learned using the back-propagation algorithm.
- 10.
This is also known as a restricted Boltzmann machine [46].
- 11.
- 12.
This is not mandatory; some empirical results have shown that it might sometimes be better to have more neurons on the first hidden layer than on the output, as a "pre-learning" step.
- 13.
The purpose is to reduce the number of parameters to be learned.
- 14.
which is that the distribution of the leakage when the algorithm inputs are fixed is well estimated by a Gaussian law.
- 15.
This set of traces is typically acquired on an open copy of the targeted device.
- 16.
The couple (\(\mu _z,\varSigma _z\)) represents the template of the value z.
- 17.
The parameters for each attack are detailed in Appendix A.
- 18.
In our attack experiments, we did not report the results of the SVM-based attack since it achieves results comparable to those obtained with the RF-based attack. The same observation was highlighted in [33].
- 19.
The product combining function maps the leakages of the masked data \((Z \oplus M)\) and the mask (M) into a univariate sample depending on the sensitive data Z.
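As a concrete illustration of note 7, the SNR can be estimated per time sample from a set of labeled traces: the signal power is the variance of the class-conditional means, and the noise power is the mean of the class-conditional variances. This is a minimal sketch; the function and variable names are illustrative, not from the paper.

```python
import numpy as np

def snr(traces, labels):
    """Per-sample SNR estimate from labeled traces.

    Signal power: variance (over classes) of the class-conditional mean traces.
    Noise power: mean (over classes) of the class-conditional variances.
    """
    classes = np.unique(labels)
    means = np.array([traces[labels == c].mean(axis=0) for c in classes])
    variances = np.array([traces[labels == c].var(axis=0) for c in classes])
    return means.var(axis=0) / variances.mean(axis=0)
```

On synthetic traces where only one sample depends on the label, that sample gets a markedly higher SNR than the pure-noise samples.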
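Note 16's templates \((\mu_z, \varSigma_z)\) can be sketched as follows: one (mean vector, covariance matrix) pair is estimated per intermediate value z from the profiling traces, and an attack trace is then scored against each template with a Gaussian log-likelihood. The names below are illustrative, not the paper's code.

```python
import numpy as np

def build_templates(traces, values):
    """One template (mean vector, covariance matrix) per intermediate value z."""
    templates = {}
    for z in np.unique(values):
        t_z = traces[values == z]
        templates[z] = (t_z.mean(axis=0), np.cov(t_z, rowvar=False))
    return templates

def log_likelihood(trace, mu, sigma):
    """Gaussian log-likelihood of a trace under a template, up to an additive constant."""
    d = trace - mu
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * (logdet + d @ np.linalg.solve(sigma, d))
```

During the attack phase, the candidate value whose template maximizes the (accumulated) log-likelihood over the attack traces is selected.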
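One common instantiation of the product combining function of note 19 is the centered product of the two leakage samples; its mean statistically depends on the sensitive value Z, which is what a second-order attack exploits. This is a sketch under that assumption, with illustrative names.

```python
import numpy as np

def product_combining(l_masked, l_mask):
    """Centered product of the leakages of (Z xor M) and of M.

    Maps the two samples into one univariate sample whose mean
    depends on the sensitive value Z.
    """
    return (l_masked - l_masked.mean()) * (l_mask - l_mask.mean())
```

For a one-bit example with leakage equal to the handled bit plus noise, the combined sample has a positive mean when Z = 0 (the two shares are correlated) and a negative mean when Z = 1 (anti-correlated).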
References
Deep learning website. http://deeplearning.net/tutorial/
Keras library. https://keras.io/
Scikit-learn library. http://scikit-learn.org/stable/
Archambeau, C., Peeters, E., Standaert, F.-X., Quisquater, J.-J.: Template attacks in principal subspaces. In: Goubin, L., Matsui, M. (eds.) CHES 2006. LNCS, vol. 4249, pp. 1–14. Springer, Heidelberg (2006). doi:10.1007/11894063_1
Bartkewitz, T., Lemke-Rust, K.: Efficient template attacks based on probabilistic multi-class support vector machines. In: Mangard, S. (ed.) CARDIS 2012. LNCS, vol. 7771, pp. 263–276. Springer, Heidelberg (2013). doi:10.1007/978-3-642-37288-9_18
Bengio, Y.: Learning deep architectures for AI. Found. Trends Mach. Learn. 2(1), 1–127 (2009)
Bengio, Y., Simard, P., Frasconi, P.: Learning long-term dependencies with gradient descent is difficult. Trans. Neur. Netw. 5(2), 157–166 (1994)
Bishop, C.M.: Neural Networks for Pattern Recognition. Oxford University Press Inc., New York (1995)
Brier, E., Clavier, C., Olivier, F.: Correlation power analysis with a leakage model. In: Joye, M., Quisquater, J.-J. (eds.) CHES 2004. LNCS, vol. 3156, pp. 16–29. Springer, Heidelberg (2004). doi:10.1007/978-3-540-28632-5_2
Chari, S., Rao, J.R., Rohatgi, P.: Template attacks. In: Kaliski, B.S., Koç, K., Paar, C. (eds.) CHES 2002. LNCS, vol. 2523, pp. 13–28. Springer, Heidelberg (2003). doi:10.1007/3-540-36400-5_3
Chen, Z., Zhou, Y.: Dual-rail random switching logic: a countermeasure to reduce side channel leakage. In: Goubin, L., Matsui, M. (eds.) CHES 2006. LNCS, vol. 4249, pp. 242–254. Springer, Heidelberg (2006). doi:10.1007/11894063_20
Choudary, O., Kuhn, M.G.: Efficient Template Attacks. Cryptology ePrint Archive, Report 2013/770 (2013). http://eprint.iacr.org/2013/770
Coron, J.-S.: Higher order masking of look-up tables. In: Nguyen, P.Q., Oswald, E. (eds.) EUROCRYPT 2014. LNCS, vol. 8441, pp. 441–458. Springer, Heidelberg (2014). doi:10.1007/978-3-642-55220-5_25
Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
Deng, L., Yu, D.: Deep learning: methods and applications. Found. Trends Signal Process. 7(3–4), 197–387 (2014)
Doget, J., Prouff, E., Rivain, M., Standaert, F.-X.: Univariate side channel attacks and leakage modeling. J. Cryptographic Eng. 1(2), 123–144 (2011)
Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. Wiley-Interscience (2000)
Eiben, A.E., Smith, J.E.: Introduction to Evolutionary Computing. Springer, Heidelberg (2003)
Genelle, L., Prouff, E., Quisquater, M.: Thwarting higher-order side channel analysis with additive and multiplicative maskings. In: Preneel, B., Takagi, T. (eds.) CHES 2011. LNCS, vol. 6917, pp. 240–255. Springer, Heidelberg (2011). doi:10.1007/978-3-642-23951-9_16
Gierlichs, B., Batina, L., Tuyls, P., Preneel, B.: Mutual information analysis. In: Oswald, E., Rohatgi, P. (eds.) CHES 2008. LNCS, vol. 5154, pp. 426–442. Springer, Heidelberg (2008). doi:10.1007/978-3-540-85053-3_27
Gilmore, R., Hanley, N., O’Neill, M.: Neural network based attack on a masked implementation of AES. In: 2015 IEEE International Symposium on Hardware Oriented Security and Trust (HOST), pp. 106–111, May 2015
Hermans, M., Schrauwen, B.: Training and analysing deep recurrent neural networks. In: Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 26, pp. 190–198. Curran Associates Inc. (2013)
Heuser, A., Zohner, M.: Intelligent machine homicide - breaking cryptographic devices using support vector machines. In: Schindler, W., Huss, S.A. (eds.) COSADE 2012. LNCS, vol. 7275, pp. 249–264. Springer, Heidelberg (2012). doi:10.1007/978-3-642-29912-4_18
Heyszl, J., Ibing, A., Mangard, S., Santis, F.D., Sigl, G.: Clustering Algorithms for Non-Profiled Single-Execution Attacks on Exponentiations. IACR Cryptology ePrint Archive 2013, 438 (2013)
Hochreiter, S.: The vanishing gradient problem during learning recurrent neural nets and problem solutions. Int. J. Uncertain. Fuzziness Knowl. Based Syst. 6(2), 107–116 (1998)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
Hoogvorst, P., Danger, J.-L., Duc, G.: Software implementation of dual-rail representation. In: COSADE, Darmstadt, Germany, 24–25 February 2011
Hospodar, G., Gierlichs, B., De Mulder, E., Verbauwhede, I., Vandewalle, J.: Machine learning in side-channel analysis: a first study. J. Cryptographic Eng. 1(4), 293–302 (2011)
Jarrett, K., Kavukcuoglu, K., Ranzato, M., LeCun, Y.: What is the best multi-stage architecture for object recognition? In: ICCV, pp. 2146–2153. IEEE (2009)
Kocher, P., Jaffe, J., Jun, B.: Differential power analysis. In: Wiener, M. (ed.) CRYPTO 1999. LNCS, vol. 1666, pp. 388–397. Springer, Heidelberg (1999). doi:10.1007/3-540-48405-1_25
LeCun, Y., Bengio, Y.: Convolutional networks for images, speech, and time series. In: Handbook of Brain Theory and Neural Networks, pp. 255–258. MIT Press, Cambridge (1998)
Lerman, L., Bontempi, G., Markowitch, O.: Power analysis attack: an approach based on machine learning. Int. J. Appl. Cryptography 3(2), 97–115 (2014)
Lerman, L., Medeiros, S.F., Bontempi, G., Markowitch, O.: A Machine Learning Approach Against a Masked AES. In: Francillon, A., Rohatgi, P. (eds.) CARDIS 2013. LNCS, vol. 8419, pp. 61–75. Springer, Heidelberg (2014). doi:10.1007/978-3-319-08302-5_5
Lerman, L., Poussier, R., Bontempi, G., Markowitch, O., Standaert, F.-X.: Template attacks vs. machine learning revisited (and the curse of dimensionality in side-channel analysis). In: Mangard, S., Poschmann, A.Y. (eds.) COSADE 2014. LNCS, vol. 9064, pp. 20–33. Springer, Heidelberg (2015). doi:10.1007/978-3-319-21476-4_2
Lomné, V., Prouff, E., Rivain, M., Roche, T., Thillard, A.: How to estimate the success rate of higher-order side-channel attacks. In: CHES 2014, pp. 35–54. Springer, Heidelberg (2014)
Mangard, S., Oswald, E., Popp, T.: Power Analysis Attacks: Revealing the Secrets of Smart Cards. Springer, December 2006. ISBN 0-387-30857-1, http://www.dpabook.org/
Masci, J., Meier, U., Cireşan, D., Schmidhuber, J.: Stacked convolutional auto-encoders for hierarchical feature extraction. In: Honkela, T., Duch, W., Girolami, M., Kaski, S. (eds.) ICANN 2011. LNCS, vol. 6791, pp. 52–59. Springer, Heidelberg (2011). doi:10.1007/978-3-642-21735-7_7
Mitchell, M.: An Introduction to Genetic Algorithms. MIT Press, Cambridge (1998)
O’Flynn, C., Chen, Z.D.: Chipwhisperer: An open-source platform for hardware embedded security research. Cryptology ePrint Archive, Report 2014/204 (2014). http://eprint.iacr.org/2014/204
O’Shea, K., Nash, R.: An introduction to convolutional neural networks. CoRR, abs/1511.08458 (2015)
Oswald, E., Mangard, S.: Template attacks on masking—resistance is futile. In: Abe, M. (ed.) CT-RSA 2007. LNCS, vol. 4377, pp. 243–256. Springer, Heidelberg (2006). doi:10.1007/11967668_16
Prouff, E., Rivain, M., Bevan, R.: Statistical analysis of second order differential power analysis. IEEE Trans. Computers 58(6), 799–811 (2009)
Rivain, M.: On the exact success rate of side channel analysis in the gaussian model. In: Avanzi, R.M., Keliher, L., Sica, F. (eds.) SAC 2008. LNCS, vol. 5381, pp. 165–183. Springer, Heidelberg (2009). doi:10.1007/978-3-642-04159-4_11
Rivain, M., Prouff, E.: Provably secure higher-order masking of AES. In: Mangard, S., Standaert, F.-X. (eds.) CHES 2010. LNCS, vol. 6225, pp. 413–427. Springer, Heidelberg (2010). doi:10.1007/978-3-642-15031-9_28
Rokach, L., Maimon, O.: Data Mining with Decision Trees: Theory and Applications. World Scientific Publishing Co. Inc., River Edge (2008)
Salakhutdinov, R., Mnih, A., Hinton, G.: Restricted Boltzmann machines for collaborative filtering. In: Proceedings of the 24th International Conference on Machine Learning, ICML 2007, pp. 791–798. ACM, New York (2007)
Schindler, W.: Advanced stochastic methods in side channel analysis on block ciphers in the presence of masking. J. Math. Cryptology 2(3), 291–310 (2008). ISSN (Online) 1862–2984, ISSN (Print) 1862–2976. doi:10.1515/JMC.2008.013
Schindler, W., Lemke, K., Paar, C.: A Stochastic Model for Differential Side Channel Cryptanalysis. In: Rao, J.R., Sunar, B. (eds.) CHES 2005. LNCS, vol. 3659, pp. 30–46. Springer, Heidelberg (2005). doi:10.1007/11545262_3
Scholkopf, B., Smola, A.J.: Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge (2001)
Servant, V., Debande, N., Maghrebi, H., Bringer, J.: Study of a novel software constant weight implementation. In: Joye, M., Moradi, A. (eds.) CARDIS 2014. LNCS, vol. 8968, pp. 35–48. Springer, Heidelberg (2015). doi:10.1007/978-3-319-16763-3_3
Silva, T.C., Zhao, L.: Machine Learning in Complex Networks. Springer, Switzerland (2016)
Souissi, Y., Nassar, M., Guilley, S., Danger, J.-L., Flament, F.: First principal components analysis: a new side channel distinguisher. In: Rhee, K.-H., Nyang, D.H. (eds.) ICISC 2010. LNCS, vol. 6829, pp. 407–419. Springer, Heidelberg (2011). doi:10.1007/978-3-642-24209-0_27
Standaert, F.-X., Malkin, T.G., Yung, M.: A unified framework for the analysis of side-channel key recovery attacks. In: Joux, A. (ed.) EUROCRYPT 2009. LNCS, vol. 5479, pp. 443–461. Springer, Heidelberg (2009). doi:10.1007/978-3-642-01001-9_26
TELECOM ParisTech SEN research group. DPA Contest, 2nd edn. (2009–2010). http://www.DPAcontest.org/v2/
TELECOM ParisTech SEN research group. DPA Contest, 4th edn. (2013–2014). http://www.DPAcontest.org/v4/
Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.-A.: Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th International Conference on Machine Learning, ICML 2008, pp. 1096–1103. ACM, New York (2008)
Weston, J., Watkins, C.: Multi-class support vector machines (1998)
Xie, J., Xu, L., Chen, E.: Image denoising and inpainting with deep neural networks. In: Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 25, pp. 341–349. Curran Associates Inc. (2012)
A Attack Settings
Our proposed deep learning attacks are based on the Keras library [2]. We provide hereafter the architectures and the parameters used for our deep learning networks.
- Multilayer Perceptron:
  - Dense input layer: number of neurons = number of samples in the processed trace
  - Dense hidden layer: 20 neurons
  - Dense output layer: 256 neurons
- Stacked Auto-Encoder:
  - Dense input layer: number of neurons = number of samples in the processed trace
  - Dense hidden layer: 100 neurons
  - Dense hidden layer: 50 neurons
  - Dense hidden layer: 20 neurons
  - Dense output layer: 256 neurons
- Convolutional Neural Network:
  - Convolution layer
    * Number of filters: 8
    * Filter length: 16
    * Activation function: Rectified Linear Unit
  - Dropout
  - Max pooling layer with a pooling size of 2
  - Convolution layer
    * Number of filters: 8
    * Filter length: 8
    * Activation function: tanh
  - Dropout
  - Dense output layer: 256 neurons
- Long Short-Term Memory:
  - LSTM layer: 26 units
  - LSTM layer: 26 units
  - Dense output layer: 256 neurons
- Random Forest: for this machine learning based attack, we used the scikit-learn Python library [3].
  - Number of trees: 300
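The MLP described above can be expressed in Keras roughly as follows. This is a sketch: the excerpt only fixes the layer sizes, so `trace_length`, the activation functions, the optimizer and the loss are illustrative assumptions, not the paper's actual settings.

```python
from tensorflow import keras  # Keras as bundled with TensorFlow 2.x

trace_length = 1000  # illustrative: number of samples in each processed trace

# MLP from Appendix A: input width = trace length, one 20-neuron hidden
# layer, 256 outputs (one score per value of the targeted byte).
mlp = keras.Sequential([
    keras.layers.Input(shape=(trace_length,)),
    keras.layers.Dense(20, activation="relu"),      # activation is an assumption
    keras.layers.Dense(256, activation="softmax"),
])
mlp.compile(optimizer="adam", loss="categorical_crossentropy")
```

The other architectures (stacked auto-encoder, CNN, LSTM) can be written the same way by stacking the layers listed above.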
In several published works [23, 28], the authors have noticed the influence of the parameters chosen for SVM and RF on the attack results. When dealing with deep learning techniques, we have observed the same effect. The method we used to find the optimal parameter setup for our practical attacks is detailed in the following section.
A.1 How to Choose the Optimal Parameters?
When dealing with artificial neural networks, several meta-parameters have to be tuned (e.g. number of layers, number of neurons on each layer, activation function, ...). One common technique to find the optimal parameters is to use evolutionary algorithms [18] and more precisely the so-called genetic algorithm [38].
At the beginning of the algorithm, a population (a set of individuals with different genes) is randomly initialized. In our case, an individual is a list of the parameters we want to estimate (e.g. number of layers, number of neurons on each layer, activation function, ...) and the genes are the corresponding values. Then, the performance of each individual is evaluated using what is called a fitness function. In our context, the fitness function is the guessing entropy outputted by the attack. Said differently, for each set of parameters we perform the attack and note the guessing entropy obtained. Only the individuals that achieve good guessing entropy scores are kept. Their genes are mutated and mixed to generate a better population. This process is repeated until a satisfying fitness is achieved (i.e. a guessing entropy equal to one).
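The loop above can be sketched as a minimal genetic algorithm. This is an illustration, not the paper's implementation: crossover ("mixing" of genes) is omitted for brevity, all names are hypothetical, and in the paper's setting the fitness of an individual would be the guessing entropy of the attack run with that individual's parameters (lower is better, one is optimal).

```python
import random

def genetic_search(init, mutate, fitness, pop_size=20, generations=60, seed=0):
    """Minimal genetic-algorithm sketch: lower fitness is better.

    init(rng) draws a random individual, mutate(x, rng) perturbs its genes,
    fitness(x) scores it (e.g. the guessing entropy of the attack).
    """
    rng = random.Random(seed)
    population = [init(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        if fitness(population[0]) <= 1:          # guessing entropy of one: stop
            break
        survivors = population[: pop_size // 2]  # keep the best half
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return min(population, key=fitness)
```

On a toy problem (find the integer 42 by mutating candidates), the search converges in a few dozen generations.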
© 2016 Springer International Publishing AG
Maghrebi, H., Portigliatti, T., Prouff, E. (2016). Breaking Cryptographic Implementations Using Deep Learning Techniques. In: Carlet, C., Hasan, M., Saraswat, V. (eds) Security, Privacy, and Applied Cryptography Engineering. SPACE 2016. Lecture Notes in Computer Science(), vol 10076. Springer, Cham. https://doi.org/10.1007/978-3-319-49445-6_1