
Breaking Cryptographic Implementations Using Deep Learning Techniques

Conference paper in: Security, Privacy, and Applied Cryptography Engineering (SPACE 2016)

Part of the book series: Lecture Notes in Computer Science, vol. 10076

Abstract

The template attack is the most common and powerful profiled side channel attack. It relies on a realistic assumption regarding the noise of the device under attack: the probability density function of the data is a multivariate Gaussian distribution. To relax this assumption, a recent line of research has investigated new profiling approaches, mainly by applying machine learning techniques. The results obtained are comparable to, and in some particular cases better than, those of the template attack. In this work, we continue this recent line of research by applying more sophisticated profiling techniques based on deep learning. Our experimental results confirm the overwhelming advantages of the resulting new attacks when targeting both unprotected and protected cryptographic implementations.

T. Portigliatti: work done while the author was at SAFRAN Identity and Security.


Notes

  1. The training data set is composed of pairs of known (input, output) values.

  2. In a supervised learning problem, the loss (aka cost or error) function quantifies the compatibility between a prediction and the ground-truth label (output). It is typically defined as the negative log-likelihood or the mean squared error.

  3. Introducing a value that is independent of the input shifts the boundary away from the origin.

  4. In the case of the perceptron, the activation function is commonly a Heaviside function. In more complex models (e.g. the multilayer perceptron that we will describe in the next section), it can be chosen to be a sigmoid-shaped function such as tanh.

  5. E.g. for the Euclidean distance.

  6. Perceptrons are also called “units”, “nodes” or “neurons” in this model.

  7. The SNR is defined as the ratio of the signal power to the noise power.

  8. The goal is to control the size of the output.

  9. As for the MLP weight estimation, the filter parameters are learned using the back-propagation algorithm.

  10. This is also known as a restricted Boltzmann machine [46].

  11. We refer the interested reader to another type of auto-encoder, the denoising auto-encoder [56, 58], which aims at removing the noise when fed with a noisy input.

  12. This is not mandatory; some empirical results have shown that it may sometimes be better to have more neurons on the first hidden layer than on the output, as a “pre-learning” step.

  13. The purpose is to reduce the number of parameters to be learned.

  14. Namely, that the distribution of the leakage when the algorithm inputs are fixed is well estimated by a Gaussian law.

  15. This set of traces is typically acquired on an open copy of the targeted device.

  16. The pair (\(\mu _z,\varSigma _z\)) represents the template of the value z; a standard formalisation of the underlying Gaussian model is sketched after these notes.

  17. The parameters for each attack are detailed in Appendix A.

  18. In our attack experiments, we do not report the results of the SVM-based attack, since it achieves results comparable to those obtained for the RF-based attack; the same observation was made in [33].

  19. The product combining function maps the leakages of the masked data \((Z \oplus M)\) and the mask (M) into a univariate sample depending on the sensitive data Z.
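
As a complement to notes 14 and 16, the Gaussian model underlying the template attack can be made explicit. The following is the standard formulation (the notation \(d\) and \(\vec{l}\) is introduced here for illustration): writing \(d\) for the number of samples in a trace \(\vec{l}\), the leakage observed when the sensitive value equals \(z\) is modelled by the multivariate normal density

\[ p(\vec{l} \mid Z = z) = \frac{1}{\sqrt{(2\pi)^{d} \det \varSigma_z}} \exp\!\Big(-\frac{1}{2}(\vec{l}-\mu_z)^{\mathsf{T}} \varSigma_z^{-1} (\vec{l}-\mu_z)\Big), \]

where the template \((\mu_z, \varSigma_z)\) is estimated from the profiling traces acquired with \(Z = z\).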

References

  1. Deep learning website. http://deeplearning.net/tutorial/

  2. Keras library. https://keras.io/

  3. Scikit-learn library. http://scikit-learn.org/stable/

  4. Archambeau, C., Peeters, E., Standaert, F.-X., Quisquater, J.-J.: Template attacks in principal subspaces. In: Goubin, L., Matsui, M. (eds.) CHES 2006. LNCS, vol. 4249, pp. 1–14. Springer, Heidelberg (2006). doi:10.1007/11894063_1

  5. Bartkewitz, T., Lemke-Rust, K.: Efficient template attacks based on probabilistic multi-class support vector machines. In: Mangard, S. (ed.) CARDIS 2012. LNCS, vol. 7771, pp. 263–276. Springer, Heidelberg (2013). doi:10.1007/978-3-642-37288-9_18

  6. Bengio, Y.: Learning deep architectures for AI. Found. Trends Mach. Learn. 2(1), 1–127 (2009)

  7. Bengio, Y., Simard, P., Frasconi, P.: Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 5(2), 157–166 (1994)

  8. Bishop, C.M.: Neural Networks for Pattern Recognition. Oxford University Press, New York (1995)

  9. Brier, E., Clavier, C., Olivier, F.: Correlation power analysis with a leakage model. In: Joye, M., Quisquater, J.-J. (eds.) CHES 2004. LNCS, vol. 3156, pp. 16–29. Springer, Heidelberg (2004). doi:10.1007/978-3-540-28632-5_2

  10. Chari, S., Rao, J.R., Rohatgi, P.: Template attacks. In: Kaliski, B.S., Koç, K., Paar, C. (eds.) CHES 2002. LNCS, vol. 2523, pp. 13–28. Springer, Heidelberg (2003). doi:10.1007/3-540-36400-5_3

  11. Chen, Z., Zhou, Y.: Dual-rail random switching logic: a countermeasure to reduce side channel leakage. In: Goubin, L., Matsui, M. (eds.) CHES 2006. LNCS, vol. 4249, pp. 242–254. Springer, Heidelberg (2006). doi:10.1007/11894063_20

  12. Choudary, O., Kuhn, M.G.: Efficient template attacks. Cryptology ePrint Archive, Report 2013/770 (2013). http://eprint.iacr.org/2013/770

  13. Coron, J.-S.: Higher order masking of look-up tables. In: Nguyen, P.Q., Oswald, E. (eds.) EUROCRYPT 2014. LNCS, vol. 8441, pp. 441–458. Springer, Heidelberg (2014). doi:10.1007/978-3-642-55220-5_25

  14. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)

  15. Deng, L., Yu, D.: Deep learning: methods and applications. Found. Trends Signal Process. 7(3–4), 197–387 (2014)

  16. Doget, J., Prouff, E., Rivain, M., Standaert, F.-X.: Univariate side channel attacks and leakage modeling. J. Cryptographic Eng. 1(2), 123–144 (2011)

  17. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. Wiley-Interscience, New York (2000)

  18. Eiben, A.E., Smith, J.E.: Introduction to Evolutionary Computing. Springer, Heidelberg (2003)

  19. Genelle, L., Prouff, E., Quisquater, M.: Thwarting higher-order side channel analysis with additive and multiplicative maskings. In: Preneel, B., Takagi, T. (eds.) CHES 2011. LNCS, vol. 6917, pp. 240–255. Springer, Heidelberg (2011). doi:10.1007/978-3-642-23951-9_16

  20. Gierlichs, B., Batina, L., Tuyls, P., Preneel, B.: Mutual information analysis. In: Oswald, E., Rohatgi, P. (eds.) CHES 2008. LNCS, vol. 5154, pp. 426–442. Springer, Heidelberg (2008). doi:10.1007/978-3-540-85053-3_27

  21. Gilmore, R., Hanley, N., O’Neill, M.: Neural network based attack on a masked implementation of AES. In: 2015 IEEE International Symposium on Hardware Oriented Security and Trust (HOST), pp. 106–111, May 2015

  22. Hermans, M., Schrauwen, B.: Training and analysing deep recurrent neural networks. In: Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 26, pp. 190–198. Curran Associates Inc. (2013)

  23. Heuser, A., Zohner, M.: Intelligent machine homicide - breaking cryptographic devices using support vector machines. In: Schindler, W., Huss, S.A. (eds.) COSADE 2012. LNCS, vol. 7275, pp. 249–264. Springer, Heidelberg (2012). doi:10.1007/978-3-642-29912-4_18

  24. Heyszl, J., Ibing, A., Mangard, S., Santis, F.D., Sigl, G.: Clustering algorithms for non-profiled single-execution attacks on exponentiations. IACR Cryptology ePrint Archive 2013, 438 (2013)

  25. Hochreiter, S.: The vanishing gradient problem during learning recurrent neural nets and problem solutions. Int. J. Uncertain. Fuzziness Knowl. Based Syst. 6(2), 107–116 (1998)

  26. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)

  27. Hoogvorst, P., Danger, J.-L., Duc, G.: Software implementation of dual-rail representation. In: COSADE, Darmstadt, Germany, 24–25 February 2011

  28. Hospodar, G., Gierlichs, B., De Mulder, E., Verbauwhede, I., Vandewalle, J.: Machine learning in side-channel analysis: a first study. J. Cryptographic Eng. 1(4), 293–302 (2011)

  29. Jarrett, K., Kavukcuoglu, K., Ranzato, M., LeCun, Y.: What is the best multi-stage architecture for object recognition? In: ICCV, pp. 2146–2153. IEEE (2009)

  30. Kocher, P., Jaffe, J., Jun, B.: Differential power analysis. In: Wiener, M. (ed.) CRYPTO 1999. LNCS, vol. 1666, pp. 388–397. Springer, Heidelberg (1999). doi:10.1007/3-540-48405-1_25

  31. LeCun, Y., Bengio, Y.: Convolutional networks for images, speech, and time series. In: The Handbook of Brain Theory and Neural Networks, pp. 255–258. MIT Press, Cambridge (1998)

  32. Lerman, L., Bontempi, G., Markowitch, O.: Power analysis attack: an approach based on machine learning. Int. J. Appl. Cryptography 3(2), 97–115 (2014)

  33. Lerman, L., Medeiros, S.F., Bontempi, G., Markowitch, O.: A machine learning approach against a masked AES. In: Francillon, A., Rohatgi, P. (eds.) CARDIS 2013. LNCS, vol. 8419, pp. 61–75. Springer, Heidelberg (2014). doi:10.1007/978-3-319-08302-5_5

  34. Lerman, L., Poussier, R., Bontempi, G., Markowitch, O., Standaert, F.-X.: Template attacks vs. machine learning revisited (and the curse of dimensionality in side-channel analysis). In: Mangard, S., Poschmann, A.Y. (eds.) COSADE 2015. LNCS, vol. 9064, pp. 20–33. Springer, Heidelberg (2015). doi:10.1007/978-3-319-21476-4_2

  35. Lomné, V., Prouff, E., Rivain, M., Roche, T., Thillard, A.: How to estimate the success rate of higher-order side-channel attacks. In: Batina, L., Robshaw, M. (eds.) CHES 2014. LNCS, vol. 8731, pp. 35–54. Springer, Heidelberg (2014)

  36. Mangard, S., Oswald, E., Popp, T.: Power Analysis Attacks: Revealing the Secrets of Smart Cards. Springer (2006). ISBN 0-387-30857-1. http://www.dpabook.org/

  37. Masci, J., Meier, U., Cireşan, D., Schmidhuber, J.: Stacked convolutional auto-encoders for hierarchical feature extraction. In: Honkela, T., Duch, W., Girolami, M., Kaski, S. (eds.) ICANN 2011. LNCS, vol. 6791, pp. 52–59. Springer, Heidelberg (2011). doi:10.1007/978-3-642-21735-7_7

  38. Mitchell, M.: An Introduction to Genetic Algorithms. MIT Press, Cambridge (1998)

  39. O’Flynn, C., Chen, Z.D.: ChipWhisperer: an open-source platform for hardware embedded security research. Cryptology ePrint Archive, Report 2014/204 (2014). http://eprint.iacr.org/2014/204

  40. O’Shea, K., Nash, R.: An introduction to convolutional neural networks. CoRR, abs/1511.08458 (2015)

  41. Oswald, E., Mangard, S.: Template attacks on masking—resistance is futile. In: Abe, M. (ed.) CT-RSA 2007. LNCS, vol. 4377, pp. 243–256. Springer, Heidelberg (2006). doi:10.1007/11967668_16

  42. Prouff, E., Rivain, M., Bevan, R.: Statistical analysis of second order differential power analysis. IEEE Trans. Comput. 58(6), 799–811 (2009)

  43. Rivain, M.: On the exact success rate of side channel analysis in the Gaussian model. In: Avanzi, R.M., Keliher, L., Sica, F. (eds.) SAC 2008. LNCS, vol. 5381, pp. 165–183. Springer, Heidelberg (2009). doi:10.1007/978-3-642-04159-4_11

  44. Rivain, M., Prouff, E.: Provably secure higher-order masking of AES. In: Mangard, S., Standaert, F.-X. (eds.) CHES 2010. LNCS, vol. 6225, pp. 413–427. Springer, Heidelberg (2010). doi:10.1007/978-3-642-15031-9_28

  45. Rokach, L., Maimon, O.: Data Mining with Decision Trees: Theory and Applications. World Scientific Publishing, River Edge (2008)

  46. Salakhutdinov, R., Mnih, A., Hinton, G.: Restricted Boltzmann machines for collaborative filtering. In: Proceedings of the 24th International Conference on Machine Learning, ICML 2007, pp. 791–798. ACM, New York (2007)

  47. Schindler, W.: Advanced stochastic methods in side channel analysis on block ciphers in the presence of masking. J. Math. Cryptology 2(3), 291–310 (2008). doi:10.1515/JMC.2008.013

  48. Schindler, W., Lemke, K., Paar, C.: A stochastic model for differential side channel cryptanalysis. In: Rao, J.R., Sunar, B. (eds.) CHES 2005. LNCS, vol. 3659, pp. 30–46. Springer, Heidelberg (2005). doi:10.1007/11545262_3

  49. Schölkopf, B., Smola, A.J.: Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge (2001)

  50. Servant, V., Debande, N., Maghrebi, H., Bringer, J.: Study of a novel software constant weight implementation. In: Joye, M., Moradi, A. (eds.) CARDIS 2014. LNCS, vol. 8968, pp. 35–48. Springer, Heidelberg (2015). doi:10.1007/978-3-319-16763-3_3

  51. Silva, T.C., Zhao, L.: Machine Learning in Complex Networks. Springer, Switzerland (2016)

  52. Souissi, Y., Nassar, M., Guilley, S., Danger, J.-L., Flament, F.: First principal components analysis: a new side channel distinguisher. In: Rhee, K.-H., Nyang, D.H. (eds.) ICISC 2010. LNCS, vol. 6829, pp. 407–419. Springer, Heidelberg (2011). doi:10.1007/978-3-642-24209-0_27

  53. Standaert, F.-X., Malkin, T.G., Yung, M.: A unified framework for the analysis of side-channel key recovery attacks. In: Joux, A. (ed.) EUROCRYPT 2009. LNCS, vol. 5479, pp. 443–461. Springer, Heidelberg (2009). doi:10.1007/978-3-642-01001-9_26

  54. TELECOM ParisTech SEN research group: DPA Contest, 2nd edn. (2009–2010). http://www.DPAcontest.org/v2/

  55. TELECOM ParisTech SEN research group: DPA Contest, 4th edn. (2013–2014). http://www.DPAcontest.org/v4/

  56. Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.-A.: Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th International Conference on Machine Learning, ICML 2008, pp. 1096–1103. ACM, New York (2008)

  57. Weston, J., Watkins, C.: Multi-class support vector machines. Technical report, Royal Holloway, University of London (1998)

  58. Xie, J., Xu, L., Chen, E.: Image denoising and inpainting with deep neural networks. In: Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 25, pp. 341–349. Curran Associates Inc. (2012)


Author information

Correspondence to Emmanuel Prouff.


A Attack Settings

Our proposed deep learning attacks are based on the Keras library [2]. We provide hereafter the architecture and the parameters used for each of our deep learning networks; a code sketch of these architectures is given after the list.

  • Multilayer Perceptron:

    • Dense input layer: number of neurons = number of samples in the processed trace

    • Dense hidden layer: 20 neurons

    • Dense output layer: 256 neurons

  • Stacked Auto-Encoder:

    • Dense input layer: number of neurons = number of samples in the processed trace

    • Dense hidden layer: 100 neurons

    • Dense hidden layer: 50 neurons

    • Dense hidden layer: 20 neurons

    • Dense output layer: 256 neurons

  • Convolutional Neural Network:

    • Convolution layer:

      • Number of filters: 8

      • Filter length: 16

      • Activation function: Rectified Linear Unit

    • Dropout

    • Max pooling layer with pooling size 2

    • Convolution layer:

      • Number of filters: 8

      • Filter length: 8

      • Activation function: tanh(x)

    • Dropout

    • Dense output layer: 256 neurons

  • Long Short-Term Memory:

    • LSTM layer: 26 units

    • LSTM layer: 26 units

    • Dense output layer: 256 neurons

  • Random Forest: for this machine learning based attack, we used the scikit-learn Python library [3].

    • Number of trees: 300
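
For concreteness, the listing below sketches how these architectures could be instantiated with the Keras [2] and scikit-learn [3] libraries. It is a minimal sketch rather than our exact code: the layer names follow the current Keras API, and the activation functions of the MLP and auto-encoder layers, the dropout rates, the optimizer and the trace length n_samples are illustrative assumptions that the list above does not fix.

    # Sketch of the attack architectures; assumed hyper-parameters are marked.
    from keras.models import Sequential
    from keras.layers import Dense, Dropout, Conv1D, MaxPooling1D, Flatten, LSTM
    from sklearn.ensemble import RandomForestClassifier

    n_samples = 1000   # samples per processed trace (acquisition-dependent, assumed)
    n_classes = 256    # one class per value of the targeted key-dependent byte

    # Multilayer perceptron: one hidden layer of 20 neurons.
    mlp = Sequential([
        Dense(20, activation='relu', input_shape=(n_samples,)),  # activation assumed
        Dense(n_classes, activation='softmax'),
    ])

    # Stacked auto-encoder used as a classifier: 100 -> 50 -> 20 hidden neurons.
    sae = Sequential([
        Dense(100, activation='relu', input_shape=(n_samples,)),
        Dense(50, activation='relu'),
        Dense(20, activation='relu'),
        Dense(n_classes, activation='softmax'),
    ])

    # Convolutional neural network: two 1-D convolution blocks with dropout.
    cnn = Sequential([
        Conv1D(8, 16, activation='relu', input_shape=(n_samples, 1)),
        Dropout(0.5),                     # dropout rate assumed
        MaxPooling1D(pool_size=2),
        Conv1D(8, 8, activation='tanh'),
        Dropout(0.5),
        Flatten(),
        Dense(n_classes, activation='softmax'),
    ])

    # Long short-term memory: two stacked LSTM layers of 26 units each.
    lstm = Sequential([
        LSTM(26, return_sequences=True, input_shape=(n_samples, 1)),
        LSTM(26),
        Dense(n_classes, activation='softmax'),
    ])

    # Random forest with 300 trees.
    rf = RandomForestClassifier(n_estimators=300)

    for model in (mlp, sae, cnn, lstm):
        model.compile(optimizer='adam', loss='categorical_crossentropy')

Each network outputs a score vector over the 256 possible values of the targeted byte, which is then accumulated over the attack traces to rank the key candidates.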

In several published works [23, 28], the authors have noticed the influence of the parameters chosen for the SVM and RF classifiers on the attack results. We have observed the same effect when dealing with deep learning techniques. The method we used to find a suitable parameter setup for our practical attacks is detailed in the following section.

A.1 How to Choose the Optimal Parameters?

When dealing with artificial neural networks, several meta-parameters have to be tuned (e.g. the number of layers, the number of neurons on each layer, the activation function). One common technique to find the optimal parameters is to use evolutionary algorithms [18], and more precisely the so-called genetic algorithm [38].

At the beginning of the algorithm, a population (a set of individuals with different genes) is randomly initialized. In our case, an individual is a list of the parameters we want to estimate (e.g. the number of layers, the number of neurons on each layer, the activation function) and the genes are the corresponding values. Then, the performance of each individual is evaluated using a so-called fitness function. In our context, the fitness function is the guessing entropy output by the attack. Said differently, for each set of parameters we perform the attack and record the guessing entropy obtained. Only the individuals that achieve good guessing entropy scores are kept. Their genes are mutated and mixed to generate a better population. This process is repeated until a satisfying fitness is achieved (i.e. a guessing entropy equal to one).
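
The sketch below illustrates this procedure under stated assumptions: the gene space, population size, mutation rate and selection scheme are illustrative choices, and guessing_entropy() is a hypothetical oracle that trains a network with the candidate parameters, mounts the attack, and returns the average rank of the correct key (lower is better, 1 is a full success).

    # Genetic search over network meta-parameters (illustrative sketch).
    import random

    GENE_SPACE = {                       # assumed gene encoding
        'n_hidden_layers': [1, 2, 3],
        'n_neurons':       [10, 20, 50, 100],
        'activation':      ['relu', 'tanh', 'sigmoid'],
    }

    def random_individual():
        return {gene: random.choice(values) for gene, values in GENE_SPACE.items()}

    def crossover(a, b):
        # Mix genes from two well-performing parents.
        return {gene: random.choice([a[gene], b[gene]]) for gene in GENE_SPACE}

    def mutate(individual, rate=0.2):
        # Randomly resample some genes (mutation rate assumed).
        return {gene: random.choice(GENE_SPACE[gene]) if random.random() < rate
                else value for gene, value in individual.items()}

    def evolve(guessing_entropy, pop_size=20, generations=50):
        population = [random_individual() for _ in range(pop_size)]
        for _ in range(generations):
            # Fitness = guessing entropy of the resulting attack; lower is better.
            population.sort(key=guessing_entropy)
            if guessing_entropy(population[0]) == 1:   # correct key ranked first
                return population[0]
            survivors = population[:pop_size // 2]     # keep the best half
            children = [mutate(crossover(random.choice(survivors),
                                         random.choice(survivors)))
                        for _ in range(pop_size - len(survivors))]
            population = survivors + children
        return min(population, key=guessing_entropy)

In practice, each fitness evaluation is expensive since it requires training the network and running the attack, so the population size and the number of generations would be kept small.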


Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Maghrebi, H., Portigliatti, T., Prouff, E. (2016). Breaking Cryptographic Implementations Using Deep Learning Techniques. In: Carlet, C., Hasan, M., Saraswat, V. (eds.) Security, Privacy, and Applied Cryptography Engineering. SPACE 2016. Lecture Notes in Computer Science, vol. 10076. Springer, Cham. https://doi.org/10.1007/978-3-319-49445-6_1


  • DOI: https://doi.org/10.1007/978-3-319-49445-6_1


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49444-9

  • Online ISBN: 978-3-319-49445-6

  • eBook Packages: Computer Science; Computer Science (R0)
