On the Influence of Optimizers in Deep Learning-Based Side-Channel Analysis

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 12804)

Abstract

Deep learning-based side-channel analysis is a powerful and easy-to-deploy option for profiling side-channel attacks. Reaching good performance often requires a detailed tuning phase, in which one first selects the relevant hyperparameters and then tunes them. Hyperparameters connected with the neural network architecture are a common choice for tuning, while those influencing the training process are less explored.

In this work, we concentrate on the optimizer hyperparameter, and we show that it plays a significant role in attack performance. Our results show that the common optimizer choices (Adam and RMSprop) indeed work well, but they overfit easily, which means we must use short training phases, small profiling models, and explicit regularization. SGD-type optimizers, on the other hand, work well on average (slower convergence and less overfitting), but only if momentum is used. Finally, our results show that Adagrad is a strong option in scenarios with longer training phases or larger profiling models.
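To make the comparison concrete, the following is a minimal sketch of how the four optimizer families above can be instantiated and attached to a profiling model. The framework (tf.keras) and the learning rates are illustrative assumptions, not the tuned values from the paper's experiments.

```python
import tensorflow as tf

# Candidate optimizers compared in this work; the learning rates below are
# illustrative defaults, not the tuned values from the experiments.
optimizers = {
    "Adam": tf.keras.optimizers.Adam(learning_rate=1e-3),
    "RMSprop": tf.keras.optimizers.RMSprop(learning_rate=1e-3),
    "SGD+momentum": tf.keras.optimizers.SGD(learning_rate=1e-2, momentum=0.9),
    "Adagrad": tf.keras.optimizers.Adagrad(learning_rate=1e-2),
}

# Each optimizer is then plugged into an otherwise identical profiling model:
# model.compile(optimizer=optimizers[name],
#               loss="categorical_crossentropy", metrics=["accuracy"])
```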


Notes

  1. Parameters are the configuration variables internal to the model, whose values can be estimated from the data.

  2. Hyperparameters are the configuration variables external to the model, set before the training process starts.

References

  1. Brier, E., Clavier, C., Olivier, F.: Correlation power analysis with a leakage model. In: Joye, M., Quisquater, J.-J. (eds.) CHES 2004. LNCS, vol. 3156, pp. 16–29. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-28632-5_2

  2. Cagli, E., Dumas, C., Prouff, E.: Convolutional neural networks with data augmentation against jitter-based countermeasures. In: Fischer, W., Homma, N. (eds.) CHES 2017. LNCS, vol. 10529, pp. 45–68. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66787-4_3

  3. Chari, S., Rao, J.R., Rohatgi, P.: Template attacks. In: Kaliski, B.S., Koç, K., Paar, C. (eds.) CHES 2002. LNCS, vol. 2523, pp. 13–28. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-36400-5_3

  4. Choi, D., Shallue, C.J., Nado, Z., Lee, J., Maddison, C.J., Dahl, G.E.: On empirical comparisons of optimizers for deep learning (2019)

  5. Collobert, R., Bengio, S.: Links between perceptrons, MLPs and SVMs. In: Proceedings of the Twenty-First International Conference on Machine Learning, ICML 2004, p. 23. ACM, New York (2004). https://doi.org/10.1145/1015330.1015415

  6. Gierlichs, B., Batina, L., Tuyls, P., Preneel, B.: Mutual information analysis. In: Oswald, E., Rohatgi, P. (eds.) CHES 2008. LNCS, vol. 5154, pp. 426–442. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-85053-3_27

  7. Gilmore, R., Hanley, N., O’Neill, M.: Neural network based attack on a masked implementation of AES. In: 2015 IEEE International Symposium on Hardware Oriented Security and Trust (HOST), pp. 106–111, May 2015. https://doi.org/10.1109/HST.2015.7140247

  8. Hettwer, B., Gehrer, S., Güneysu, T.: Deep neural network attribution methods for leakage analysis and symmetric key recovery. In: Paterson, K.G., Stebila, D. (eds.) SAC 2019. LNCS, vol. 11959, pp. 645–666. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-38471-5_26

  9. Heuser, A., Picek, S., Guilley, S., Mentens, N.: Side-channel analysis of lightweight ciphers: does lightweight equal easy? In: Hancke, G.P., Markantonakis, K. (eds.) RFIDSec 2016. LNCS, vol. 10155, pp. 91–104. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-62024-4_7

  10. Heuser, A., Zohner, M.: Intelligent machine homicide. In: Schindler, W., Huss, S.A. (eds.) COSADE 2012. LNCS, vol. 7275, pp. 249–264. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-29912-4_18

  11. Kim, J., Picek, S., Heuser, A., Bhasin, S., Hanjalic, A.: Make some noise. Unleashing the power of convolutional neural networks for profiled side-channel analysis. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2019(3), 148–179 (2019). https://doi.org/10.13154/tches.v2019.i3.148-179. https://tches.iacr.org/index.php/TCHES/article/view/8292

  12. Kocher, P., Jaffe, J., Jun, B.: Differential power analysis. In: Wiener, M. (ed.) CRYPTO 1999. LNCS, vol. 1666, pp. 388–397. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-48405-1_25. http://dl.acm.org/citation.cfm?id=646764.703989

  13. Li, H., Krcek, M., Perin, G.: A comparison of weight initializers in deep learning-based side-channel analysis. Cryptology ePrint Archive, Report 2020/904 (2020). https://eprint.iacr.org/2020/904

  14. Luo, L., Xiong, Y., Liu, Y., Sun, X.: Adaptive gradient methods with dynamic bound of learning rate. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, 6–9 May 2019. OpenReview.net (2019). https://openreview.net/forum?id=Bkg3g2R9FX

  15. Maghrebi, H., Portigliatti, T., Prouff, E.: Breaking cryptographic implementations using deep learning techniques. In: Carlet, C., Hasan, M.A., Saraswat, V. (eds.) SPACE 2016. LNCS, vol. 10076, pp. 3–26. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-49445-6_1

  16. Mangard, S., Oswald, E., Popp, T.: Power Analysis Attacks: Revealing the Secrets of Smart Cards. Springer, Heidelberg (2007). https://doi.org/10.1007/978-0-387-38162-6. http://www.dpabook.org/. ISBN: 978-0-387-30857-9

  17. Masure, L., Dumas, C., Prouff, E.: Gradient visualization for general characterization in profiling attacks. In: Polian, I., Stöttinger, M. (eds.) COSADE 2019. LNCS, vol. 11421, pp. 145–167. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-16350-1_9

  18. Nakkiran, P., Kaplun, G., Bansal, Y., Yang, T., Barak, B., Sutskever, I.: Deep double descent: where bigger models and more data hurt. In: 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, 26–30 April 2020. OpenReview.net (2020). https://openreview.net/forum?id=B1g5sA4twr

  19. Perin, G., Buhan, I., Picek, S.: Learning when to stop: a mutual information approach to fight overfitting in profiled side-channel analysis. Cryptology ePrint Archive, Report 2020/058 (2020). https://eprint.iacr.org/2020/058

  20. Perin, G., Chmielewski, L., Picek, S.: Strength in numbers: improving generalization with ensembles in machine learning-based profiled side-channel analysis. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2020(4), 337–364 (2020). https://doi.org/10.13154/tches.v2020.i4.337-364. https://tches.iacr.org/index.php/TCHES/article/view/8686

  21. Picek, S., Heuser, A., Jovic, A., Bhasin, S., Regazzoni, F.: The curse of class imbalance and conflicting metrics with machine learning for side-channel evaluations. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2019(1), 209–237 (2019). https://doi.org/10.13154/tches.v2019.i1.209-237

  22. Picek, S., et al.: Side-channel analysis and machine learning: a practical perspective. In: 2017 International Joint Conference on Neural Networks, IJCNN 2017, Anchorage, AK, USA, 14–19 May 2017, pp. 4095–4102 (2017)

  23. Picek, S., Samiotis, I.P., Kim, J., Heuser, A., Bhasin, S., Legay, A.: On the performance of convolutional neural networks for side-channel analysis. In: Chattopadhyay, A., Rebeiro, C., Yarom, Y. (eds.) SPACE 2018. LNCS, vol. 11348, pp. 157–176. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-05072-6_10

  24. Prouff, E., Strullu, R., Benadjila, R., Cagli, E., Dumas, C.: Study of deep learning techniques for side-channel analysis and introduction to ASCAD database. Cryptology ePrint Archive, Report 2018/053 (2018). https://eprint.iacr.org/2018/053

  25. Quisquater, J.-J., Samyde, D.: Electro magnetic analysis (EMA): measures and counter-measures for smart cards. In: Attali, I., Jensen, T. (eds.) E-smart 2001. LNCS, vol. 2140, pp. 200–210. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-45418-7_17

  26. Robissout, D., Zaid, G., Colombier, B., Bossuet, L., Habrard, A.: Online performance evaluation of deep learning networks for side-channel analysis. Cryptology ePrint Archive, Report 2020/039 (2020). https://eprint.iacr.org/2020/039

  27. Schindler, W., Lemke, K., Paar, C.: A stochastic model for differential side channel cryptanalysis. In: Rao, J.R., Sunar, B. (eds.) CHES 2005. LNCS, vol. 3659, pp. 30–46. Springer, Heidelberg (2005). https://doi.org/10.1007/11545262_3

  28. Standaert, F.-X., Malkin, T.G., Yung, M.: A unified framework for the analysis of side-channel key recovery attacks. In: Joux, A. (ed.) EUROCRYPT 2009. LNCS, vol. 5479, pp. 443–461. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-01001-9_26

  29. Weissbart, L.: On the performance of multilayer perceptron in profiling side-channel analysis. Cryptology ePrint Archive, Report 2019/1476 (2019). https://eprint.iacr.org/2019/1476

  30. Wilson, A.C., Roelofs, R., Stern, M., Srebro, N., Recht, B.: The marginal value of adaptive gradient methods in machine learning. In: Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4–9 December 2017, Long Beach, CA, USA, pp. 4148–4158 (2017). http://papers.nips.cc/paper/7003-the-marginal-value-of-adaptive-gradient-methods-in-machine-learning

  31. Zaid, G., Bossuet, L., Habrard, A., Venelli, A.: Methodology for efficient CNN architectures in profiling attacks. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2020(1), 1–36 (2019). https://doi.org/10.13154/tches.v2020.i1.1-36. https://tches.iacr.org/index.php/TCHES/article/view/8391

Appendices

A Machine Learning Classifiers

We consider two neural network types that are standard techniques in profiling SCA: multilayer perceptrons and convolutional neural networks.

Multilayer Perceptron. The multilayer perceptron (MLP) is a feed-forward neural network that maps sets of inputs onto sets of appropriate outputs. An MLP consists of multiple layers of nodes in a directed graph, where each layer is fully connected to the next one (hence, the layers are called fully-connected or dense layers). An MLP has three or more layers (since the input and output represent two layers) of nonlinearly-activating nodes [5].
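As a concrete illustration, here is a minimal sketch of such an MLP in tf.keras; the framework, layer sizes, and the 256-class leakage model are assumptions for illustration, not the tuned architectures from our experiments.

```python
import tensorflow as tf

def build_mlp(num_samples: int, num_classes: int = 256) -> tf.keras.Model:
    """MLP profiling model: fully-connected layers with nonlinear activations.

    num_classes = 256 assumes an 8-bit intermediate value (e.g., an AES S-box
    output) as the attack target; adjust for other leakage models.
    """
    return tf.keras.Sequential([
        tf.keras.Input(shape=(num_samples,)),           # one input per trace sample
        tf.keras.layers.Dense(200, activation="relu"),  # hidden (dense) layer
        tf.keras.layers.Dense(200, activation="relu"),  # hidden (dense) layer
        tf.keras.layers.Dense(num_classes, activation="softmax"),  # class scores
    ])
```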

Convolutional Neural Networks. Convolutional neural networks (CNNs) are feed-forward neural networks that commonly consist of three types of layers: convolutional layers, pooling layers, and fully-connected layers. The convolutional layer computes the outputs of neurons connected to local regions of the input, each computing a dot product between its weights and the small region of the input volume it is connected to. Pooling layers decrease the number of extracted features by performing a down-sampling operation along the spatial dimensions. The fully-connected layer (the same as in the multilayer perceptron) computes either hidden activations or the class scores.
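The sketch below shows these three layer types in a small tf.keras CNN for one-dimensional side-channel traces; the filter counts, kernel sizes, and pooling parameters are illustrative placeholders, not the architectures used in the experiments.

```python
import tensorflow as tf

def build_cnn(num_samples: int, num_classes: int = 256) -> tf.keras.Model:
    """CNN profiling model: convolution -> pooling -> fully-connected layers."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(num_samples, 1)),  # 1-D trace, single channel
        # Convolution: dot products between filter weights and local regions.
        tf.keras.layers.Conv1D(filters=8, kernel_size=10, activation="relu"),
        # Pooling: down-sampling along the spatial (time) dimension.
        tf.keras.layers.AveragePooling1D(pool_size=2),
        tf.keras.layers.Flatten(),
        # Fully-connected part: hidden activations, then class scores.
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
```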

B Neural Networks Hyperparameter Ranges

In Table 1, we show the hyperparameter ranges we explore for the MLP architectures, while in Table 2, we show the hyperparameter ranges for CNNs.

Table 1. Hyperparameter search space for multilayer perceptron.
Table 2. Hyperparameter search space for convolutional neural network (l = convolutional layer index).
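As an illustration of how such a search space can be explored, the following is a minimal sketch of uniform random hyperparameter sampling; the concrete ranges below are hypothetical placeholders, not the values from Tables 1 and 2.

```python
import random

# Hypothetical MLP search space in the spirit of Table 1 (the real ranges differ).
mlp_search_space = {
    "dense_layers": [2, 3, 4, 5],
    "neurons_per_layer": [100, 200, 300, 400],
    "activation": ["relu", "selu", "elu"],
    "learning_rate": [1e-4, 5e-4, 1e-3],
    "batch_size": [100, 200, 400],
}

def sample_configuration(space: dict) -> dict:
    """Draw one hyperparameter configuration uniformly at random."""
    return {name: random.choice(values) for name, values in space.items()}

config = sample_configuration(mlp_search_space)  # one random architecture/training setup
```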

In our experiments, training, validation, and test side-channel traces are normalized using z-score normalization:

$$\begin{aligned} t_{i} = \frac{t_{i} - \mu _{T_{train}}}{\sigma _{T_{train}}}, \end{aligned}$$
(1)

where \(T_{train}\) is the set of training traces and \(t_{i}\) is a single trace from the training, validation, or test set. Note that all traces are normalized according to the training set statistics. None of the configured neural networks contain batch normalization layers.
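A minimal NumPy sketch of Eq. (1) follows; it assumes the traces are stored as 2-D arrays (traces x samples) and, as one common reading of the equation, computes the statistics per sample point over the training set only.

```python
import numpy as np

def zscore_normalize(train, validation, test):
    """Z-score normalization; mean/std come from the training traces only."""
    mu = train.mean(axis=0)     # per-sample-point mean over training traces
    sigma = train.std(axis=0)   # per-sample-point standard deviation
    sigma = np.where(sigma == 0, 1.0, sigma)  # guard against constant sample points
    return ((train - mu) / sigma,
            (validation - mu) / sigma,
            (test - mu) / sigma)
```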

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Perin, G., Picek, S. (2021). On the Influence of Optimizers in Deep Learning-Based Side-Channel Analysis. In: Dunkelman, O., Jacobson, Jr., M.J., O'Flynn, C. (eds) Selected Areas in Cryptography. SAC 2020. Lecture Notes in Computer Science, vol. 12804. Springer, Cham. https://doi.org/10.1007/978-3-030-81652-0_24

  • DOI: https://doi.org/10.1007/978-3-030-81652-0_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-81651-3

  • Online ISBN: 978-3-030-81652-0
