
Breaking DPA-Protected Kyber via the Pair-Pointwise Multiplication

  • Conference paper
Applied Cryptography and Network Security (ACNS 2024)

Abstract

We introduce a novel template attack for secret key recovery in Kyber, leveraging side-channel information from polynomial multiplication during decapsulation. Conceptually, our attack exploits that Kyber’s incomplete number-theoretic transform (NTT) causes each secret coefficient to be used multiple times, unlike when performing a complete NTT.

Our attack is a single-trace known-ciphertext attack that avoids machine-learning techniques and instead relies on correlation matching only. Additionally, our template generation method is simple and easy to replicate, and we describe different attack strategies that vary in the number of templates required. Moreover, our attack applies both to masked implementations and to designs with multiplication shuffling.

We demonstrate its effectiveness by targeting a masked implementation from the mkm4 repository. We initially perform simulations in the noisy Hamming-Weight model and achieve high success rates with just \(13\,316\) templates while tolerating noise values up to \(\sigma =0.3\). In a practical setup, we measure power consumption and notice that our attack falls short of expectations. However, we introduce an extension inspired by known online template attacks, enabling us to recover 128 coefficient pairs from a single polynomial multiplication. Our results provide evidence that the incomplete NTT, which is used in Kyber-768 and similar schemes, introduces an additional side-channel weakness worth further exploration.

Author list in alphabetical order; see https://www.ams.org/profession/leaders/CultureStatement04.pdf. Date of this document: 2024-02-16.


Notes

  1. For details of the attacks and the experiments, see Sect. 5.

  2. The paper's supplementary materials, in particular the attack scripts, are available at: https://github.com/crocs-muni/Attack_Kyber_ACNS2024.

  3. Note, however, that we did not optimize our setup for the speed of acquisition.

  4. Initial tests hint at a 30\(\%\) acquisition reduction for the OTA step with a single poly_basemul_acc experiment. However, we exclude this result from our estimates, reserving exploration of this optimization for future work.

  5. They also attack a reference implementation, but we do not concentrate on that since this implementation leaks much more than pqm4 and mkm4, the implementation we attack. We look only at the long-term secret key and do not consider attacks on the encryption procedure.

References

  1. Github repository: Collection of post-quantum cryptographic algorithms for the arm cortex-m4 (2023). https://github.com/mupq/pqm4

  2. Github repository for masked Kyber presented in [18] (2022). https://github.com/masked-kyber-m4/mkm4

  3. Avanzi, R., et al.: CRYSTALS-Kyber (version 3.0) - submission to round 3 of the NIST post-quantum project. submission to the NIST post-quantum cryptography standardization project (2020). https://pq-crystals.org/kyber/data/kyber-specification-round3-20210804.pdf

  4. Backlund, L., Ngo, K., Gärtner, J., Dubrova, E.: Secret key recovery attacks on masked and shuffled implementations of CRYSTALS-Kyber and saber. Cryptology ePrint Archive, Paper 2022/1692 (2022). https://eprint.iacr.org/2022/1692

  5. Batina, L., Chmielewski, Ł., Papachristodoulou, L., Schwabe, P., Tunstall, M.: Online template attacks. In: Meier, W., Mukhopadhyay, D. (eds.) Progress in Cryptology - INDOCRYPT 2014–15th International Conference on Cryptology in India, New Delhi, India, 14–17 December 2014, Proceedings, vol. 8885 of Lecture Notes in Computer Science, pp. 21–36. Springer, Heidelberg (2014). https://doi.org/10.1007/s13389-017-0171-8

  6. Batina, L., Chmielewski, Ł, Papachristodoulou, L., Schwabe, P., Tunstall, M.: Online template attacks. J. Cryptogr. Eng. 9, 1–16 (2019)

  7. Van Beirendonck, M., D’Anvers, J.P., Karmakar, A., Balasch, J., Verbauwhede, I.: A side-channel-resistant implementation of SABER. ACM J. Emerg. Technol. Comput. Syst. 17(2), 10:1–10:26 (2021)

  8. Bos, J.W., et al.: CRYSTALS - kyber: a CCA-secure module-lattice-based KEM. In: 2018 IEEE European Symposium on Security and Privacy, EuroS&P 2018, London, United Kingdom, 24–26 April 2018, pp. 353–367. IEEE (2018)

  9. Bos, J.W., Friedberger, S., Martinoli, M., Oswald, E., Stam, M.: Assessing the feasibility of single trace power analysis of frodo. In: Cid, C., Jacobson Jr., M.J. (eds.) Selected Areas in Cryptography - SAC 2018–25th International Conference, Calgary, AB, Canada, 15–17 August 2018, Revised Selected Papers, vol. 11349 of Lecture Notes in Computer Science, pp. 216–234. Springer, Heidelberg (2018). https://doi.org/10.1007/978-3-030-10970-7_10

  10. Bos, J.W., Gourjon, M., Renes, J., Schneider, T., van Vredendaal, C.: Masking Kyber: first- and higher-order implementations. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2021(4), 173–214 (2021)

  11. Aldaya, A.C., Brumley, B.B.: Online template attacks: revisited. IACR Trans. Cryptogr. Hardware Embed. Syst. 2021(3), 28–59 (2021). https://artifacts.iacr.org/tches/2021/a11

  12. Chari, S., Rao, J.R., Rohatgi, P.: Template attacks. In: Kaliski, B.S., Koç, Ç.K., Paar, C. (eds.) Cryptographic Hardware and Embedded Systems - CHES 2002, pp. 13–28. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-36400-5_3

  13. Coron, J.S.: Resistance against differential power analysis for elliptic curve cryptosystems. In: Koç, Ç.K., Paar, C. (eds.) Cryptographic Hardware and Embedded Systems, pp. 292–302. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-48059-5_25

  14. D’Anvers, J.P., Tiepelt, M., Vercauteren, F., Verbauwhede, I.: Timing attacks on error correcting codes in post-quantum schemes. In: Proceedings of ACM Workshop on Theory of Implementation Security Workshop, TIS 2019, pp. 2–9. Association for Computing Machinery, New York (2019)

  15. Dubrova, E., Ngo, K., Gärtner, J., Wang, R.: Breaking a fifth-order masked implementation of crystals-kyber by copy-paste. In: Proceedings of the 10th ACM Asia Public-Key Cryptography Workshop, APKC 2023, pp. 10–20. Association for Computing Machinery, New York (2023)

  16. Fujisaki, E., Okamoto, T.: Secure integration of asymmetric and symmetric encryption schemes. J. Cryptol. 26(1), 80–101 (2013)

  17. Hamburg, M.: Chosen ciphertext k-trace attacks on masked CCA2 secure Kyber. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2021(4), 88–113 (2021)

  18. Heinz, D., Kannwischer, M.J., Land, G., Pöppelmann, T., Schwabe, P., Sprenkels, A.: First-order masked Kyber on ARM Cortex-M4. Cryptology ePrint Archive, Paper 2022/058 (2022). https://eprint.iacr.org/2022/058

  19. Heinz, D., Pöppelmann, T.: Combined fault and DPA protection for lattice-based cryptography. IEEE Trans. Comput. 72(4), 1055–1066 (2023)

  20. Homma, N., Miyamoto, A., Aoki, T., Satoh, A., Shamir, A.: Collision-based power analysis of modular exponentiation using chosen-message pairs. In: Oswald, E., Rohatgi, P. (eds.) Cryptographic Hardware and Embedded Systems - CHES 2008, 10th International Workshop, Washington, D.C., USA, 10–13 August 2008. Proceedings, vol. 5154 of Lecture Notes in Computer Science, pp. 15–29. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-85053-3_2

  21. Hutter, M., Kirschbaum, M., Plos, T., Schmidt, J.M., Mangard, S.: Exploiting the difference of side-channel leakages. In: Schindler, W., Huss, S.A., (eds.) Constructive Side-Channel Analysis and Secure Design - Third International Workshop, COSADE 2012, Darmstadt, Germany, 3–4 May 2012. Proceedings, vol. 7275 of Lecture Notes in Computer Science, pp. 1–16. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-29912-4_1

  22. Ji, Y., Wang, R., Ngo, K., Dubrova, E., Backlund, L.: A side-channel attack on a hardware implementation of CRYSTALS-Kyber. Cryptology ePrint Archive, Paper 2022/1452 (2022). https://eprint.iacr.org/2022/1452

  23. Kannwischer, M.J.: Polynomial Multiplication for Post-Quantum Cryptography. PhD thesis, Nijmegen U. (2022)

  24. Karatsuba, A., Ofman, Yu.: Multiplication of multidigit numbers on automata. Soviet Physics Doklady 7, 595 (1963)

  25. Lyubashevsky, V., et al.: CRYSTALS-Dilithium (2020). https://csrc.nist.gov/projects/post-quantum-cryptography/round-3-submissions

  26. Marzougui, S., Kabin, I., Krämer, J., Aulbach, T., Seifert, J.P.: On the feasibility of single-trace attacks on the Gaussian sampler using a CDT. In: Kavun, E.B., Pehl, M. (eds.) Constructive Side-Channel Analysis and Secure Design - 14th International Workshop, COSADE 2023, Munich, Germany, 3–4 April 2023, Proceedings, vol. 13979 of Lecture Notes in Computer Science, pp. 149–169. Springer, Heidelberg (2023). https://doi.org/10.1007/978-3-031-29497-6_8

  27. Mujdei, C., Wouters, L., Karmakar, A., Beckers, A., Mera, J.M.B., Verbauwhede, I.: Side-channel analysis of lattice-based post-quantum cryptography: Exploiting polynomial multiplication. ACM Trans. Embed. Comput. Syst. (2022)

  28. Ngo, K., Wang, R., Dubrova, E., Paulsrud, N.: Side-channel attacks on lattice-based kems are not prevented by higher-order masking. Cryptology ePrint Archive, Paper 2022/919 (2022). https://eprint.iacr.org/2022/919

  29. Oder, T., Schneider, T., Pöppelmann, T., Güneysu, T.: Practical CCA2-secure and masked Ring-LWE implementation. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2018(1), 142–174 (2018)

  30. O’Flynn, C., Chen, Z.D.: ChipWhisperer: an open-source platform for hardware embedded security research. In: Prouff, E. (ed.) Constructive Side-Channel Analysis and Secure Design - 5th International Workshop, COSADE 2014, Paris, France, 13–15 April 2014. Revised Selected Papers, vol. 8622 of Lecture Notes in Computer Science, pp. 243–260. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-319-10175-0_17

  31. Pessl, P., Primas, R.: More practical single-trace attacks on the number theoretic transform. In: Schwabe, P., Thériault, N. (eds.) Progress in Cryptology - LATINCRYPT 2019–6th International Conference on Cryptology and Information Security in Latin America, Santiago de Chile, Chile, 2–4 October 2019, Proceedings, vol. 11774 of Lecture Notes in Computer Science, pp. 130–149. Springer, Heidelberg (2019). https://doi.org/10.1007/978-3-030-30530-7_7

  32. Primas, R., Pessl, P., Mangard, S.: Single-trace side-channel attacks on masked lattice-based encryption. In: Fischer, W., Homma, N. (eds.) Cryptographic Hardware and Embedded Systems - CHES 2017–19th International Conference, Taipei, Taiwan, 25–28 September 2017, Proceedings, vol. 10529 of Lecture Notes in Computer Science, pp. 513–533. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-319-66787-4_25

  33. Ravi, P., Roy, S.S., Chattopadhyay, A., Bhasin, S.: Generic side-channel attacks on CCA-secure lattice-based PKE and kems. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2020(3), 307–335 (2020)

  34. Reparaz, O., de Clercq, R., Roy, S.S., Vercauteren, F., Verbauwhede, I.: Additively homomorphic ring-LWE masking. In: Takagi, T. (ed.) Post-Quantum Cryptography, pp. 233–244. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-29360-8_15

  35. Reparaz, O., Roy, S.S., de Clercq, R., Vercauteren, F., Verbauwhede, I.: Masking ring-LWE. J. Cryptogr. Eng. 6(2), 139–153 (2016)

  36. Seiler, G.: Faster AVX2 optimized NTT multiplication for Ring-LWE lattice cryptography. IACR Cryptol. ePrint Arch., 39 (2018)

  37. Wang, J., Cao, W., Chen, H., Li, H.: Practical side-channel attack on masked message encoding in latticed-based kem. Cryptology ePrint Archive, Paper 2022/859 (2022). https://eprint.iacr.org/2022/859

  38. Wang, R., Brisfors, M., Dubrova, E.: A side-channel attack on a bitsliced higher-order masked crystals-kyber implementation. IACR Cryptol. ePrint Arch. 1042 (2023)

  39. Xing, Y., Li, S.: A compact hardware implementation of CCA-secure key exchange mechanism CRYSTALS-Kyber on FPGA. IACR Trans. Cryptogr. Hardware Embed. Syst. 2021(2), 328–356 (2021)

  40. Yang, B., Ravi, P., Zhang, F., Shen, A., Bhasin, S.: STAMP-single trace attack on M-LWE pointwise multiplication in Kyber. Cryptology ePrint Archive, Paper 2023/1184 (2023). https://eprint.iacr.org/2023/1184

  41. Zijlstra, T., Bigou, K., Tisserand, A.: FPGA implementation and comparison of protections against SCAs for RLWE. In: Hao, F., Ruj, S., Gupta, S.S. (eds.) Progress in Cryptology - INDOCRYPT 2019, pp. 535–555. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-35423-7_27

Acknowledgements

E. A. Bock conducted part of this research while at Aalto University. His work at Aalto and the work of K. Puniamurthy were supported by MATINE, Ministry of Defence of Finland. The work of Ł. Chmielewski and M. Šorf was supported by the Ai-SecTools (VJ02010010) project. Computational resources were provided by the e-INFRA CZ project (ID:90254), supported by the Ministry of Education, Youth and Sports of the Czech Republic.

Author information

Corresponding author

Correspondence to Łukasz Chmielewski.

Appendices

A Kyber Algorithms

Algorithms 3 and 4 describe the encryption and encapsulation functions in Kyber. The functions \(\textsc {Compress} \) and \(\textsc {Decompress} \) are defined as \(\textsc {Compress} (u) := \lfloor u \cdot 2^d / q \rceil \bmod 2^d\) and \(\textsc {Decompress} (u) := \lfloor q/2^d \cdot u \rceil \), with \(d=10\) if \(k=2\) or 3 and \(d=11\) if \(k=4\). Note that the output of the encryption is a ciphertext c, which consists of two compressed ciphertexts. This ciphertext c is the input to the decapsulation algorithm.
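The two maps can be sketched in a few lines of Python. This is a minimal illustration, assuming \(q=3329\) and that \(\lfloor \cdot \rceil \) rounds to the nearest integer with ties rounded up; the names `compress` and `decompress` are chosen here for illustration and are not taken from any Kyber code base.

```python
Q = 3329  # Kyber modulus (assumed here)

def compress(u: int, d: int) -> int:
    """Round u * 2^d / q to the nearest integer (ties up), then reduce mod 2^d."""
    return (((u << d) * 2 + Q) // (2 * Q)) % (1 << d)

def decompress(u: int, d: int) -> int:
    """Round q * u / 2^d to the nearest integer (ties up)."""
    return (Q * u + (1 << (d - 1))) >> d

def round_trip_error(x: int, d: int) -> int:
    """Distance modulo q between x and decompress(compress(x))."""
    diff = (decompress(compress(x, d), d) - x) % Q
    return min(diff, Q - diff)

if __name__ == "__main__":
    d = 10
    worst = max(round_trip_error(x, d) for x in range(Q))
    print("largest round-trip error for d = 10:", worst)
```

Compression is lossy: decompressing a compressed value returns a nearby value modulo q rather than the original one.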

Figures d and e (algorithm listings; not reproduced).

B Montgomery Reduction

Kyber represents elements in Montgomery representation in order to avoid expensive division by q and computation mod q, replacing them by division by \(2^{16}\) (taking the top half of a register) and computation mod \(2^{16}\) (taking the bottom half of a register). In the following, we present the Montgomery reduction with general R and q, although Kyber uses \(R=2^{16}\). Consider \(R = 2^k > q\) and an element \(a < qR\). Storing a/R instead of a shortens the element by k bits, which reduces the memory footprint and can be implemented efficiently. For this division to be exact, a correction step first turns a into a multiple of R. More precisely, we want to find a value t such that \(a-tq\) is divisible by R. To this end, one computes t as \(aq^{-1} \pmod {R}\), so that \(a-aq^{-1}q \equiv 0 \pmod {R}\). Following closely Sect. 2.3.2 in [23], Algorithm 6 shows the case of signed Montgomery reduction from [36].

Figures f and g (algorithm listings; not reproduced).
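The reduction described above can be sketched as follows. This is a minimal Python model of a signed Montgomery reduction, assuming \(q=3329\) and \(R=2^{16}\); it is not a transcription of Algorithm 6, and the helper `to_signed16` is a hypothetical name used to mimic taking the bottom half of a register as a signed 16-bit value.

```python
Q = 3329
R = 1 << 16
QINV = pow(Q, -1, R)  # q^{-1} mod R (needs Python 3.8+)

def to_signed16(x: int) -> int:
    """Keep the low 16 bits of x and interpret them as a signed value."""
    x &= 0xFFFF
    return x - R if x >= 0x8000 else x

def montgomery_reduce(a: int) -> int:
    """Return r with r * R = a (mod q), for |a| <= q * R / 2."""
    t = to_signed16(a * QINV)   # t = a * q^{-1} mod R, centred around 0
    return (a - t * Q) >> 16    # a - t*q is a multiple of R, so the shift is exact

if __name__ == "__main__":
    a = 1234 * 2345
    r = montgomery_reduce(a)
    print(r, (r * R - a) % Q)   # the second value is 0
```

The subtraction cancels the low 16 bits of a, so the final shift discards only zeros and the result stays congruent to \(a R^{-1}\) modulo q.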

We now provide more details on how we determined the bit lengths of the values whose Hamming weights we use in our numerical estimates in Sect. 4.2. The subscript B denotes the bottom half of a register:

  1. \(a_1 \cdot b_1\): \(12 + 12 = 24\text { bits}\). Take the bottom of the register (\(16 \text { bits}\)), then multiply by \(q_\text {inv}\), where \(\left|q_\text {inv} \right| = 12 \text { bits}\).

  2. \(({a_1 \cdot b_1})_B \cdot q_\text {inv}\): \(16 + 12 = 28 \text { bits}\). Take the bottom of the register (16 bits), then multiply by q, where \(\left|q \right|= 12 \text { bits}\).

  3. \({(({a_1 \cdot b_1})_B \cdot q_\text {inv})}_B \cdot q\): \(16 + 12 = 28 \text { bits}\). Add \((a_1 \cdot b_1)\), where \(\left|a_1 \cdot b_1 \right| = 24 \text { bits}\).

  4. \({(({a_1 \cdot b_1})_B \cdot q_\text {inv})}_B \cdot q + (a_1 \cdot b_1)\): \(\text {max}\{24, 28\} = 28 \text { bits}\). Take the top of the register and call it c, where \(\left|c \right| = 12 \text { bits}\).

  5. \(c \cdot \zeta \): \(12 + 12 = 24\text { bits}\).
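The bit lengths of these intermediates can be checked with a short calculation. The sketch below plugs the largest possible operands into each step and reads off the resulting bit length. It assumes \(q=3329\), unsigned arithmetic, and \(q_\text {inv} = -q^{-1} \bmod 2^{16} = 3327\); the last is an assumption chosen because it is a 12-bit value and makes the addition in step 4 cancel the low 16 bits.

```python
Q = 3329
NEG_QINV = (-pow(Q, -1, 1 << 16)) % (1 << 16)  # 3327 (assumed value of q_inv)
LOW16 = 0xFFFF                                  # largest bottom half of a register

def worst_case_bit_lengths() -> dict:
    """Bit lengths of the intermediates for the largest possible operands."""
    p = (Q - 1) * (Q - 1)      # step 1: a1 * b1
    t = LOW16 * NEG_QINV       # step 2: (a1*b1)_B * q_inv
    u = LOW16 * Q              # step 3: (...)_B * q
    s = u + p                  # step 4: sum before taking the top half
    c = s >> 16                # top half of the register
    v = c * (Q - 1)            # step 5: c * zeta, with zeta < q
    names = ["a1*b1", "(a1*b1)_B*q_inv", "(...)_B*q", "sum", "c", "c*zeta"]
    return {n: x.bit_length() for n, x in zip(names, (p, t, u, s, c, v))}

if __name__ == "__main__":
    for name, bits in worst_case_bit_lengths().items():
        print(f"{name:>16}: {bits} bits")
```

These are upper bounds: individual inputs can produce shorter values, and a product of two 12-bit values never needs more than 24 bits.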

C Details on Noiseless and Noisy Simulations

We now discuss in detail our simulations of the noiseless operations within the pair-point multiplication and explain how we calculated probabilities in our noisy simulations. We first focus on the first 5 instructions of the pair-point multiplication, cf. Sect. 4.2. Our simulations calculate which coefficients \(a_{2i+1} \in [0,\ldots ,q-1 ]\) have unique combinations of Hamming weight values (Hamming weight tuples) during these instructions. Recall from Eq. 3 that pair-point multiplication also computes the term \(a_1b_1\zeta \), where the value of \(\zeta \) changes for each pair-point multiplication. So for our simulations, we initially fix \(\zeta _0\) and try out all possible values for \(a_1\) and all possible values for \(b_1\). We obtain the average probability that a value for \(a_1\) leads to a unique Hamming weight tuple. Then, we change to \(\zeta _1\) and iterate over all possible values for \(a_3\) and all possible values for \(b_3\). We continue this process, obtaining the averages for all \(a_{2i+1}\), given all \(\zeta _i\). We thus obtain probabilities for extracting each odd coefficient, given a random ciphertext. Observe that our simulations do not consider micro-architectural aspects of our target, such as instruction pipelining.
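One step of such a simulation can be sketched in Python. The sketch is a simplified model, not the authors' script: it assumes \(q=3329\), unsigned arithmetic, \(q_\text {inv} = 3327\), and that the five leaked intermediates are the ones listed in Appendix B. The value `zeta=17` and the function names are hypothetical choices for illustration.

```python
from collections import Counter

Q = 3329
NEG_QINV = 3327  # assumed: -q^{-1} mod 2^16

def hw(x: int) -> int:
    """Hamming weight of a non-negative integer."""
    return bin(x).count("1")

def intermediates(a1: int, b1: int, zeta: int) -> tuple:
    """Five intermediate values of the simplified model."""
    p = a1 * b1
    t = (p & 0xFFFF) * NEG_QINV
    u = (t & 0xFFFF) * Q
    s = u + p                    # low 16 bits are zero by construction
    return (p, t, u, s, (s >> 16) * zeta)

def hw_tuple(a1: int, b1: int, zeta: int) -> tuple:
    return tuple(hw(x) for x in intermediates(a1, b1, zeta))

def collision_histogram(b1: int, zeta: int) -> dict:
    """Map collision size -> number of values a1 whose tuple is shared by that many values."""
    tuples = Counter(hw_tuple(a1, b1, zeta) for a1 in range(Q))
    hist = Counter()
    for size in tuples.values():
        hist[size] += size
    return dict(sorted(hist.items()))

if __name__ == "__main__":
    hist = collision_histogram(b1=77, zeta=17)
    print("collision size -> number of coefficient values:", hist)
    print("fraction with a unique tuple:", hist.get(1, 0) / Q)
```

Averaging the unique fraction over all values of \(b_1\), and repeating this for each \(\zeta _i\), corresponds to the procedure described above; a single pair (b1, zeta) gives only one sample.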

As we show, most of the values for an odd coefficient indeed lead to unique Hamming weight tuples. Only a small fraction of coefficients have collisions. On average, 3031 of these values have unique Hamming weight tuples, i.e., there exist 3031 Hamming weight tuples which map to exactly one coefficient value. 259 coefficients lead to 2-way collisions. This means that there exist \(259/2 \approx 130\) Hamming weight tuples which map to exactly two different coefficient values. Furthermore, there exist 34 coefficients which have 3-way collisions and 4 coefficients which have 4-way collisions. On average, only a 0.03125 fraction of tuples maps to more than 4 different coefficient values. We now provide further details about the results of our simulations.

Extracting Odd Coefficients (\(a_{2i+1}\)). Our simulations show that for a uniformly random \(b_{2i+1}\), the probability of extracting \(a_{2i+1}\) from the first 5 instructions is \(\approx 0.90\). This means that given a random ciphertext, we have a good chance of extracting each odd coefficient. The probability of obtaining two possible candidates for each odd coefficient is \(\approx 0.085\), and the probability of obtaining three possible candidates for each odd coefficient is \(\approx 0.011\). Thus, summing these probabilities, we obtain that the probability that a given \(a_{2i+1}\) has either a unique Hamming weight tuple or a 2- or 3-way collision is \(\approx 0.996\). For this reason, in the rest of this analysis we only consider coefficients with unique Hamming weight tuples or with 2- or 3-way collisions.

In Fig. 5, under Number of Matches (1), we see the probability that each odd coefficient \(a_1, a_3, \ldots , a_{255}\) has a unique Hamming weight tuple. We calculate this probability over all \(b_1 \in [1,\ldots ,q-1]\), and note that the probability depends on the value of \(\zeta \). Thus, the probability that \(a_1\) has a unique Hamming weight tuple is different from that of \(a_3,a_5,\) etc., but the probability is always between 0.801 and 0.937, with an average of 0.90. Under Number of Matches (2) and (3), we see the analogous probabilities that each odd coefficient \(a_{2i+1}\) has a Hamming weight tuple with a 2- and 3-way collision, respectively.

We recall that in our attack using \(q+q\) templates (cf. Subsect. 3.1), we use the first set of q templates for extracting the odd coefficients. According to our results, we should have a 90% chance of correctly extracting each odd coefficient. However, recall that in Kyber the secret keys consist of polynomials of degree 255, so each polynomial has 128 odd coefficients. Thus, the probability of extracting all odd coefficients correctly is notably smaller. In fact, if we consider all probabilities of Fig. 5 that each odd coefficient has a unique Hamming weight tuple, we obtain a probability of \(\prod _{i=0}^{127} p_i\approx 1.2967 \times 10^{-6}\) of extracting all odd coefficients from one polynomial, given only q templates. We will explain later in this section how we can use the results of our simulations to outline an attack strategy that easily increases our success probabilities, with just a linear increase in the number of templates needed.
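To see why a 90% per-coefficient success rate is not enough, the product can be approximated with a short calculation. The sketch below uses the average value 0.90 for every one of the 128 odd coefficients as a stand-in for the individual \(p_i\), which are not listed here; the function name `prob_all` is an illustrative choice.

```python
import math

def prob_all(per_coefficient_probs) -> float:
    """Probability that every coefficient is extracted, assuming independence."""
    return math.prod(per_coefficient_probs)

if __name__ == "__main__":
    # Stand-in: the average 0.90 for each of the 128 odd coefficients.
    approx = prob_all([0.90] * 128)
    print(f"0.90^128 = {approx:.3e}")
```

With the uniform stand-in the result is about \(1.4 \times 10^{-6}\), the same order of magnitude as the value \(1.2967 \times 10^{-6}\) obtained from the individual probabilities.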

Extracting Coefficient Pairs \((a_{2i},a_{2i+1})\). The lower part of Fig. 5 gives the probabilities that each secret coefficient pair leads to a unique Hamming weight tuple. We obtain these probabilities in the same way as for the odd coefficients. Thus, the probabilities for each pair \((a_0,a_1),(a_2,a_3),(a_4,a_5)\), \(\ldots ,(a_{254},a_{255})\) are different, as they depend on \(\zeta \). Note that in this case, the Hamming weight tuples consist of more values since we are considering all instructions within one pair-point multiplication. This explains the very high probabilities under Number of Matches (1). We can conclude from these results that if we create templates for all possible pairs of secret coefficients, our success probabilities are fairly high; on the other hand, this requires creating a total of \(q^2\) templates.

Fig. 5. Number of Matches: given \(\zeta _i\), probability of a 1-, 2- or 3-way collision. Upper part: the probability of extracting odd coefficients with q templates. Lower part: probability of extracting pairs of coefficients with \(q^2\) templates.

Efficiency Optimizations. While \(q^2\) is a reasonable number of template traces, collecting all of them is still quite time-consuming. Thus, we may instead try extracting all odd coefficients first and then extracting all even coefficients with an additional set of templates. From the discussions above, we can conclude that the success probabilities of a \(q+q\) attack are not as high as we would originally hope (for the mkm4 implementation in the Hamming weight model). However, the simulation results suggest a natural and very simple way of optimizing the success of the attack. In the following, we outline an attack adaptation that increases the success probability of our attack and only requires a linear increase in the number of templates.

First, we perform a template matching using q templates (as originally done in Subsect. 3.1). For each coefficient we are trying to extract, we rank the top 3 candidate values for which we get the best matches. Next, we build templates for extracting the even coefficients. We create 3 versions of these templates. In each version, we use a different top 3 candidate for each odd coefficient, creating an additional set of 3q templates. Thus, we first determine the top three candidates for each \(a_{2i+1}\) (with high probability) and then try all three of them in combination with all possible \(a_{2i}\), leading to an overall number of \(q+3q\) templates. When trying to extract the even coefficients, we get a very high success rate if and only if we are using the correct odd coefficient \(a_{2i+1}\). Namely, as we see in Fig. 5, each secret coefficient pair has a very high probability of having a unique Hamming weight tuple.

We can optimize our attack even further by considering the top 4 match candidates for each coefficient, generating an additional set of 4q templates. Concretely, for the optimized attacks using \(q+3q\) and \(q+4q\) templates, we obtain success probabilities of \(\prod _{i=0}^{127} p_i\approx 0.6755\) and \(\prod _{i=0}^{127} p_i\approx 0.875\), respectively. With \(6q=19974\) templates, we have a very high success probability of 0.944, given a single target trace and a random ciphertext. Subsequently, we can use our analysis of the coefficients to determine the (expected) \(\approx 0.875\) fraction of coefficients that are unique, given our list of coefficients that have a unique Hamming weight pattern. For the remaining \(\approx 0.125\) coefficients, brute-forcing over \(4^{0.125\cdot 128}=2^{32}\) coefficients is feasible (Table 1).
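The template counts and the size of the remaining search space follow from simple arithmetic, sketched below for \(q=3329\). The function names are illustrative, and reading \(6q\) as q templates plus five candidate sets is an assumption that extends the \(q+3q\) and \(q+4q\) pattern.

```python
Q = 3329

def template_count(top_k: int) -> int:
    """q templates for the odd coefficients plus top_k * q for the even ones."""
    return Q + top_k * Q

def brute_force_size(ambiguous_fraction: float, n_coeffs: int, candidates: int) -> int:
    """Search-space size when a fraction of coefficients has several candidates each."""
    return candidates ** round(ambiguous_fraction * n_coeffs)

if __name__ == "__main__":
    for k in (3, 4, 5):
        print(f"q + {k}q = {template_count(k)} templates")
    print("remaining search space:", brute_force_size(0.125, 128, 4))
```

The search space stays small because only 16 of the 128 coefficients remain ambiguous, each with at most 4 candidates.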

Table 1. Simulation results for noisy traces.

Fig. 6. Noisy \(q+q\) attack simulations.

Fig. 7. Noisy \(q^2\) attack simulations.

Noise. We now add Gaussian noise with standard deviation \(\sigma \) to the target trace and see for which \(\sigma \) we can still extract one or both coefficients. Instead of searching for perfect matches, we minimize the \(L_2\)-norm of the difference between the simulated target trace and the template. Unfortunately, even for the \(q^2\) attack, the best match under the \(L_2\)-norm provides the correct \((a_{2i},a_{2i+1})\) value with probability \(\le 0.5\) when \(\sigma \ge 0.8\). All probabilities are calculated from 10,000 samples, each using a random root out of all 128 possible roots.
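A noisy matching experiment of this kind can be sketched as follows. The sketch covers only the extraction of one odd coefficient from five intermediates, under the same simplifying assumptions as before (\(q=3329\), unsigned arithmetic, \(q_\text {inv}=3327\), a hypothetical fixed `zeta=17`). It is not the authors' simulation code and is not expected to reproduce the numbers in Table 1.

```python
import random

Q = 3329
NEG_QINV = 3327  # assumed: -q^{-1} mod 2^16

def hw_tuple(a1: int, b1: int, zeta: int) -> tuple:
    """Hamming weights of five intermediates in the simplified model."""
    p = a1 * b1
    t = (p & 0xFFFF) * NEG_QINV
    u = (t & 0xFFFF) * Q
    s = u + p
    return tuple(bin(x).count("1") for x in (p, t, u, s, (s >> 16) * zeta))

def noisy_trace(a1, b1, zeta, sigma, rng):
    """Target trace: Hamming weights plus Gaussian noise of standard deviation sigma."""
    return [h + rng.gauss(0.0, sigma) for h in hw_tuple(a1, b1, zeta)]

def top_matches(trace, b1, zeta, k=1):
    """The k candidates whose template is closest to the trace in L2 distance."""
    def dist(a1):
        # The squared distance has the same minimizer as the L2 norm.
        return sum((x - y) ** 2 for x, y in zip(trace, hw_tuple(a1, b1, zeta)))
    return sorted(range(Q), key=dist)[:k]

def success_rate(sigma, trials=20, k=1, zeta=17, seed=1):
    """Fraction of trials in which the secret value is among the top k matches."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a1, b1 = rng.randrange(Q), rng.randrange(1, Q)
        hits += a1 in top_matches(noisy_trace(a1, b1, zeta, sigma, rng), b1, zeta, k)
    return hits / trials

if __name__ == "__main__":
    for sigma in (0.0, 0.3, 0.8):
        print(f"sigma = {sigma}: top-1 {success_rate(sigma)}, top-3 {success_rate(sigma, k=3)}")
```

Setting k to 3 or 4 corresponds to ranking several candidates per coefficient instead of keeping only the best match.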

D Comparison

To the best of our knowledge, there exist two other works in the literature that target polynomial multiplication in Kyber. In [27], the authors present a CPA attack on an unprotected polynomial multiplication implementation of Kyber. This attack led to the extraction of the long-term secret using approximately 200 traces. The main difference in comparison to our work is that the attack [27] requires multiple target traces and thus is not successful in the presence of a masking countermeasure. Our attack, on the other hand, requires a single target trace and, therefore, can successfully target masked implementations. The drawback of our approach is that we consider an adversary who can build template traces using a profiling device on which the secret can be freely changed. A classic CPA attack, as presented in [27], does not require any such profiling.

Another related work [40] presents a single-trace template attack on the polynomial multiplication of the unmasked pqm4 implementation [1] during key generation (see Note 5). There are several differences between this work and ours. First, the authors did not attack any masked implementation; they only argue that the attack applies to masking schemes because it targets single traces. Their attack is performed against a non-optimized implementation that uses straightforward polynomial multiplication without Karatsuba, so each secret coefficient is loaded twice, while our attack targets the mkm4 masked implementation, which accesses the secret only once. Second, the attack [40] cannot be replicated on decapsulation since their template requires the leakage from the multiplication of k different polynomial values in the matrix A, which happens in the key generation. Our attack, on the other hand, can be applied to the key generation by utilizing the public polynomial values in A. Finally, their attack does not recover the full secret, but employs an extra key enumeration to finish the attack; as a result, their attack works for Kyber768 and Kyber1024, but not for Kyber512. A precise performance comparison is challenging because the number of templates required in [40] is uncertain. The authors mention using 500 traces to build templates for each intermediate, with approximately 14 attacked intermediates in each multiplication. This means that their attack would require only \(500 \cdot 14 = 7\,000\) templates if one template can be created for all pairwise multiplications, or \(128 \cdot 7\,000 = 896\,000\) if each multiplication needs to be templated separately. Consequently, it seems that the attack [40] requires fewer template traces for profiling than our approach, albeit with increased complexity and a lower success rate, necessitating final key enumeration.

Comparing our approach with [40] is intricate due to the differences mentioned above. Foremost, [40] attacks key generation of the unprotected implementation, which involves a broader range of secret-dependent operations than our target. Therefore, we cannot estimate how well the attack from [40] would work against a protected implementation such as mkm4. In summary, the attack in [40] has the advantage that it exploits and combines various leaks. However, unlike the technique presented in this paper, it is not easy to adapt to other procedures. This makes our attack more generic than the one presented in [40].

Table 2. Comparison of attacks on the long-term secret key from the polynomial multiplications; the analysis is made for Kyber768 unless stated otherwise.

In Table 2, we give a summary of the comparison with [27] and [40]. For our work, we present two versions: “Simulation” refers to the original form of our attack described in Sect. 3, with the results obtained via simulations in Sect. 4, and “Experiment” refers to the real-world attack from Sect. 5, where 78M traces give a 43% success rate of extracting the secret key, while 105M traces give a success rate of over 90%.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Bock, E.A., Banegas, G., Brzuska, C., Chmielewski, Ł., Puniamurthy, K., Šorf, M. (2024). Breaking DPA-Protected Kyber via the Pair-Pointwise Multiplication. In: Pöpper, C., Batina, L. (eds) Applied Cryptography and Network Security. ACNS 2024. Lecture Notes in Computer Science, vol 14584. Springer, Cham. https://doi.org/10.1007/978-3-031-54773-7_5

  • DOI: https://doi.org/10.1007/978-3-031-54773-7_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-54772-0

  • Online ISBN: 978-3-031-54773-7

  • eBook Packages: Computer Science (R0)
