Fast and Efficient Hardware Implementation of HQC

  • Conference paper
Selected Areas in Cryptography – SAC 2023 (SAC 2023)

Abstract

This work presents a hardware design for a constant-time implementation of the HQC (Hamming Quasi-Cyclic) code-based key encapsulation mechanism. HQC has been selected for the fourth round of NIST’s Post-Quantum Cryptography standardization process, and this work presents the first hand-optimized design of HQC key generation, encapsulation, and decapsulation written in Verilog and targeting FPGAs. The three modules share a common SHAKE256 hash module to reduce area overhead. All hardware modules are parametrizable at compile time so that designs for the different security levels can be easily generated. The design currently outperforms the other hardware designs for HQC, as well as many designs for the other candidates in the fourth round of the Post-Quantum Cryptography standardization process, and has one of the best time-area products. For the combined HighSpeed design targeting the lowest security level, we show that the HQC design can perform key generation in 0.09 ms, encapsulation in 0.13 ms, and decapsulation in 0.21 ms when synthesized for a Xilinx Artix 7 FPGA. Our work shows that, when hardware performance is compared, HQC can be a competitive alternative candidate from the fourth round of the NIST PQC competition.

Notes

  1. We report both Slices and LUTs in our tables, since slices can often be only partially used depending on the optimization strategy of the synthesis tool, which makes slice utilization an incomplete indication of the density of the design.

  2. https://pqc-hqc.org/implementation.html

References

  1. Aguilar Melchor, C., et al.: HQC. Technical report, National Institute of Standards and Technology (2020). https://pqc-hqc.org/doc/hqc-specification_021-06-06.pdf

  2. Aguilar Melchor, C., et al.: HQC. Technical report, National Institute of Standards and Technology (2023). http://pqc-hqc.org/doc/hqc-specification_2023-04-30.pdf

  3. Aguilar-Melchor, C., et al.: Towards automating cryptographic hardware implementations: a case study of HQC. Cryptology ePrint Archive, Paper 2022/1425 (2022). https://eprint.iacr.org/2022/1425

  4. Aragon, N., et al.: BIKE. Technical report, National Institute of Standards and Technology (2020). https://csrc.nist.gov/projects/post-quantum-cryptography/round-3-submissions

  5. Azad, A.A., Shahed, I.: A compact and fast FPGA based implementation of encoding and decoding algorithm using Reed Solomon codes. Int. J. Future Comput. Commun. 31–35 (2014)

  6. Barrett, P.: Implementing the Rivest Shamir and Adleman public key encryption algorithm on a standard digital signal processor. In: Odlyzko, A.M. (ed.) CRYPTO 1986. LNCS, vol. 263, pp. 311–323. Springer, Heidelberg (1987). https://doi.org/10.1007/3-540-47721-7_24

  7. Bernstein, D.J.: Fast multiplication and its applications (2008)

  8. Chen, P., et al.: Complete and improved FPGA implementation. https://doi.org/10.46586/tches.v2022.i3.71-113

  9. Dang, V.B., Mohajerani, K., Gaj, K.: High-speed hardware architectures and FPGA benchmarking of CRYSTALS-Kyber, NTRU, and Saber. Cryptology ePrint Archive, Paper 2021/1508 (2021). https://eprint.iacr.org/2021/1508

  10. Deshpande, S., del Pozo, S.M., Mateu, V., Manzano, M., Aaraj, N., Szefer, J.: Modular inverse for integers using fast constant time GCD algorithm and its applications. In: Proceedings of the International Conference on Field Programmable Logic and Applications. FPL (2021)

  11. Deshpande, S., Xu, C., Nawan, M., Nawaz, K., Szefer, J.: Fast and efficient hardware implementation of HQC. Cryptology ePrint Archive, Paper 2022/1183 (2022). https://eprint.iacr.org/2022/1183

  12. Gigliotti, P.: Implementing barrel shifters using multipliers. Technical report, XAPP195, Xilinx (2004). https://www.xilinx.com/support/documentation/application_notes/xapp195.pdf

  13. Guo, Q., Hlauschek, C., Johansson, T., Lahr, N., Nilsson, A., Schröder, R.L.: Don’t reject this: key-recovery timing attacks due to rejection-sampling in HQC and BIKE. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2022(3), 223–263 (2022). https://doi.org/10.46586/tches.v2022.i3.223-263. https://tches.iacr.org/index.php/TCHES/article/view/9700

  14. Hashemipour-Nazari, M., Goossens, K., Balatsoukas-Stimming, A.: Hardware implementation of iterative projection-aggregation decoding of Reed-Muller codes. In: 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), ICASSP 2021, pp. 8293–8297 (2021). https://doi.org/10.1109/ICASSP39728.2021.9414655

  15. Hu, J., Wang, W., Cheung, R.C., Wang, H.: Optimized polynomial multiplier over commutative rings on FPGAs: a case study on BIKE. In: 2019 International Conference on Field-Programmable Technology (ICFPT), pp. 231–234 (2019). https://doi.org/10.1109/ICFPT47387.2019.00035

  16. Massolino, P.M.C., Longa, P., Renes, J., Batina, L.: A compact and scalable hardware/software co-design of SIKE. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2020(2), 245–271 (2020). https://doi.org/10.13154/tches.v2020.i2.245-271. https://tches.iacr.org/index.php/TCHES/article/view/8551

  17. Richter-Brockmann, J., Chen, M.S., Ghosh, S., Güneysu, T.: Racing BIKE: improved polynomial multiplication and inversion in hardware. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2022(1), 557–588 (2021). https://doi.org/10.46586/tches.v2022.i1.557-588. https://tches.iacr.org/index.php/TCHES/article/view/9307

  18. Richter-Brockmann, J., Mono, J., Güneysu, T.: Folding BIKE: scalable hardware implementation for reconfigurable devices. IEEE Trans. Comput. 71(5), 1204–1215 (2022). https://doi.org/10.1109/TC.2021.3078294

  19. Sandoval-Ruiz, C.: VHDL optimized model of a multiplier in finite fields. Ingenieria y Universidad 21(2), 195–212 (2017). https://doi.org/10.11144/Javeriana.iyu21-2.vhdl. https://revistas.javeriana.edu.co/index.php/iyu/article/view/195

  20. Schöffel, M., Feldmann, J., Wehn, N.: Code-based cryptography in IoT: a HW/SW co-design of HQC. CoRR abs/2301.04888 (2023). https://doi.org/10.48550/arXiv.2301.04888

  21. Scholl, S., Wehn, N.: Hardware implementation of a Reed-Solomon soft decoder based on information set decoding. In: 2014 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 1–6 (2014). https://doi.org/10.7873/DATE.2014.222

  22. Sendrier, N.: Secure sampling of constant-weight words - application to BIKE. Cryptology ePrint Archive, Paper 2021/1631 (2021). https://eprint.iacr.org/2021/1631

  23. Wang, W., Tian, S., Jungk, B., Bindel, N., Longa, P., Szefer, J.: Parameterized hardware accelerators for lattice-based cryptography and their application to the HW/SW co-design of qTESLA. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2020(3), 269–306 (2020). https://doi.org/10.13154/tches.v2020.i3.269-306. https://tches.iacr.org/index.php/TCHES/article/view/8591

  24. Xing, Y., Li, S.: A compact hardware implementation of CCA-secure key exchange mechanism CRYSTALS-KYBER on FPGA. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2021(2), 328–356 (2021). https://doi.org/10.46586/tches.v2021.i2.328-356. https://tches.iacr.org/index.php/TCHES/article/view/8797

Acknowledgement

We would like to thank the reviewers for the valuable feedback and Dr. Cuauhtemoc Mancillas López for constructive comments and shepherding our article. We would like to thank Dr. Victor Mateu and Dr. Carlos Aguilar Melchor for their helpful discussions. We would like to thank Dr. Shanquan Tian for his optimization recommendations for the SHAKE256 module. The work was supported in part by a research grant from Technology Innovation Institute.

Author information

Corresponding author

Correspondence to Sanjay Deshpande.

Appendices

Appendix 1.A Fast and Non-Biased (FNB) Fixed-Weight Vector Generation

Although the CWW design is constant in time, it does have a small bias. As an alternative, we propose a new FNB fixed-weight vector generation design, which is based on the fixed-weight vector generation technique given in [1]. Our FNB fixed-weight generation module can be parametrized to create a design with an arbitrarily small probability of a timing attack being possible. Our hardware module has a parameter, ACCEPTABLE_REJECTIONS, which specifies how many indices can be rejected, in either rejection sampling or duplicate detection, while the design still behaves in constant time. The parameter can be set based on the user’s target failure probability. As long as the number of rejections does not exceed the selected parameter value, the timing side channel given in [13] is not possible.
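The role of ACCEPTABLE_REJECTIONS can be illustrated with a small software model. The Python sketch below is not the Verilog design: the function name `fnb_model`, the little-endian parsing of 24-bit candidates, the threshold rule (the largest multiple of n that fits in 24 bits), and the example values n = 17669 and w = 66 are assumptions made for illustration. Python itself is not constant time; the sketch only models the fixed amount of work, in that exactly w + ACCEPTABLE_REJECTIONS candidates are examined no matter how many are rejected.

```python
import hashlib


def fnb_model(seed: bytes, n: int, w: int, acceptable_rejections: int):
    """Software model of fixed-weight sampling with a rejection budget.

    Returns (positions, ok). `ok` is False when more candidates were
    rejected than the budget allows, i.e. fewer than w positions were found.
    """
    total = w + acceptable_rejections        # fixed number of candidates examined
    threshold = ((1 << 24) // n) * n         # assumed rejection threshold
    stream = hashlib.shake_256(seed).digest(3 * total)  # 24 bits per candidate

    positions = []
    seen = set()
    for i in range(total):                   # loop count never depends on the data
        cand = int.from_bytes(stream[3 * i:3 * i + 3], "little")
        loc = cand % n                       # hardware would use a fixed-modulus reduction
        below_threshold = cand < threshold   # rejection sampling check
        is_new = loc not in seen             # duplicate detection check
        if below_threshold and is_new and len(positions) < w:
            positions.append(loc)
            seen.add(loc)
    return positions, len(positions) == w
```

With a 40-byte (320-bit) seed and a budget of 16, `fnb_model(seed, 17669, 66, 16)` examines 82 candidates; the returned flag reports whether w distinct positions were found within that budget.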

We use the SHAKE256 module described in Sect. 2.1 to expand the 320-bit seed to a \(24 \times w\)-bit string. Since the SHAKE256 module has a 32-bit interface, the seed is loaded in 32-bit chunks and stored in a BRAM. Each 32-bit chunk from SHAKE256 is broken into 24-bit integers by the preprocess unit and stored in the ctx_RAM; the threshold check and reduction are then performed. For the reduction, we use Barrett reduction [6]. Unlike the variable Barrett reduction discussed in Sect. 2.1.A, this specific Barrett reduction is optimized, as we always reduce the inputs modulo a specific fixed value (n). After the reduction, the integer values (locations) are stored in a BRAM. Once the locations BRAM is filled, the duplicate detection module is triggered. The duplicate detection module detects potential duplicate values in the locations BRAM by traversing all address locations and updating the value stored in a dual-ported BRAM. While the duplicate detection module checks for duplicates, the SHAKE256 module generates the next \(24 \times w\)-bit string to replace any potential duplicates and stores it in another BRAM. This way, we can hide any clock cycles taken for seed expansion. Our hardware design uses a PRNG to generate the uniformly random bits required for the fixed-weight vector generation from an input seed of length 320 bits. The design includes this PRNG in the form of SHAKE256 and assumes that the seed will be initialized by some other hardware module implementing a true random number generator.

The ACCEPTABLE_REJECTIONS parameter of our hardware module specifies how many indices can be rejected while the design still behaves in constant time, at the cost of extra area for more storage and extra clock cycles. The extra area is needed because we generate additional uniformly random bits in advance (the amount depends on the parameter value) and store them in a BRAM. The extra clock cycles are needed because, even after we have found the required number of indices below the threshold value, we still go over all the locations to maintain constant-time behavior. For the duplicate detection logic inside the duplicate detection module, the control logic is programmed to take the same number of cycles whether or not a duplicate is detected.
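To make the fixed-modulus reduction concrete, here is a minimal Python sketch of Barrett reduction for 24-bit inputs. It is a generic textbook formulation, not the paper's circuit: the names `N`, `K`, `M`, and `barrett_reduce`, and the example modulus 17669, are assumptions chosen for illustration. Because the modulus is fixed, the constant `M` is computed once, and each reduction needs one multiplication, one shift, and a single correction step.

```python
N = 17669            # assumed fixed modulus n, for illustration only
K = 24               # inputs are 24-bit integers
M = (1 << K) // N    # precomputed constant floor(2^K / N)


def barrett_reduce(x: int) -> int:
    """Reduce 0 <= x < 2^K modulo N without a division."""
    q = (x * M) >> K         # estimate of floor(x / N), too small by at most 1
    r = x - q * N            # therefore 0 <= r < 2N
    r -= N & -(r >= N)       # branch-free correction: subtract N only if r >= N
    return r
```

The single correction step suffices because, for inputs below 2^K, the quotient estimate is never more than one below the true quotient. In hardware, the multiplication by a known constant and the shift by a fixed amount are what make the fixed-modulus variant cheaper than a reduction with a variable modulus.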

Table 5 shows the comparison of our new FNB design to the CWW design. The area results shown in Table 5 exclude the SHAKE256 module, as SHAKE256 is shared among all primitives. Although the CWW algorithm ensures constant-time behavior in generating fixed-weight vectors, there is a small bias between the uniform distribution and the algorithm’s output. For the new FNB algorithm, there is no bias. Further, FNB is faster than CWW, and its time-area product is better. These benefits come at the cost of an extremely small probability that the design is not constant time, which occurs only if there are more rejections than \(w_r\). Table 5 shows that the probability of non-constant-time behavior for FNB can be \(2^{-200}\) or even smaller. To compute the failure probability (given in Table 5) for each parameter set, we take into account both the threshold check failure and duplicate detection probabilities for the respective parameter sets.
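As a rough illustration of how such a failure probability shrinks with the rejection budget, the Python sketch below computes a simple upper bound. It is not necessarily the calculation behind Table 5: the function name `failure_bound`, the 24-bit threshold rule, the union bound that combines threshold and duplicate rejections, and the example values n = 17669 and w = 66 are all assumptions. Each candidate is treated as rejected with probability at most p (threshold failure plus at most w − 1 occupied positions out of n), and the bound is the binomial tail for more than the allowed number of rejections among w + budget candidates.

```python
from fractions import Fraction
from math import comb


def failure_bound(n: int, w: int, acceptable_rejections: int, bits: int = 24) -> Fraction:
    """Upper bound on P[more than `acceptable_rejections` candidates rejected]."""
    total = w + acceptable_rejections
    threshold = ((1 << bits) // n) * n
    p_threshold = Fraction((1 << bits) - threshold, 1 << bits)  # candidate >= threshold
    p_duplicate = Fraction(w - 1, n)       # at most w - 1 positions already taken
    p = p_threshold + p_duplicate          # union bound per candidate
    # Binomial tail: more than `acceptable_rejections` rejections out of `total`.
    return sum(
        comb(total, k) * p**k * (1 - p) ** (total - k)
        for k in range(acceptable_rejections + 1, total + 1)
    )
```

Evaluating `failure_bound(17669, 66, r)` for increasing `r` shows the bound falling rapidly, which is the trade-off the parameter exposes: a larger budget costs storage and cycles but makes non-constant-time behavior less likely.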

Comparison to Hardware Designs for Other Round 4 Algorithms

We also provide Table 14, which tabulates the latest hardware implementations of all other post-quantum cryptographic algorithms from the fourth round of NIST’s standardization process, plus the to-be-standardized Kyber algorithm. We focus on comparing the hardware designs for the lowest security level, Level 1, as all publications give clear time and area numbers for this level.

Table 14. Comparison of the time and area of state-of-the-art hardware implementations of other (NIST PQC competition) round 4 KEM candidates.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Deshpande, S., Xu, C., Nawan, M., Nawaz, K., Szefer, J. (2024). Fast and Efficient Hardware Implementation of HQC. In: Carlet, C., Mandal, K., Rijmen, V. (eds) Selected Areas in Cryptography – SAC 2023. SAC 2023. Lecture Notes in Computer Science, vol 14201. Springer, Cham. https://doi.org/10.1007/978-3-031-53368-6_15

  • DOI: https://doi.org/10.1007/978-3-031-53368-6_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-53367-9

  • Online ISBN: 978-3-031-53368-6

  • eBook Packages: Computer Science, Computer Science (R0)
