Abstract
In this paper, we propose an implementation of the parallel number-theoretic transform (NTT) using Intel Advanced Vector Extensions 512 (AVX-512) instructions. The butterfly operation of the NTT can be performed using modular addition, subtraction, and multiplication. We show that a method known as the six-step fast Fourier transform algorithm can be applied to the NTT. We vectorized NTT kernels using the Intel AVX-512 instructions and parallelized the six-step NTT using OpenMP. We successfully achieved a performance of over 83 giga-operations per second on an Intel Xeon Platinum 8368 (2.4 GHz, 38 cores) for a \(2^{20}\)-point NTT with a modulus of 51 bits.
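To make the structure of the six-step approach concrete, the following is a minimal scalar sketch in Python, not the vectorized AVX-512 implementation described above. It assumes the well-known NTT-friendly prime 998244353 with primitive root 3 instead of a 51-bit modulus, and the helper names (`ntt`, `six_step_ntt`, `naive_ntt`) are hypothetical. The butterfly in `ntt` uses only modular addition, subtraction, and multiplication, and `six_step_ntt` follows the usual transpose, row transforms, twiddle multiplication, transpose, row transforms, transpose sequence for \(n = n_1 n_2\).

```python
# Illustrative scalar sketch only; modulus, root, and names are assumptions.
P = 998244353  # prime of the form 119 * 2^23 + 1 (assumed for illustration)
G = 3          # primitive root modulo P


def naive_ntt(x, w, p):
    """O(n^2) reference transform: y[k] = sum_j x[j] * w^(j*k) mod p."""
    n = len(x)
    return [sum(x[j] * pow(w, j * k, p) for j in range(n)) % p
            for k in range(n)]


def ntt(x, w, p):
    """Recursive radix-2 NTT; w must be a primitive len(x)-th root of unity."""
    n = len(x)
    if n == 1:
        return list(x)
    w_sq = w * w % p
    even = ntt(x[0::2], w_sq, p)
    odd = ntt(x[1::2], w_sq, p)
    out = [0] * n
    t = 1  # running twiddle factor w^k
    for k in range(n // 2):
        u = even[k]
        v = odd[k] * t % p              # modular multiplication
        out[k] = (u + v) % p            # modular addition
        out[k + n // 2] = (u - v) % p   # modular subtraction
        t = t * w % p
    return out


def six_step_ntt(x, n1, n2, w, p):
    """Six-step NTT of length n1*n2; w is a primitive (n1*n2)-th root."""
    assert len(x) == n1 * n2
    # Step 1: transpose the n2 x n1 view of x, so rows[j1][j2] = x[j1 + j2*n1].
    rows = [[x[j1 + j2 * n1] for j2 in range(n2)] for j1 in range(n1)]
    # Step 2: n1 independent NTTs of length n2, using the root w^n1.
    w_n2 = pow(w, n1, p)
    rows = [ntt(r, w_n2, p) for r in rows]
    # Step 3: multiply element (j1, k2) by the twiddle factor w^(j1*k2).
    rows = [[rows[j1][k2] * pow(w, j1 * k2, p) % p for k2 in range(n2)]
            for j1 in range(n1)]
    # Step 4: transpose, so cols[k2][j1] = rows[j1][k2].
    cols = [[rows[j1][k2] for j1 in range(n1)] for k2 in range(n2)]
    # Step 5: n2 independent NTTs of length n1, using the root w^n2.
    w_n1 = pow(w, n2, p)
    cols = [ntt(c, w_n1, p) for c in cols]
    # Step 6: transpose back, so y[k2 + k1*n2] = cols[k2][k1].
    return [cols[k2][k1] for k1 in range(n1) for k2 in range(n2)]


if __name__ == "__main__":
    n1, n2 = 4, 8
    n = n1 * n2
    w = pow(G, (P - 1) // n, P)  # primitive n-th root of unity modulo P
    data = [(7 * i + 3) % P for i in range(n)]
    print(six_step_ntt(data, n1, n2, w, P) == naive_ntt(data, w, P))
```

In this sketch, the row transforms in steps 2 and 5 are independent of one another, which is the property that would allow them to be distributed across threads; the transposes keep each row transform working on contiguous data.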
Acknowledgments
This work was supported by JSPS KAKENHI Grant Number JP19K11989.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Takahashi, D. (2022). An Implementation of Parallel Number-Theoretic Transform Using Intel AVX-512 Instructions. In: Boulier, F., England, M., Sadykov, T.M., Vorozhtsov, E.V. (eds) Computer Algebra in Scientific Computing. CASC 2022. Lecture Notes in Computer Science, vol 13366. Springer, Cham. https://doi.org/10.1007/978-3-031-14788-3_18
DOI: https://doi.org/10.1007/978-3-031-14788-3_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-14787-6
Online ISBN: 978-3-031-14788-3
eBook Packages: Computer Science (R0)