An Implementation of Parallel Number-Theoretic Transform Using Intel AVX-512 Instructions

Takahashi, Daisuke

doi:10.1007/978-3-031-14788-3_18

Daisuke Takahashi ORCID: orcid.org/0000-0003-1357-5770

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13366)

Included in the following conference series:

International Workshop on Computer Algebra in Scientific Computing

Abstract

In this paper, we propose an implementation of the parallel number-theoretic transform (NTT) using Intel Advanced Vector Extensions 512 (AVX-512) instructions. The butterfly operation of the NTT can be performed using modular addition, subtraction, and multiplication. We show that a method known as the six-step fast Fourier transform algorithm can be applied to the NTT. We vectorized NTT kernels using the Intel AVX-512 instructions and parallelized the six-step NTT using OpenMP. We successfully achieved a performance of over 83 giga-operations per second on an Intel Xeon Platinum 8368 (2.4 GHz, 38 cores) for a $2^{20}$-point NTT with a modulus of 51 bits.
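
The two ingredients named in the abstract, a butterfly built from modular addition, subtraction, and multiplication, and the six-step decomposition, can be illustrated with a minimal scalar sketch. This is not the paper's implementation: it uses plain Python integers instead of AVX-512 vectors and OpenMP threads, and the modulus `P = 998244353` with primitive root `G = 3` is an assumed small NTT-friendly prime chosen for illustration, not the 51-bit modulus used in the paper. All function names are hypothetical.

```python
# Assumed illustrative parameters (not from the paper):
P = 998244353   # NTT-friendly prime, 119 * 2**23 + 1
G = 3           # a primitive root modulo P


def butterfly(a, b, w):
    """Radix-2 butterfly using modular multiplication, addition, subtraction."""
    t = (b * w) % P
    return (a + t) % P, (a - t) % P


def ntt(x, w):
    """Recursive radix-2 NTT; w must be a primitive len(x)-th root of unity."""
    n = len(x)
    if n == 1:
        return list(x)
    w2 = w * w % P
    even = ntt(x[0::2], w2)
    odd = ntt(x[1::2], w2)
    out = [0] * n
    wk = 1
    for k in range(n // 2):
        out[k], out[k + n // 2] = butterfly(even[k], odd[k], wk)
        wk = wk * w % P
    return out


def six_step_ntt(x, n1, n2):
    """Six-step NTT of length n1 * n2 (both powers of two)."""
    n = n1 * n2
    w = pow(G, (P - 1) // n, P)  # primitive n-th root of unity
    # Step 1: transpose the n2 x n1 input into n1 rows of length n2.
    rows = [[x[j1 + j2 * n1] for j2 in range(n2)] for j1 in range(n1)]
    # Step 2: n1 independent n2-point NTTs.
    rows = [ntt(r, pow(w, n1, P)) for r in rows]
    # Step 3: multiply by the twiddle factors w^(j1 * k2).
    rows = [[rows[j1][k2] * pow(w, j1 * k2, P) % P for k2 in range(n2)]
            for j1 in range(n1)]
    # Step 4: transpose into n2 rows of length n1.
    cols = [[rows[j1][k2] for j1 in range(n1)] for k2 in range(n2)]
    # Step 5: n2 independent n1-point NTTs.
    cols = [ntt(c, pow(w, n2, P)) for c in cols]
    # Step 6: transpose back; output index is k2 + k1 * n2.
    return [cols[k2][k1] for k1 in range(n1) for k2 in range(n2)]


def naive_ntt(x):
    """Direct O(n^2) evaluation of the transform, used only as a reference."""
    n = len(x)
    w = pow(G, (P - 1) // n, P)
    return [sum(x[j] * pow(w, j * k, P) for j in range(n)) % P
            for k in range(n)]


if __name__ == "__main__":
    data = list(range(1, 17))
    print(six_step_ntt(data, 4, 4) == naive_ntt(data))
```

In this sketch the row transforms in steps 2 and 5 are independent of one another, which is the property that makes the six-step structure a natural fit for thread-level parallelism, while the butterfly is the inner kernel that a vectorized implementation would target.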

Acknowledgments

This work was supported by JSPS KAKENHI Grant Number JP19K11989.

Author information

Authors and Affiliations

Center for Computational Sciences, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8577, Japan
Daisuke Takahashi

Corresponding author

Correspondence to Daisuke Takahashi.

Editor information

Editors and Affiliations

Université de Lille, Villeneuve d'Ascq, France
François Boulier
Coventry University, Coventry, UK
Matthew England
Plekhanov Russian University of Economics, Moscow, Russia
Timur M. Sadykov
Institute of Theoretical and Applied Mechanics, Novosibirsk, Russia
Evgenii V. Vorozhtsov

About this paper

Cite this paper

Takahashi, D. (2022). An Implementation of Parallel Number-Theoretic Transform Using Intel AVX-512 Instructions. In: Boulier, F., England, M., Sadykov, T.M., Vorozhtsov, E.V. (eds) Computer Algebra in Scientific Computing. CASC 2022. Lecture Notes in Computer Science, vol 13366. Springer, Cham. https://doi.org/10.1007/978-3-031-14788-3_18

DOI: https://doi.org/10.1007/978-3-031-14788-3_18
Published: 11 August 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-14787-6
Online ISBN: 978-3-031-14788-3
eBook Packages: Computer Science, Computer Science (R0)
