Abstract
In this paper, we propose a fast implementation of multiple Montgomery multiplications using Intel AVX-512IFMA (Integer Fused Multiply-Add) instructions. The proposed implementation is based on a modified Montgomery multiplication. For Montgomery multiplication operands with 52 bits or fewer, the proposed implementation using Intel AVX-512IFMA instructions is up to approximately 12.22 and 4.30 times faster than the implementations using Intel 64 and Intel AVX-512F (Foundation) instructions on an Intel Core i3-8121U processor, respectively.
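As a rough illustration of why the 52-bit operand limit fits AVX-512IFMA, the sketch below models textbook Montgomery multiplication with R = 2^52 in plain Python. It is not the modified algorithm proposed in the paper, and it uses no SIMD intrinsics. The helpers `madd52lo` and `madd52hi` are hypothetical scalar stand-ins for what one 64-bit lane of the IFMA multiply-add instructions computes (accumulator plus the low or high 52 bits of a 104-bit product), and all other names are assumptions made for this example.

```python
MASK52 = (1 << 52) - 1
MASK64 = (1 << 64) - 1

def madd52lo(acc, x, y):
    """Scalar model of one IFMA lane: acc + low 52 bits of (x * y)."""
    prod = (x & MASK52) * (y & MASK52)          # 104-bit product
    return (acc + (prod & MASK52)) & MASK64

def madd52hi(acc, x, y):
    """Scalar model of one IFMA lane: acc + high 52 bits of (x * y)."""
    prod = (x & MASK52) * (y & MASK52)
    return (acc + (prod >> 52)) & MASK64

def mont_setup(n):
    """Precompute constants for an odd modulus n < 2^52 (assumed)."""
    assert n % 2 == 1 and 1 < n < (1 << 52)
    n_prime = (-pow(n, -1, 1 << 52)) & MASK52   # -n^{-1} mod 2^52
    r2 = (1 << 104) % n                         # R^2 mod n
    return n_prime, r2

def mont_mul(a, b, n, n_prime):
    """Return a * b * R^{-1} mod n for 0 <= a, b < n, with R = 2^52."""
    t_lo = madd52lo(0, a, b)                    # low half of a * b
    t_hi = madd52hi(0, a, b)                    # high half of a * b
    m = madd52lo(0, t_lo, n_prime)              # m = t_lo * n' mod R
    lo = madd52lo(t_lo, m, n)                   # either 0 or exactly 2^52
    u = madd52hi(t_hi, m, n) + (lo >> 52)       # (a*b + m*n) / R
    return u - n if u >= n else u               # single conditional subtraction

def mont_mul_lanes(xs, ys, n, n_prime):
    """Independent products, one per lane (a 512-bit register holds 8 lanes)."""
    return [mont_mul(x, y, n, n_prime) for x, y in zip(xs, ys)]

def modmul_lanes(xs, ys, n):
    """Plain modular products computed via the Montgomery domain."""
    n_prime, r2 = mont_setup(n)
    xm = mont_mul_lanes(xs, [r2] * len(xs), n, n_prime)   # to Montgomery form
    ym = mont_mul_lanes(ys, [r2] * len(ys), n, n_prime)
    zm = mont_mul_lanes(xm, ym, n, n_prime)
    return mont_mul_lanes(zm, [1] * len(zm), n, n_prime)  # back to normal form
```

In this model every intermediate value fits in a 64-bit lane, and each of the eight lanes is independent, which is the property that makes several multiplications at once a natural match for a vector unit.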
Acknowledgments
This research was partially supported by JSPS KAKENHI Grant Number JP19K11989.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Takahashi, D. (2020). Fast Multiple Montgomery Multiplications Using Intel AVX-512IFMA Instructions. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2020. ICCSA 2020. Lecture Notes in Computer Science, vol 12253. Springer, Cham. https://doi.org/10.1007/978-3-030-58814-4_52
DOI: https://doi.org/10.1007/978-3-030-58814-4_52
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58813-7
Online ISBN: 978-3-030-58814-4
eBook Packages: Computer Science (R0)