DOI: 10.1145/3578178.3578193
research-article

Efficient Large Integer Multiplication with Arm SVE Instructions

Published: 27 February 2023

Abstract

In this study, we implement large integer multiplication with the Arm Scalable Vector Extension (SVE) instructions. SVE is a single instruction, multiple data (SIMD) instruction set for the Arm AArch64 architecture. We use a reduced-radix representation technique because SIMD instructions do not retain the carry that occurs when partial products are added in large integer multiplication computations. Furthermore, we develop and implement a multiplication algorithm based on the Basecase method, which allows the application of ordinary multiplication instructions to special integers in reduced-radix representation. To evaluate performance, we compare our multiplication implementation on an A64FX processor with the GNU Multiple Precision Arithmetic Library (GMP). We show that processing with SVE was faster than GMP for multiplication with operands larger than 2,048 bits. The performance gain was up to 36%. These results suggest that SVE instructions have the potential to be faster than scalar instructions for large integer multiplication, especially for large operands.
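The reduced-radix idea can be illustrated with a small scalar sketch. This is not the paper's SVE implementation: the 28-bit limb width, the helper names, and the use of plain Python integers to stand in for 64-bit vector lanes are all assumptions made for illustration. Each limb occupies fewer bits than its container, so the partial products of a Basecase (schoolbook) multiplication can be summed column by column without any carry flag, and carries are resolved in one pass at the end.

```python
# Illustrative sketch only; RADIX_BITS = 28 is an assumed limb width,
# not a value taken from the paper.
RADIX_BITS = 28
MASK = (1 << RADIX_BITS) - 1
LANE_BITS = 64  # assumed width of one SIMD lane


def limbs_needed(bits):
    """Number of RADIX_BITS-wide limbs needed to hold a `bits`-bit integer."""
    return -(-bits // RADIX_BITS)


def to_limbs(x, n):
    """Split a non-negative integer into n limbs, least significant first."""
    return [(x >> (RADIX_BITS * i)) & MASK for i in range(n)]


def from_limbs(limbs):
    """Recombine limbs (least significant first) into one integer."""
    return sum(limb << (RADIX_BITS * i) for i, limb in enumerate(limbs))


def basecase_mul(a, b):
    """Schoolbook product of two limb vectors with deferred carries."""
    acc = [0] * (len(a) + len(b))
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            # Each column accumulates independently, as a SIMD lane would;
            # no carry is passed between columns at this stage.
            acc[i + j] += ai * bj
    # The unused high bits of each lane must absorb every deferred carry.
    assert all(c < (1 << LANE_BITS) for c in acc), "lane overflow"
    # Single carry-propagation pass back to normalized limbs.
    out, carry = [], 0
    for c in acc:
        c += carry
        out.append(c & MASK)
        carry = c >> RADIX_BITS
    assert carry == 0
    return out


x = (1 << 2048) - 12345
y = (1 << 2047) + 987654321
n = limbs_needed(2048)
product = from_limbs(basecase_mul(to_limbs(x, n), to_limbs(y, n)))
print(product == x * y)
```

With 28-bit limbs each partial product is below 2^56, so a 64-bit lane can absorb up to 256 of them before overflowing. That headroom is what lets ordinary multiply and add instructions be used without carry handling inside the inner loop.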


Cited By

  • (2024) Optimization of block-scaled integer GeMMs for efficient DNN deployment on scalable in-order vector processors. Journal of Systems Architecture 154 (103236). DOI: 10.1016/j.sysarc.2024.103236. Online publication date: Sep-2024
  • (2023) Efficient Additions and Montgomery Reductions of Large Integers for SIMD. 2023 IEEE 30th Symposium on Computer Arithmetic (ARITH), 48-59. DOI: 10.1109/ARITH58626.2023.00034. Online publication date: 4-Sep-2023


    Published In

    HPCAsia '23: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region
    February 2023
    161 pages
    ISBN:9781450398053
    DOI:10.1145/3578178
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Arm
    2. SIMD
    3. SVE
    4. integer multiplication
    5. large integer arithmetic

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • JSPS KAKENHI

    Conference

    HPC ASIA 2023

    Acceptance Rates

    HPCAsia '23 Paper Acceptance Rate 15 of 34 submissions, 44%;
    Overall Acceptance Rate 69 of 143 submissions, 48%
