Abstract
Dense matrix-matrix multiplication over small finite fields is a common operation in many application domains, such as cryptography, random number generation, and error-correcting codes. This paper shows that FPGAs have the potential to greatly accelerate this time-consuming operation, and in particular that systolic-array-based approaches are both practical and efficient on large modern devices. A number of finite-field-specific architectural optimisations are introduced, allowing n×n matrices to be processed in O(n) cycles for matrix sizes up to n = 350. Comparison with optimised software implementations on a single-core CPU shows that an FPGA accelerator based on a Virtex-7 XC7V2000T can achieve between 80x and 700x speed-up for GF(2^k), while for GF(3) and larger finite fields it can provide practical speed-ups of 1000x or more.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Fleming, S.T., Thomas, D.B. (2013). Hardware Acceleration of Matrix Multiplication over Small Prime Finite Fields. In: Brisk, P., de Figueiredo Coutinho, J.G., Diniz, P.C. (eds) Reconfigurable Computing: Architectures, Tools and Applications. ARC 2013. Lecture Notes in Computer Science, vol 7806. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36812-7_10
Print ISBN: 978-3-642-36811-0
Online ISBN: 978-3-642-36812-7