GEMM-Like Convolution for Deep Learning Inference on the Xilinx Versal

Lei, Jie; Martínez, Héctor; Flich, José; Quintana-Ortí, Enrique S.

doi:10.1007/978-3-031-40843-4_44

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13999)

Included in the following conference series:

International Conference on High Performance Computing


Abstract

We revisit a blocked formulation of the direct convolution algorithm that mimics modern realizations of the general matrix multiplication (gemm), and demonstrate that the same approach can be adapted to deliver high performance for deep learning inference tasks on the AI Engine (AIE) tile embedded in Xilinx Versal platforms. Our experimental results on a Xilinx Versal VCK190 show an arithmetic throughput close to 70% of the theoretical peak of the AIE tile for 8-bit integer operands and the convolutional layers arising in ResNet-50 v1.5 + ImageNet.
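The following is a minimal sketch, written in Python for readability, of what a gemm-like blocked direct convolution looks like. It is not the authors' AIE kernel: the tensor layouts (input `inp[ci][h][w]`, filter `filt[co][ci][kh][kw]`), stride 1 without padding, the micro-tile sizes `MR` and `NR`, and the function names `conv_naive` and `conv_gemm_like` are all illustrative assumptions. What it shows is that the loops over input channels and filter taps play the role of the gemm reduction dimension, so each `MR` x `NR` block of outputs is built up by a sequence of rank-1 (outer-product) updates, the same pattern a BLIS-style micro-kernel follows.

```python
import random

# Hypothetical micro-tile sizes (assumption): MR output channels x NR output columns.
MR, NR = 4, 8


def conv_naive(inp, filt, Ci, H, W, Co, Kh, Kw):
    """Reference direct convolution; stride 1, no padding.

    Assumed layouts: inp[ci][h][w], filt[co][ci][kh][kw], result out[co][ho][wo].
    """
    Ho, Wo = H - Kh + 1, W - Kw + 1
    return [[[sum(filt[co][ci][kh][kw] * inp[ci][ho + kh][wo + kw]
                  for ci in range(Ci) for kh in range(Kh) for kw in range(Kw))
              for wo in range(Wo)]
             for ho in range(Ho)]
            for co in range(Co)]


def conv_gemm_like(inp, filt, Ci, H, W, Co, Kh, Kw):
    """Blocked direct convolution organised like a gemm micro-kernel loop."""
    Ho, Wo = H - Kh + 1, W - Kw + 1
    out = [[[0] * Wo for _ in range(Ho)] for _ in range(Co)]
    for co0 in range(0, Co, MR):              # block of output channels ("rows")
        mr = min(MR, Co - co0)                # edge block may be smaller than MR
        for ho in range(Ho):
            for wo0 in range(0, Wo, NR):      # block of output columns ("columns")
                nr = min(NR, Wo - wo0)
                acc = [[0] * nr for _ in range(mr)]  # accumulator micro-tile
                # (ci, kh, kw) acts as the gemm reduction dimension k:
                # each iteration is a rank-1 update acc += a * b^T.
                for ci in range(Ci):
                    for kh in range(Kh):
                        for kw in range(Kw):
                            a = [filt[co0 + i][ci][kh][kw] for i in range(mr)]
                            b = [inp[ci][ho + kh][wo0 + j + kw] for j in range(nr)]
                            for i in range(mr):
                                for j in range(nr):
                                    acc[i][j] += a[i] * b[j]
                # Write the finished micro-tile back to the output tensor.
                for i in range(mr):
                    for j in range(nr):
                        out[co0 + i][ho][wo0 + j] = acc[i][j]
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    # Co and Wo are deliberately not multiples of MR and NR to exercise edge blocks.
    Ci, H, W, Co, Kh, Kw = 3, 7, 11, 6, 3, 3
    # Operands drawn from the signed 8-bit range.
    inp = [[[rng.randint(-128, 127) for _ in range(W)] for _ in range(H)] for _ in range(Ci)]
    filt = [[[[rng.randint(-128, 127) for _ in range(Kw)] for _ in range(Kh)]
             for _ in range(Ci)] for _ in range(Co)]
    assert conv_gemm_like(inp, filt, Ci, H, W, Co, Kh, Kw) == conv_naive(inp, filt, Ci, H, W, Co, Kh, Kw)
    print("blocked result matches the reference")
```

On real hardware an implementation of this kind would typically keep the accumulator tile in vector registers and pack the operands into contiguous buffers before the micro-kernel runs; the sketch omits packing and vectorization and only checks that the blocked loop order computes the same result as the naive loop nest.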



Acknowledgments

This project has received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 955558. The JU receives support from the European Union’s Horizon 2020 research and innovation programme, and Spain, Germany, France, Italy, Poland, Switzerland, Norway.

The authors acknowledge funding from the European Union’s Horizon 2020 Research and Innovation programme under the Marie Skłodowska-Curie Grant Agreement No. 956090 (APROPOS).

This work was also supported by the research project PID2020-113656RB-C22 of MCIN/AEI/10.13039/501100011033. H. Martínez is a postdoc fellow supported by the Junta de Andalucía.

Author information

Authors and Affiliations

Universitat Politècnica de València, Valencia, Spain
Jie Lei, José Flich & Enrique S. Quintana-Ortí
Universidad de Córdoba, Córdoba, Spain
Héctor Martínez


Corresponding author

Correspondence to Jie Lei.

Editor information

Editors and Affiliations

University of New Mexico, Albuquerque, NM, USA
Amanda Bienz
University of Edinburgh, Edinburgh, UK
Michèle Weiland
Université Paris-Saclay, Gif sur Yvette, France
Marc Baboulin
CERFACS, Toulouse, France
Carola Kruse


About this paper

Cite this paper

Lei, J., Martínez, H., Flich, J., Quintana-Ortí, E.S. (2023). GEMM-Like Convolution for Deep Learning Inference on the Xilinx Versal. In: Bienz, A., Weiland, M., Baboulin, M., Kruse, C. (eds) High Performance Computing. ISC High Performance 2023. Lecture Notes in Computer Science, vol 13999. Springer, Cham. https://doi.org/10.1007/978-3-031-40843-4_44


DOI: https://doi.org/10.1007/978-3-031-40843-4_44
Published: 25 August 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-40842-7
Online ISBN: 978-3-031-40843-4
eBook Packages: Computer Science, Computer Science (R0)
