Abstract
We revisit a blocked formulation of the direct convolution algorithm that mimics modern realizations of the general matrix multiplication (gemm), demonstrating that the same approach can be adapted to deliver high performance for deep learning inference tasks on the AI Engine (AIE) tile embedded in Xilinx Versal platforms. Our experimental results on a Xilinx Versal VCK190 show an arithmetic throughput close to 70% of the theoretical peak of the AIE tile for 8-bit integer operands and the convolutional layers arising in ResNet-50 v1.5 + ImageNet.
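To make the idea of a gemm-like blocked direct convolution concrete, the following is a minimal sketch in plain Python. It is not the authors' AIE implementation: the tensor layout (channels, rows, columns), the restriction to stride 1 without padding, the function names, and the block sizes `co_blk`, `ci_blk` and `x_blk` are all assumptions chosen for illustration. The blocked variant tiles the output channels, input channels and output columns, loosely mirroring the m, k and n blocking of gemm, and accumulates partial sums in wider integers, as is common for 8-bit operands.

```python
def conv_direct(inp, flt):
    """Reference direct convolution (stride 1, no padding).

    inp: [ci][h][w] nested lists, flt: [co][ci][kh][kw] nested lists.
    """
    ci_n, h, w = len(inp), len(inp[0]), len(inp[0][0])
    co_n, kh, kw = len(flt), len(flt[0][0]), len(flt[0][0][0])
    ho, wo = h - kh + 1, w - kw + 1
    out = [[[0] * wo for _ in range(ho)] for _ in range(co_n)]
    for co in range(co_n):
        for y in range(ho):
            for x in range(wo):
                acc = 0
                for ci in range(ci_n):
                    for r in range(kh):
                        for s in range(kw):
                            acc += flt[co][ci][r][s] * inp[ci][y + r][x + s]
                out[co][y][x] = acc
    return out


def conv_blocked(inp, flt, co_blk=2, ci_blk=2, x_blk=4):
    """Same result as conv_direct, with gemm-style loop tiling.

    Block sizes are hypothetical; a real kernel would tune them to the
    memory hierarchy and vector width of the target.
    """
    ci_n, h, w = len(inp), len(inp[0]), len(inp[0][0])
    co_n, kh, kw = len(flt), len(flt[0][0]), len(flt[0][0][0])
    ho, wo = h - kh + 1, w - kw + 1
    out = [[[0] * wo for _ in range(ho)] for _ in range(co_n)]
    for co0 in range(0, co_n, co_blk):            # block of output channels ("m")
        for ci0 in range(0, ci_n, ci_blk):        # block of input channels ("k")
            for y in range(ho):
                for x0 in range(0, wo, x_blk):    # block of output columns ("n")
                    # Innermost loops play the role of a micro-kernel that
                    # updates a small co_blk x x_blk tile of the output.
                    for co in range(co0, min(co0 + co_blk, co_n)):
                        for x in range(x0, min(x0 + x_blk, wo)):
                            acc = out[co][y][x]
                            for ci in range(ci0, min(ci0 + ci_blk, ci_n)):
                                for r in range(kh):
                                    for s in range(kw):
                                        acc += (flt[co][ci][r][s]
                                                * inp[ci][y + r][x + s])
                            out[co][y][x] = acc
    return out
```

Both functions should return identical outputs for any block sizes, since the blocking only reorders the accumulation. On real hardware the point of the reordering is data reuse in fast local memory, which this pure-Python sketch does not model.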
Acknowledgments
This project has received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 955558. The JU receives support from the European Union’s Horizon 2020 research and innovation programme, and Spain, Germany, France, Italy, Poland, Switzerland, Norway.
The authors acknowledge funding from the European Union’s Horizon 2020 Research and Innovation programme under the Marie Skłodowska-Curie Grant Agreement No. 956090 (APROPOS).
This work was also supported by the research project PID2020-113656RB-C22 of MCIN/AEI/10.13039/501100011033. H. Martínez is a postdoc fellow supported by the Junta de Andalucía.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Lei, J., Martínez, H., Flich, J., Quintana-Ortí, E.S. (2023). GEMM-Like Convolution for Deep Learning Inference on the Xilinx Versal. In: Bienz, A., Weiland, M., Baboulin, M., Kruse, C. (eds) High Performance Computing. ISC High Performance 2023. Lecture Notes in Computer Science, vol 13999. Springer, Cham. https://doi.org/10.1007/978-3-031-40843-4_44
DOI: https://doi.org/10.1007/978-3-031-40843-4_44
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-40842-7
Online ISBN: 978-3-031-40843-4
eBook Packages: Computer Science, Computer Science (R0)