Out-of-the-Box Performance of FPGAs for ML Workloads Using Vitis AI

  • Conference paper
  • First Online:
Applied Reconfigurable Computing. Architectures, Tools, and Applications (ARC 2025)

Abstract

Field Programmable Gate Arrays (FPGAs) are an attractive choice for accelerating Machine Learning (ML) workloads due to their flexible fabric of configurable logic blocks, interconnects, and embedded memory. However, programming FPGAs is difficult for ML developers, as it requires intricate hardware knowledge. Even though high-level implementation solutions such as High-Level Synthesis (HLS) are available, they come with their own challenges and a steep learning curve. To address this issue, FPGA vendors have raised the level of abstraction by providing ready-to-deploy frameworks for ML. In this paper, we evaluate the out-of-the-box performance of FPGAs using AMD/Xilinx Vitis AI, a development environment for deploying ML models on FPGAs. The study assesses the inference performance of Vitis AI on both edge and cloud platforms. We benchmark several popular, standard pre-trained models, focusing on latency, throughput, and power efficiency. Since Google Tensor Processing Units (TPUs) likewise offer out-of-the-box acceleration of ML, we compare these results with cloud and edge TPUs in terms of performance, ease of use, and tool support. We discuss our experience working with Vitis AI and its strengths and limitations as a plug-and-play solution for FPGA-based ML acceleration, providing insights for developers looking to leverage FPGAs for their inference workloads.
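The latency and throughput measurements described above can be sketched in a generic harness. This is an illustration only, not the paper's actual benchmark code: `run_inference` is a hypothetical stand-in (simulated here with a fixed delay) for a real accelerator call such as a Vitis AI DPU runner invocation, and the warm-up/median methodology is a common convention, assumed rather than taken from the paper.

```python
import time
import statistics

def run_inference(batch):
    # Hypothetical placeholder for a hardware inference call
    # (e.g. a Vitis AI DPU runner); simulated with a fixed 1 ms delay.
    time.sleep(0.001)
    return batch

def benchmark(batch_size=1, iters=100, warmup=10):
    """Return (median per-batch latency in ms, throughput in samples/s)."""
    batch = [0.0] * batch_size
    for _ in range(warmup):
        run_inference(batch)          # warm-up runs, excluded from timing
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        run_inference(batch)
        times.append(time.perf_counter() - t0)
    median_s = statistics.median(times)
    return median_s * 1e3, batch_size / median_s

lat_ms, tput = benchmark(batch_size=8, iters=50)
print(f"median latency: {lat_ms:.2f} ms, throughput: {tput:.0f} samples/s")
```

Reporting the median rather than the mean reduces the influence of occasional scheduling outliers; power efficiency would additionally require reading board-level power sensors, which are platform-specific.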

Author information

Correspondence to Deepak Kumar Athur.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Athur, D.K., Pawar, R., Arora, A. (2025). Out-of-the-Box Performance of FPGAs for ML Workloads Using Vitis AI. In: Giorgi, R., Stojilović, M., Stroobandt, D., Brox Jiménez, P., Barriga Barros, Á. (eds) Applied Reconfigurable Computing. Architectures, Tools, and Applications. ARC 2025. Lecture Notes in Computer Science, vol 15594. Springer, Cham. https://doi.org/10.1007/978-3-031-87995-1_8

  • DOI: https://doi.org/10.1007/978-3-031-87995-1_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-87994-4

  • Online ISBN: 978-3-031-87995-1

  • eBook Packages: Computer Science, Computer Science (R0)
