Abstract
Field Programmable Gate Arrays (FPGAs) are an attractive choice for accelerating Machine Learning (ML) workloads due to their flexible fabric of configurable logic blocks, interconnects, and embedded memory. However, programming FPGAs is difficult for ML developers, as it requires intricate hardware knowledge. Even though high-level implementation solutions such as HLS are available, they come with their own challenges and a steep learning curve. To address this issue, FPGA vendors have raised the level of abstraction by providing ready-to-deploy frameworks for ML. In this paper, we present an evaluation of the out-of-the-box performance of FPGAs using AMD/Xilinx Vitis AI, a development environment for deploying ML models on FPGAs. The study aims to assess the inference performance of Vitis AI on both edge and cloud platforms. We benchmark various popular, standard pre-trained models, focusing on latency, throughput, and power efficiency. Since Google Tensor Processing Units (TPUs) are another platform for out-of-the-box acceleration of ML, we compare these results with Cloud TPU and Edge TPU in terms of performance, ease of use, and tool support. We discuss our experience working with Vitis AI and its strengths and limitations as a plug-and-play solution for FPGA-based ML acceleration, providing insights for developers looking to leverage FPGAs for their inference workloads.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Athur, D.K., Pawar, R., Arora, A. (2025). Out-of-the-Box Performance of FPGAs for ML Workloads Using Vitis AI. In: Giorgi, R., Stojilović, M., Stroobandt, D., Brox Jiménez, P., Barriga Barros, Á. (eds) Applied Reconfigurable Computing. Architectures, Tools, and Applications. ARC 2025. Lecture Notes in Computer Science, vol 15594. Springer, Cham. https://doi.org/10.1007/978-3-031-87995-1_8
DOI: https://doi.org/10.1007/978-3-031-87995-1_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-87994-4
Online ISBN: 978-3-031-87995-1