Out-of-the-Box Performance of FPGAs for ML Workloads Using Vitis AI

  • Conference paper
  • First Online:
Applied Reconfigurable Computing. Architectures, Tools, and Applications (ARC 2025)

Abstract

Field Programmable Gate Arrays (FPGAs) are an attractive choice for accelerating Machine Learning (ML) workloads due to their flexible fabric of configurable logic blocks, interconnects, and embedded memory. However, programming FPGAs is difficult for ML developers, as it requires intricate hardware knowledge. Even though high-level implementation solutions such as High-Level Synthesis (HLS) are available, they come with their own challenges and a steep learning curve. To address this issue, FPGA vendors have raised the level of abstraction by providing ready-to-deploy frameworks for ML. In this paper, we evaluate the out-of-the-box performance of FPGAs using AMD/Xilinx Vitis AI, a development environment for deploying ML models on FPGAs. The study assesses the inference performance of Vitis AI on both edge and cloud platforms. We benchmark several popular, standard pre-trained models, focusing on latency, throughput, and power efficiency. Since Google Tensor Processing Units (TPUs) likewise offer out-of-the-box acceleration of ML, we compare these results with cloud and edge TPUs in terms of performance, ease of use, and tool support. We discuss our experience working with Vitis AI and its strengths and limitations as a plug-and-play solution for FPGA-based ML acceleration, providing insights for developers looking to leverage FPGAs for their inference workloads.
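The latency and throughput measurements described above can be sketched in a generic harness. This is an illustration only, not the paper's actual benchmark code: `run_inference` is a hypothetical stand-in (simulated here with a fixed delay) for a real accelerator call such as a Vitis AI DPU runner invocation, and the warm-up/median methodology is a common convention, assumed rather than taken from the paper.

```python
import time
import statistics

def run_inference(batch):
    # Hypothetical placeholder for a hardware inference call
    # (e.g. a Vitis AI DPU runner); simulated with a fixed 1 ms delay.
    time.sleep(0.001)
    return batch

def benchmark(batch_size=1, iters=100, warmup=10):
    """Return (median per-batch latency in ms, throughput in samples/s)."""
    batch = [0.0] * batch_size
    for _ in range(warmup):
        run_inference(batch)          # warm-up runs, excluded from timing
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        run_inference(batch)
        times.append(time.perf_counter() - t0)
    median_s = statistics.median(times)
    return median_s * 1e3, batch_size / median_s

lat_ms, tput = benchmark(batch_size=8, iters=50)
print(f"median latency: {lat_ms:.2f} ms, throughput: {tput:.0f} samples/s")
```

Reporting the median rather than the mean reduces the influence of occasional scheduling outliers; power efficiency would additionally require reading board-level power sensors, which are platform-specific.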

Author information

Correspondence to Deepak Kumar Athur.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Athur, D.K., Pawar, R., Arora, A. (2025). Out-of-the-Box Performance of FPGAs for ML Workloads Using Vitis AI. In: Giorgi, R., Stojilović, M., Stroobandt, D., Brox Jiménez, P., Barriga Barros, Á. (eds) Applied Reconfigurable Computing. Architectures, Tools, and Applications. ARC 2025. Lecture Notes in Computer Science, vol 15594. Springer, Cham. https://doi.org/10.1007/978-3-031-87995-1_8

  • DOI: https://doi.org/10.1007/978-3-031-87995-1_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-87994-4

  • Online ISBN: 978-3-031-87995-1

  • eBook Packages: Computer Science, Computer Science (R0)
