Performance/Resources Comparison of Hardware Implementations on Fully Connected Network Inference

  • Conference paper
  • First Online:
Intelligent Data Engineering and Automated Learning – IDEAL 2022 (IDEAL 2022)

Abstract

Fully Connected Network inference is a complex algorithm that can be accelerated using edge devices such as Field Programmable Gate Arrays (FPGAs). One commonly used performance improvement for Fully Connected Network inference is quantization, a technique that replaces the floating-point weights of the network with integers. Frameworks such as Open Neural Network Exchange (ONNX) and TensorFlow Lite provide solutions for this procedure. However, these frameworks use different inference algorithms, with different operations and data types. In this article, the inference algorithms of common Fully Connected Networks in ONNX and TensorFlow Lite are analysed, and a performance and resource-usage comparison is carried out on a Xilinx® Zynq UltraScale+ MPSoC. Results show that, to achieve lower latency, it is better to avoid floating-point operations in the inference algorithm. In terms of FPGA resource usage, an increase is observed as the neural network becomes more complex, regardless of the algorithm, and the magnitude of this growth is framework-dependent.
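To make the quantization idea concrete, the sketch below shows in plain NumPy how floating-point weights can be replaced by 8-bit integers and how a fully connected layer can then run with an integer-only inner loop. This is a minimal illustration under assumed conventions (per-tensor affine quantization with a scale and zero point); it is not the exact kernel used by ONNX Runtime or TensorFlow Lite, whose operation orders and data types are precisely what the paper compares.

```python
import numpy as np

def quantize(t: np.ndarray, num_bits: int = 8):
    """Per-tensor affine quantization: t ~= scale * (q - zero_point).

    Illustrative scheme only; real frameworks choose ranges,
    rounding modes, and data types differently.
    """
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    scale = (t.max() - t.min()) / (qmax - qmin)
    zero_point = int(round(qmin - t.min() / scale))
    q = np.clip(np.round(t / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8)).astype(np.float32)   # layer weights
x = rng.standard_normal(8).astype(np.float32)        # input activations

y_float = W @ x                                      # float reference

# Integer path: quantize both operands, accumulate in int32, and apply
# a single float rescale per output instead of a float multiply per MAC.
qW, sW, zW = quantize(W)
qx, sx, zx = quantize(x)
acc = (qW.astype(np.int32) - zW) @ (qx.astype(np.int32) - zx)
y_quant = acc * (sW * sx)

print(np.max(np.abs(y_float - y_quant)))             # small rounding error
```

In this scheme the multiply-accumulate loop touches only integers, which map naturally onto FPGA DSP slices, and a single floating-point rescale is applied per output value; this illustrates the abstract's observation that avoiding floating-point operations in the inference algorithm lowers latency.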



Acknowledgments

This work has been funded by the following institutions: the Ministry of Science and Innovation under the CERVERA Excellence Network project CER-20211003 (IBERUS) and the Missions Science and Innovation project MIG-20211008 (INMERBOT); the European Union's Horizon 2020 research and innovation programme (project DIH4CPS) under Grant Agreement no. 872548; CDTI (Centro para el Desarrollo Tecnológico Industrial) under project CER-20211022; ICE (Junta de Castilla y León) under project CCTT3/20/BU/0002; the Spanish Ministry of Economics and Industry under grant PID2020-112726RB-I00; the Principado de Asturias under grant SV-PA-21-AYUD/2021/50994; and the Regional Government of Andalusia, programme "Personal Investigador Doctor", reference DOC_00235.

Author information

Corresponding author

Correspondence to Manuel L. González.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Lozada, R. et al. (2022). Performance/Resources Comparison of Hardware Implementations on Fully Connected Network Inference. In: Yin, H., Camacho, D., Tino, P. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2022. IDEAL 2022. Lecture Notes in Computer Science, vol 13756. Springer, Cham. https://doi.org/10.1007/978-3-031-21753-1_34

  • DOI: https://doi.org/10.1007/978-3-031-21753-1_34

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-21752-4

  • Online ISBN: 978-3-031-21753-1

  • eBook Packages: Computer Science, Computer Science (R0)
