Performance/Resources Comparison of Hardware Implementations on Fully Connected Network Inference

  • Conference paper
  • First Online:
Intelligent Data Engineering and Automated Learning – IDEAL 2022 (IDEAL 2022)

Abstract

Fully Connected Network inference is a complex algorithm that can be accelerated using edge devices such as Field Programmable Gate Arrays (FPGAs). One commonly used performance improvement for Fully Connected Network inference is quantization, a technique that replaces the floating-point weights of the network with integers. Frameworks such as Open Neural Network Exchange (ONNX) and TensorFlow Lite provide solutions for this procedure. However, these frameworks use different inference algorithms, with different operations and data types. In this article, the inference algorithms of common Fully Connected Networks in ONNX and TensorFlow Lite are analysed, and a performance and resource-usage comparison is carried out on a Xilinx® Zynq UltraScale+ MPSoC. Results show that, to achieve lower latency, it is better to avoid floating-point operations in the inference algorithm. In terms of FPGA resource usage, an increase is observed as the neural network becomes more complex, regardless of the algorithm, and the magnitude of this growth is framework-dependent.
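To make the quantization idea concrete, the sketch below shows in plain NumPy how floating-point weights can be replaced by 8-bit integers and how a fully connected layer can then run with an integer-only inner loop. This is a minimal illustration under assumed conventions (per-tensor affine quantization with a scale and zero point); it is not the exact kernel used by ONNX Runtime or TensorFlow Lite, whose operation orders and data types are precisely what the paper compares.

```python
import numpy as np

def quantize(t: np.ndarray, num_bits: int = 8):
    """Per-tensor affine quantization: t ~= scale * (q - zero_point).

    Illustrative scheme only; real frameworks choose ranges,
    rounding modes, and data types differently.
    """
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    scale = (t.max() - t.min()) / (qmax - qmin)
    zero_point = int(round(qmin - t.min() / scale))
    q = np.clip(np.round(t / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8)).astype(np.float32)   # layer weights
x = rng.standard_normal(8).astype(np.float32)        # input activations

y_float = W @ x                                      # float reference

# Integer path: quantize both operands, accumulate in int32, and apply
# a single float rescale per output instead of a float multiply per MAC.
qW, sW, zW = quantize(W)
qx, sx, zx = quantize(x)
acc = (qW.astype(np.int32) - zW) @ (qx.astype(np.int32) - zx)
y_quant = acc * (sW * sx)

print(np.max(np.abs(y_float - y_quant)))             # small rounding error
```

In this scheme the multiply-accumulate loop touches only integers, which map naturally onto FPGA DSP slices, and a single floating-point rescale is applied per output value; this illustrates the abstract's observation that avoiding floating-point operations in the inference algorithm lowers latency.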



Acknowledgments

This work has been funded by the following institutions: the Ministry of Science and Innovation under the CERVERA Excellence Network project CER-20211003 (IBERUS) and the Missions Science and Innovation project MIG-20211008 (INMERBOT); the European Union's Horizon 2020 research and innovation programme (project DIH4CPS) under Grant Agreement no. 872548; CDTI (Centro para el Desarrollo Tecnológico Industrial) under project CER-20211022; ICE (Junta de Castilla y León) under project CCTT3/20/BU/0002; the Spanish Ministry of Economics and Industry under grant PID2020-112726RB-I00; the Principado de Asturias under grant SV-PA-21-AYUD/2021/50994; and the Regional Government of Andalusia, programme "Personal Investigador Doctor", reference DOC_00235.

Author information

Corresponding author

Correspondence to Manuel L. González.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Lozada, R. et al. (2022). Performance/Resources Comparison of Hardware Implementations on Fully Connected Network Inference. In: Yin, H., Camacho, D., Tino, P. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2022. IDEAL 2022. Lecture Notes in Computer Science, vol 13756. Springer, Cham. https://doi.org/10.1007/978-3-031-21753-1_34

  • DOI: https://doi.org/10.1007/978-3-031-21753-1_34

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-21752-4

  • Online ISBN: 978-3-031-21753-1

  • eBook Packages: Computer Science, Computer Science (R0)
