Abstract
AI solutions, such as Deep Learning (DL), are becoming increasingly prevalent in edge devices. Many of these applications require low-latency processing of large amounts of data within a tight power budget. In this context, reconfigurable embedded devices are a compelling option. Deploying DL models to reconfigurable devices, however, presents considerable challenges. One key issue is reconciling the often large compute requirements of DL models with the limited resources available on edge devices. In this paper, we present a hardware-aware optimization strategy for deploying deep neural networks to FPGAs, which automatically identifies hardware configurations that maximize resource utilization for a given level of computation throughput. We demonstrate our approach on a sample neural network containing a combination of convolutional and fully connected layers, running on a sample FPGA target device: in performance mode, we achieve a factor of 3.5 reduction in DSP block usage without affecting throughput; in compact mode, we achieve a factor of 7.4 reduction in DSP block usage at the cost of a 1.8 times decrease in throughput. Our approach is fully automatic, requiring neither human intervention nor domain knowledge.
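To make the idea of a hardware-aware configuration search concrete, the following is a minimal sketch of one plausible realization: exhaustively searching per-layer resource-sharing factors (in the spirit of hls4ml's reuse factor) to minimize DSP block usage subject to a throughput constraint. The layer shapes, cost model, and search loop here are illustrative assumptions, not the paper's actual algorithm.

```python
# Hypothetical design-space search: per-layer reuse factors are chosen to
# minimize total DSP usage while meeting a throughput target. All numbers
# and the cost model are assumptions for illustration only.
import math
from itertools import product

# (name, multiply-accumulates per inference) -- example layers only
LAYERS = [("conv1", 36864), ("conv2", 147456), ("fc1", 65536), ("fc2", 1024)]

CLOCK_MHZ = 200.0          # assumed target clock frequency
MIN_THROUGHPUT = 10e6      # required inferences per second (assumed)
REUSE_CHOICES = [1, 2, 4, 8, 16, 32, 64]

def dsp_cost(macs: int, reuse: int) -> int:
    """Each DSP block is time-multiplexed across `reuse` multiplications."""
    return math.ceil(macs / reuse)

def pipeline_throughput(reuses) -> float:
    """In a pipelined design, the slowest stage sets the initiation interval."""
    ii = max(reuses)  # simplified: cycles per stage grow linearly with reuse
    return CLOCK_MHZ * 1e6 / ii

best = None
for reuses in product(REUSE_CHOICES, repeat=len(LAYERS)):
    if pipeline_throughput(reuses) < MIN_THROUGHPUT:
        continue  # this configuration misses the throughput target
    dsps = sum(dsp_cost(m, r) for (_, m), r in zip(LAYERS, reuses))
    if best is None or dsps < best[0]:
        best = (dsps, reuses)

dsps, reuses = best
print(f"best configuration uses {dsps} DSP blocks:")
for (name, _), r in zip(LAYERS, reuses):
    print(f"  {name}: reuse factor {r}")
```

Under this toy model, raising a layer's reuse factor trades throughput for DSP savings, which mirrors the performance-versus-compact trade-off reported in the abstract; a practical tool would replace the exhaustive loop and linear cost model with device-calibrated estimates.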
Acknowledgements
The support of the UK EPSRC (grant numbers EP/V028251/1, EP/L016796/1, EP/S030069/1, and EP/N031768/1) and AMD is gratefully acknowledged.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Rognlien, M., Que, Z., Coutinho, J.G.F., Luk, W. (2022). Hardware-Aware Optimizations for Deep Learning Inference on Edge Devices. In: Gan, L., Wang, Y., Xue, W., Chau, T. (eds) Applied Reconfigurable Computing. Architectures, Tools, and Applications. ARC 2022. Lecture Notes in Computer Science, vol 13569. Springer, Cham. https://doi.org/10.1007/978-3-031-19983-7_9
DOI: https://doi.org/10.1007/978-3-031-19983-7_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19982-0
Online ISBN: 978-3-031-19983-7
eBook Packages: Computer Science, Computer Science (R0)