
Synergistically Exploiting CNN Pruning and HLS Versioning for Adaptive Inference on Multi-FPGAs at the Edge

Published: 17 September 2021

Abstract

FPGAs, thanks to their energy efficiency, reconfigurability, and easily tunable HLS designs, have been used to accelerate a growing number of machine learning applications, especially CNN-based ones. As a representative example, IoT Edge applications, which require low-latency processing of resource-hungry CNNs, offload inference from resource-limited IoT end nodes to Edge servers featuring FPGAs. However, the ever-increasing number of end nodes confronts these FPGA-based servers with new performance and adaptability challenges. While some works have exploited CNN optimizations to alleviate the computation and memory burdens of inference, and others have exploited HLS to tune accelerators for statically defined optimization goals, none has tackled CNN and HLS optimizations together, nor provided adaptability at runtime, where workload characteristics are unpredictable. In this context, we propose a hybrid two-step approach that, first, creates new optimization opportunities at design time through the automatic training of CNN model variants (obtained via pruning) and the automatic generation of versions of convolutional accelerators (obtained during HLS synthesis); and, second, synergistically exploits these CNN and HLS optimization opportunities to deliver a fully dynamic Multi-FPGA system that adapts its resources in a fully automatic or user-configurable manner. We implement this two-step approach as the AdaServ Framework and show, through a smart video surveillance Edge application as a case study, that it adapts to ever-changing Edge conditions: AdaServ processes at least 3.37× more inferences (automatic approach) and is at least 6.68× more energy efficient (user-configurable approach) than the original convolutional accelerators and CNN models (VGG-16 and AlexNet). We also show that AdaServ achieves better results than solutions that dynamically change only the CNN model or only the HLS version, highlighting the importance of exploiting both; and that it always outperforms the best statically chosen CNN model and HLS version, demonstrating the need for dynamic adaptability.
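The two-step approach the abstract describes, i.e., design-time generation of pruned CNN variants and HLS accelerator versions followed by runtime selection among their combinations, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the configuration names, metric values, and the `select` policy are all hypothetical placeholders for the table that design-time training and synthesis would populate.

```python
# Hypothetical sketch: each (pruned CNN variant, HLS accelerator version)
# pair forms one candidate configuration; the runtime picks the one that
# best serves the current goal. All names and numbers are illustrative.
from dataclasses import dataclass

@dataclass(frozen=True)
class Config:
    cnn_variant: str       # e.g., a pruning level of VGG-16 (illustrative)
    hls_version: str       # e.g., unroll/pipeline factors chosen at synthesis
    throughput: float      # inferences per second (illustrative)
    energy_per_inf: float  # joules per inference (illustrative)
    accuracy: float        # top-1 accuracy (illustrative)

# Design-time step: training pruned variants and synthesizing accelerator
# versions would populate a table like this one.
CONFIGS = [
    Config("vgg16-full",     "hls-lowpar",  10.0, 0.90, 0.71),
    Config("vgg16-full",     "hls-highpar", 25.0, 0.70, 0.71),
    Config("vgg16-pruned50", "hls-lowpar",  22.0, 0.45, 0.69),
    Config("vgg16-pruned50", "hls-highpar", 55.0, 0.30, 0.69),
]

def select(configs, min_accuracy, goal="throughput"):
    """Runtime step: among configurations meeting the accuracy floor,
    pick the best one for the current goal (automatic or user-set)."""
    feasible = [c for c in configs if c.accuracy >= min_accuracy]
    if not feasible:
        return None  # no variant/version pair satisfies the constraint
    if goal == "throughput":
        return max(feasible, key=lambda c: c.throughput)
    return min(feasible, key=lambda c: c.energy_per_inf)

best = select(CONFIGS, min_accuracy=0.68, goal="energy")
print(best.cnn_variant, best.hls_version)  # → vgg16-pruned50 hls-highpar
```

The sketch makes the paper's central point concrete: the two optimization axes multiply, so a system exploring only one axis (fixing either `cnn_variant` or `hls_version`) searches a strictly smaller configuration space than one exploring both.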



      Published In

ACM Transactions on Embedded Computing Systems, Volume 20, Issue 5s
Special Issue ESWEEK 2021, CASES 2021, CODES+ISSS 2021 and EMSOFT 2021
October 2021
1367 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/3481713
Editor: Tulika Mitra
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Publication History

      Published: 17 September 2021
      Accepted: 01 July 2021
      Revised: 01 June 2021
      Received: 01 April 2021
      Published in TECS Volume 20, Issue 5s


      Author Tags

      1. FPGA
      2. CNN inference
      3. pruning
      4. High-Level Synthesis
      5. Edge computing

      Qualifiers

      • Research-article
      • Refereed

      Funding Sources

      • CAPES - Brasil - Finance Code
      • FAPERGS and CNPq

      Cited By

• (2024) APPQ-CNN: An Adaptive CNNs Inference Accelerator for Synergistically Exploiting Pruning and Quantization Based on FPGA. IEEE Transactions on Sustainable Computing 9, 6 (Nov. 2024), 874–888. DOI: 10.1109/TSUSC.2024.3382157
• (2024) ADARE: Adaptive Resource Provisioning in Multi-FPGA Edge Environments. In 2024 37th SBC/SBMicro/IEEE Symposium on Integrated Circuits and Systems Design (SBCCI), 1–5. DOI: 10.1109/SBCCI62366.2024.10704009
• (2024) Exploiting Virtual Layers and Pruning for FPGA-Based Adaptive Traffic Classification. In 2024 27th Euromicro Conference on Digital System Design (DSD), 194–201. DOI: 10.1109/DSD64264.2024.00034
• (2023) Pruning and Early-Exit Co-Optimization for CNN Acceleration on FPGAs. In 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), 1–6. DOI: 10.23919/DATE56975.2023.10137244
• (2023) A Comprehensive Evaluation of Convolutional Hardware Accelerators. IEEE Transactions on Circuits and Systems II: Express Briefs 70, 3 (Mar. 2023), 1149–1153. DOI: 10.1109/TCSII.2022.3223925
• (2023) Design Space Exploration for CNN Offloading to FPGAs at the Edge. In 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 1–6. DOI: 10.1109/ISVLSI59464.2023.10238644
• (2023) Dynamic Offloading for Improved Performance and Energy Efficiency in Heterogeneous IoT-Edge-Cloud Continuum. In 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 1–6. DOI: 10.1109/ISVLSI59464.2023.10238564
• (2023) Adaptive Inference on Reconfigurable SmartNICs for Traffic Classification. In Advanced Information Networking and Applications, 137–148. DOI: 10.1007/978-3-031-28451-9_12
• (2022) AdaFlow: A Framework for Adaptive Dataflow CNN Acceleration on FPGAs. In 2022 Design, Automation & Test in Europe Conference & Exhibition (DATE), 244–249. DOI: 10.23919/DATE54114.2022.9774727
• (2022) ConfAx: Exploiting Approximate Computing for Configurable FPGA CNN Acceleration at the Edge. In 2022 IEEE International Symposium on Circuits and Systems (ISCAS), 1650–1654. DOI: 10.1109/ISCAS48785.2022.9937676
