
Synergistically Exploiting CNN Pruning and HLS Versioning for Adaptive Inference on Multi-FPGAs at the Edge

Published: 17 September 2021

Abstract

FPGAs, thanks to their energy efficiency, reconfigurability, and easily tunable HLS designs, have been used to accelerate a growing number of machine learning applications, especially CNN-based ones. As a representative example, IoT Edge applications, which require low-latency processing of resource-hungry CNNs, offload inference from resource-limited IoT end nodes to Edge servers featuring FPGAs. However, the ever-increasing number of end nodes confronts these FPGA-based servers with new performance and adaptability challenges. While some works have exploited CNN optimizations to alleviate the computation and memory burdens of inference, and others have exploited HLS to tune accelerators for statically defined optimization goals, none has tackled CNN and HLS optimizations together, nor provided adaptability at runtime, where workload characteristics are unpredictable. In this context, we propose a hybrid two-step approach that, first, creates new optimization opportunities at design time through the automatic training of CNN model variants (obtained via pruning) and the automatic generation of versions of convolutional accelerators (obtained during HLS synthesis); and, second, synergistically exploits these CNN and HLS optimization opportunities to deliver a fully dynamic Multi-FPGA system that adapts its resources in a fully automatic or user-configurable manner. We implement this two-step approach as the AdaServ Framework and show, through a smart video surveillance Edge application as a case study, that it adapts to ever-changing Edge conditions: AdaServ processes at least 3.37× more inferences (automatic approach) and is at least 6.68× more energy efficient (user-configurable approach) than the original convolutional accelerators and CNN models (VGG-16 and AlexNet). We also show that AdaServ achieves better results than solutions that dynamically change only the CNN model or only the HLS version, highlighting the importance of exploiting both; and that it always outperforms the best statically chosen CNN model and HLS version, demonstrating the need for dynamic adaptability.
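The two-step approach the abstract describes, i.e., design-time generation of pruned CNN variants and HLS accelerator versions followed by runtime selection among their combinations, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the configuration names, metric values, and the `select` policy are all hypothetical placeholders for the table that design-time training and synthesis would populate.

```python
# Hypothetical sketch: each (pruned CNN variant, HLS accelerator version)
# pair forms one candidate configuration; the runtime picks the one that
# best serves the current goal. All names and numbers are illustrative.
from dataclasses import dataclass

@dataclass(frozen=True)
class Config:
    cnn_variant: str       # e.g., a pruning level of VGG-16 (illustrative)
    hls_version: str       # e.g., unroll/pipeline factors chosen at synthesis
    throughput: float      # inferences per second (illustrative)
    energy_per_inf: float  # joules per inference (illustrative)
    accuracy: float        # top-1 accuracy (illustrative)

# Design-time step: training pruned variants and synthesizing accelerator
# versions would populate a table like this one.
CONFIGS = [
    Config("vgg16-full",     "hls-lowpar",  10.0, 0.90, 0.71),
    Config("vgg16-full",     "hls-highpar", 25.0, 0.70, 0.71),
    Config("vgg16-pruned50", "hls-lowpar",  22.0, 0.45, 0.69),
    Config("vgg16-pruned50", "hls-highpar", 55.0, 0.30, 0.69),
]

def select(configs, min_accuracy, goal="throughput"):
    """Runtime step: among configurations meeting the accuracy floor,
    pick the best one for the current goal (automatic or user-set)."""
    feasible = [c for c in configs if c.accuracy >= min_accuracy]
    if not feasible:
        return None  # no variant/version pair satisfies the constraint
    if goal == "throughput":
        return max(feasible, key=lambda c: c.throughput)
    return min(feasible, key=lambda c: c.energy_per_inf)

best = select(CONFIGS, min_accuracy=0.68, goal="energy")
print(best.cnn_variant, best.hls_version)  # → vgg16-pruned50 hls-highpar
```

The sketch makes the paper's central point concrete: the two optimization axes multiply, so a system exploring only one axis (fixing either `cnn_variant` or `hls_version`) searches a strictly smaller configuration space than one exploring both.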



      Published In

ACM Transactions on Embedded Computing Systems, Volume 20, Issue 5s
Special Issue ESWEEK 2021, CASES 2021, CODES+ISSS 2021 and EMSOFT 2021
October 2021
1367 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/3481713
Editor: Tulika Mitra
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Publication History

      Published: 17 September 2021
      Accepted: 01 July 2021
      Revised: 01 June 2021
      Received: 01 April 2021
      Published in TECS Volume 20, Issue 5s


      Author Tags

      1. FPGA
      2. CNN inference
      3. pruning
      4. High-Level Synthesis
      5. Edge computing

      Qualifiers

      • Research-article
      • Refereed

      Funding Sources

      • CAPES - Brasil - Finance Code
      • FAPERGS and CNPq

      Cited By

• (2024) APPQ-CNN: An Adaptive CNNs Inference Accelerator for Synergistically Exploiting Pruning and Quantization Based on FPGA. IEEE Transactions on Sustainable Computing 9, 6 (Nov. 2024), 874–888. DOI: 10.1109/TSUSC.2024.3382157
• (2024) ADARE: Adaptive Resource Provisioning in Multi-FPGA Edge Environments. In 2024 37th SBC/SBMicro/IEEE Symposium on Integrated Circuits and Systems Design (SBCCI), 1–5. DOI: 10.1109/SBCCI62366.2024.10704009
• (2024) Exploiting Virtual Layers and Pruning for FPGA-Based Adaptive Traffic Classification. In 2024 27th Euromicro Conference on Digital System Design (DSD), 194–201. DOI: 10.1109/DSD64264.2024.00034
• (2023) Pruning and Early-Exit Co-Optimization for CNN Acceleration on FPGAs. In 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), 1–6. DOI: 10.23919/DATE56975.2023.10137244
• (2023) A Comprehensive Evaluation of Convolutional Hardware Accelerators. IEEE Transactions on Circuits and Systems II: Express Briefs 70, 3 (Mar. 2023), 1149–1153. DOI: 10.1109/TCSII.2022.3223925
• (2023) Design Space Exploration for CNN Offloading to FPGAs at the Edge. In 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 1–6. DOI: 10.1109/ISVLSI59464.2023.10238644
• (2023) Dynamic Offloading for Improved Performance and Energy Efficiency in Heterogeneous IoT-Edge-Cloud Continuum. In 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 1–6. DOI: 10.1109/ISVLSI59464.2023.10238564
• (2023) Adaptive Inference on Reconfigurable SmartNICs for Traffic Classification. In Advanced Information Networking and Applications, 137–148. DOI: 10.1007/978-3-031-28451-9_12
• (2022) AdaFlow: A Framework for Adaptive Dataflow CNN Acceleration on FPGAs. In 2022 Design, Automation & Test in Europe Conference & Exhibition (DATE), 244–249. DOI: 10.23919/DATE54114.2022.9774727
• (2022) ConfAx: Exploiting Approximate Computing for Configurable FPGA CNN Acceleration at the Edge. In 2022 IEEE International Symposium on Circuits and Systems (ISCAS), 1650–1654. DOI: 10.1109/ISCAS48785.2022.9937676
