DOI: 10.1145/2966986.2967068
Research article

Re-architecting the on-chip memory sub-system of machine-learning accelerator for embedded devices

Published: 07 November 2016

Abstract

The rapid development of deep learning is enabling a wealth of novel applications, such as image and speech recognition, for embedded systems, robotics, and smart wearable devices. However, typical deep learning models such as deep convolutional neural networks (CNNs) consume so much on-chip storage and high-throughput compute resource that they cannot easily be handled by mobile or embedded devices with thrifty silicon and power budgets. To enable large CNN models on mobile and other cutting-edge devices for IoT or cyber-physical applications, we propose an efficient on-chip memory architecture for CNN inference acceleration and demonstrate its application to our in-house general-purpose deep learning accelerator. The redesigned on-chip memory subsystem, Memsqueezer, includes an active weight-buffer set and a data-buffer set that employ specialized compression methods to reduce the footprints of the CNN weights and data, respectively. The Memsqueezer buffers compress the data and weight sets according to their distinct features, and they also include a built-in redundancy-detection mechanism that actively scans the working set of a CNN to boost inference performance by eliminating data redundancy. Our experiments show that CNN accelerators with Memsqueezer buffers achieve more than 2× performance improvement and reduce energy consumption by 80% on average over conventional buffer designs with the same area budget.
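The abstract does not specify Memsqueezer's compression format. As a minimal illustrative sketch only — assuming a simple sparse (index, value) encoding, which is one common way to exploit the many zeros that ReLU layers leave in CNN activations — the basic idea of shrinking a buffer's footprint by storing only non-zero entries looks like this:

```python
# Hypothetical sketch, NOT the paper's actual Memsqueezer logic: sparse
# (index, value) compression of a post-ReLU activation work-set. Storing
# only non-zero values plus their positions shrinks the buffer footprint
# whenever the activation vector is mostly zeros.

def compress(activations):
    """Keep (index, value) pairs for non-zero entries only."""
    return [(i, v) for i, v in enumerate(activations) if v != 0]

def decompress(pairs, length):
    """Rebuild the dense activation vector from the sparse pairs."""
    dense = [0] * length
    for i, v in pairs:
        dense[i] = v
    return dense

acts = [0, 3, 0, 0, 7, 0, 1, 0]          # post-ReLU activations, mostly zero
packed = compress(acts)                   # 3 pairs instead of 8 words
assert decompress(packed, len(acts)) == acts
```

A hardware buffer would implement this with offset fields and alignment logic rather than Python lists; the sketch only shows why redundancy elimination translates directly into on-chip storage savings.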




        Published In

        2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)
        Nov 2016
        946 pages

        Publisher

        IEEE Press



Cited By

• (2024) "Visual feature extraction and tracking method based on corner flow detection" (基于拐角流量检测的视觉特征提取与跟踪方法), 智能机器人 (Intelligent Robots), 10.52810/JIR.2024.001, 1:1 (1-10). Online publication date: 2-Mar-2024
• (2024) "REC: REtime Convolutional Layers to Fully Exploit Harvested Energy for ReRAM-based CNN Accelerators", ACM Transactions on Embedded Computing Systems, 10.1145/3652593, 23:6 (1-25). Online publication date: 11-Sep-2024
• (2024) "A lightweight distillation recurrent convolution network on FPGA for real-time video super-resolution", Multimedia Systems, 10.1007/s00530-024-01528-0, 30:6. Online publication date: 15-Oct-2024
• (2023) "Topological Dependencies in Deep Learning for Mobile Edge: Distributed and Collaborative High-Speed Inference", 2023 Second International Conference on Electronics and Renewable Systems (ICEARS), 10.1109/ICEARS56392.2023.10084935 (1165-1171). Online publication date: 2-Mar-2023
• (2022) "On Minimizing the Read Latency of Flash Memory to Preserve Inter-Tree Locality in Random Forest", Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design, 10.1145/3508352.3549365 (1-9). Online publication date: 30-Oct-2022
• (2022) "Distributed and Collaborative High-Speed Inference Deep Learning for Mobile Edge with Topological Dependencies", IEEE Transactions on Cloud Computing, 10.1109/TCC.2020.2978846, 10:2 (821-834). Online publication date: 1-Apr-2022
• (2022) "Compression of Deep Neural Networks based on quantized tensor decomposition to implement on reconfigurable hardware platforms", Neural Networks, 10.1016/j.neunet.2022.02.024, 150:C (350-363). Online publication date: 18-May-2022
• (2022) "Aerial Robotics for Precision Agriculture: Weeds Detection Through UAV and Machine Vision", Optoelectronic Devices in Robotic Systems, 10.1007/978-3-031-09791-1_2 (23-51). Online publication date: 15-Jun-2022
• (2021) "Edge computing tied in artificial neural network classifiers", 10.20334/2021-021-M. Online publication date: 2021
• (2021) "On designing the adaptive computation framework of distributed deep learning models for Internet-of-Things applications", The Journal of Supercomputing, 10.1007/s11227-021-03795-4. Online publication date: 21-Apr-2021
