DOI: 10.1145/3665314.3670808
Research article | Open access

iSPADE: End-to-end Sparse Architecture for Dense DNN Acceleration via Inverted-bit Representation

Published: 09 September 2024

Abstract

While recent cutting-edge deep neural network (DNN) models, such as large language models (LLMs), demonstrate remarkable capabilities, their inherently dense data characteristics limit the performance and energy gains achievable through sparse acceleration. In this paper, we introduce the iSPADE architecture, which sparsifies the end-to-end execution of dense DNNs so that they directly benefit from sparse acceleration without accuracy-sensitive techniques such as pruning. First, we propose an inverted-bit representation that eliminates the repeated sign bits of 2's complement representation. Because the inverted-bit representation produces a large number of zero bits, we further propose data packing and computation skipping techniques that reduce both redundant data movement and redundant computation. Finally, we present the iSPADE bit-slice hardware architecture, which efficiently supports and accelerates the proposed sparse dataflow. We evaluate performance on general DNN workloads using eight popular DNNs: iSPADE achieves 4.1X higher energy efficiency and a 4.5X speedup over previous state-of-the-art bit-slice accelerators, along with a 1.7X reduction in memory footprint.
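
As a rough intuition for the inverted-bit idea (the exact encoding and dataflow are defined in the paper itself), the short Python sketch below shows how small-magnitude values in 2's complement carry long runs of repeated sign bits, and how flipping the non-sign bits of negative values turns those runs into zeros. The helper name inverted_bit and the exact transform are illustrative assumptions, not the paper's scheme; the point is only that the high-order bit-slices of the transformed values become mostly zero, which is the kind of slice-level sparsity a bit-slice accelerator can skip.

```python
# Minimal sketch (assumed encoding, not the paper's exact inverted-bit format):
# show how sign extension in 2's complement wastes high-order bits, and how
# flipping the non-sign bits of negative values exposes zero bit-slices.

def to_bits(v, width=8):
    """2's-complement bit string of v, MSB first."""
    return format(v & ((1 << width) - 1), f"0{width}b")

def inverted_bit(v, width=8):
    """Hypothetical transform: keep the sign bit, store the one's complement
    of the remaining bits for negative values, so small magnitudes yield
    mostly-zero high-order bits regardless of sign."""
    raw = v & ((1 << width) - 1)
    if v < 0:
        raw ^= (1 << (width - 1)) - 1  # flip every bit except the sign bit
    return format(raw, f"0{width}b")

# Small-magnitude values are typical of DNN activations and weights.
for v in [3, -3, 5, -1]:
    print(f"{v:3d}  2's comp: {to_bits(v)}  inverted: {inverted_bit(v)}")
# e.g. -3: 11111101 -> 10000010, i.e. the repeated sign bits in positions
# 6..2 become zeros, so those bit-slices can be packed away or skipped.
```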

    Information & Contributors

    Information

    Published In

    ISLPED '24: Proceedings of the 29th ACM/IEEE International Symposium on Low Power Electronics and Design
    August 2024
    384 pages
    ISBN:9798400706882
    DOI:10.1145/3665314
    This work is licensed under a Creative Commons Attribution 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 09 September 2024

    Author Tags

    1. deep neural network
    2. binary representation
    3. sparse acceleration

    Qualifiers

    • Research-article

    Conference

    ISLPED '24

    Acceptance Rates

    Overall Acceptance Rate 398 of 1,159 submissions, 34%
