
SpARC: Token Similarity-Aware Sparse Attention Transformer Accelerator via Row-wise Clustering

Published: 07 November 2024

Abstract

Self-attention mechanisms, the key enabler of transformers' remarkable performance, account for a significant portion of the overall transformer computation. Despite its effectiveness, self-attention inherently contains considerable redundancies, making sparse attention an attractive approach. In this paper, we propose SpARC, a sparse attention transformer accelerator that enhances throughput and energy efficiency by reducing the computational complexity of the self-attention mechanism. Our approach exploits inherent row-level redundancies in transformer attention maps to reduce the overall self-attention computation. By employing row-wise clustering, attention scores are calculated only once per cluster to achieve approximate attention without seriously compromising accuracy. To leverage the high parallelism of the proposed clustering approximate attention, we develop a fully pipelined accelerator with a dedicated memory hierarchy. Experimental results demonstrate that SpARC achieves attention map sparsity levels of 85-90% with negligible accuracy loss. SpARC achieves up to 4× core attention speedup and 6× energy efficiency improvement compared to prior sparse attention transformer accelerators.
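
The row-wise clustering idea described in the abstract can be illustrated with a short sketch. The code below is not the paper's accelerator or its exact algorithm; it is a minimal NumPy approximation, assuming a simple k-means grouping of query rows, in which attention scores are computed once per cluster centroid and the resulting output is broadcast to every token in that cluster. The function names (clustered_attention, exact_attention) and the choice of clustering routine are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def clustered_attention(Q, K, V, num_clusters=16, iters=5, seed=0):
    """Toy row-wise-clustered attention (illustrative sketch only):
    group similar query rows with a small k-means, score the keys once
    per cluster centroid, and broadcast that cluster's attention output
    to every member token."""
    n, d = Q.shape
    rng = np.random.default_rng(seed)
    # Initialize centroids from randomly chosen query rows.
    centroids = Q[rng.choice(n, size=num_clusters, replace=False)]
    for _ in range(iters):
        # Assign each query row to its nearest centroid, then update centroids.
        dists = ((Q[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for c in range(num_clusters):
            members = Q[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    # Final assignment against the updated centroids.
    labels = ((Q[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)
    # One attention row per cluster instead of one per token.
    scores = softmax(centroids @ K.T / np.sqrt(d), axis=-1)  # (C, n)
    cluster_out = scores @ V                                  # (C, d)
    return cluster_out[labels]                                # (n, d)

def exact_attention(Q, K, V):
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d), axis=-1) @ V

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, d = 128, 64
    Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
    approx = clustered_attention(Q, K, V, num_clusters=16)
    exact = exact_attention(Q, K, V)
    rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
    print(f"relative error vs. exact attention: {rel_err:.3f}")
```

With 16 clusters over 128 tokens, this sketch evaluates roughly one-eighth of the full score matrix, which is the flavor of row-level redundancy reduction behind the 85-90% attention-map sparsity reported in the abstract; the accuracy of such a toy approximation depends entirely on how clusterable the query rows are.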


Published In

DAC '24: Proceedings of the 61st ACM/IEEE Design Automation Conference
June 2024
2159 pages
ISBN: 9798400706011
DOI: 10.1145/3649329
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Research-article

Conference

DAC '24: 61st ACM/IEEE Design Automation Conference
June 23-27, 2024
San Francisco, CA, USA

Acceptance Rates

Overall acceptance rate: 1,770 of 5,499 submissions (32%)


