
SpARC: Token Similarity-Aware Sparse Attention Transformer Accelerator via Row-wise Clustering

Published: 07 November 2024

Abstract

Self-attention mechanisms, the key enabler of transformers' remarkable performance, account for a significant portion of the overall transformer computation. Despite its effectiveness, self-attention inherently contains considerable redundancies, making sparse attention an attractive approach. In this paper, we propose SpARC, a sparse attention transformer accelerator that enhances throughput and energy efficiency by reducing the computational complexity of the self-attention mechanism. Our approach exploits inherent row-level redundancies in transformer attention maps to reduce the overall self-attention computation. By employing row-wise clustering, attention scores are calculated only once per cluster to achieve approximate attention without seriously compromising accuracy. To leverage the high parallelism of the proposed clustering approximate attention, we develop a fully pipelined accelerator with a dedicated memory hierarchy. Experimental results demonstrate that SpARC achieves attention map sparsity levels of 85-90% with negligible accuracy loss. SpARC achieves up to 4× core attention speedup and 6× energy efficiency improvement compared to prior sparse attention transformer accelerators.
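
The row-wise clustering idea described in the abstract can be illustrated with a short sketch. The code below is not the paper's accelerator or its exact algorithm; it is a minimal NumPy approximation, assuming a simple k-means grouping of query rows, in which attention scores are computed once per cluster centroid and the resulting output is broadcast to every token in that cluster. The function names (clustered_attention, exact_attention) and the choice of clustering routine are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def clustered_attention(Q, K, V, num_clusters=16, iters=5, seed=0):
    """Toy row-wise-clustered attention (illustrative sketch only):
    group similar query rows with a small k-means, score the keys once
    per cluster centroid, and broadcast that cluster's attention output
    to every member token."""
    n, d = Q.shape
    rng = np.random.default_rng(seed)
    # Initialize centroids from randomly chosen query rows.
    centroids = Q[rng.choice(n, size=num_clusters, replace=False)]
    for _ in range(iters):
        # Assign each query row to its nearest centroid, then update centroids.
        dists = ((Q[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for c in range(num_clusters):
            members = Q[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    # Final assignment against the updated centroids.
    labels = ((Q[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)
    # One attention row per cluster instead of one per token.
    scores = softmax(centroids @ K.T / np.sqrt(d), axis=-1)  # (C, n)
    cluster_out = scores @ V                                  # (C, d)
    return cluster_out[labels]                                # (n, d)

def exact_attention(Q, K, V):
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d), axis=-1) @ V

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, d = 128, 64
    Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
    approx = clustered_attention(Q, K, V, num_clusters=16)
    exact = exact_attention(Q, K, V)
    rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
    print(f"relative error vs. exact attention: {rel_err:.3f}")
```

With 16 clusters over 128 tokens, this sketch evaluates roughly one-eighth of the full score matrix, which is the flavor of row-level redundancy reduction behind the 85-90% attention-map sparsity reported in the abstract; the accuracy of such a toy approximation depends entirely on how clusterable the query rows are.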


Published In

DAC '24: Proceedings of the 61st ACM/IEEE Design Automation Conference
June 2024
2159 pages
ISBN: 9798400706011
DOI: 10.1145/3649329
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Research-article

Conference

DAC '24: 61st ACM/IEEE Design Automation Conference
June 23-27, 2024
San Francisco, CA, USA

Acceptance Rates

Overall acceptance rate: 1,770 of 5,499 submissions (32%)


