DOI: 10.1145/3649329.3657325
Research Article

CSTrans-OPU: An FPGA-based Overlay Processor with Full Compilation for Transformer Networks via Sparsity Exploration

Published: 07 November 2024

Abstract

A few overlay processors for transformer networks have emerged, offering reconfigurable architectures and dynamic instructions. However, these processors consistently neglect network sparsity, while existing sparse accelerators use resources inefficiently because of their separate computation units. Furthermore, mainstream compilers for instruction generation are intricate and demand significant engineering effort. In this work, we propose CSTrans-OPU, an FPGA-based overlay processor with full compilation for transformer networks via sparsity exploration. Specifically, we customize a multi-precision processing element (PE) array with DSP packing for a unified computation format and full resource utilization. Additionally, the introduced sorting and computation-mode-selection modules make it possible to exploit token sparsity. Moreover, equipped with a user-friendly compiler, CSTrans-OPU performs model parsing, operation fusion, model quantization, and instruction generation and reordering directly from model files. Experimental results show that CSTrans-OPU achieves 6.92-20.06× speedup and 182.48× higher energy efficiency than a CPU, and 1.47-3.85× latency reduction with 4.63-52.53× better energy efficiency than a GPU. Furthermore, it delivers up to 4.28× better latency and 4.94× higher energy efficiency than previously customized accelerators, and is up to 1.93× faster and 4.39× more energy efficient than FPGA processors. To the best of our knowledge, CSTrans-OPU is the first overlay processor for transformer networks that considers sparsity.
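The token-sparsity exploration mentioned above follows a general pattern used by sparse attention accelerators: score each token's importance (e.g., by the attention it receives), sort or partially sort the scores, and keep only the top-k tokens for subsequent layers. The sketch below illustrates that generic pattern in NumPy; the function name, the keep ratio, and the mean-attention scoring heuristic are illustrative assumptions, not CSTrans-OPU's actual hardware algorithm.

```python
import numpy as np

def topk_token_prune(tokens, scores, keep_ratio=0.5):
    """Generic sketch of top-k token pruning: keep the highest-scoring
    tokens, preserving their original sequence order. Not the paper's
    exact sorting/mode-selection hardware, just the underlying idea."""
    n = tokens.shape[0]
    k = max(1, int(n * keep_ratio))
    # argpartition finds the k largest scores without a full sort;
    # np.sort restores the surviving tokens' original order
    keep = np.sort(np.argpartition(scores, -k)[-k:])
    return tokens[keep], keep

# Toy example: 8 tokens of dimension 4; importance of a token is taken
# here as the mean attention it receives (column mean of the attention map).
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4))
attn = rng.random((8, 8))
importance = attn.mean(axis=0)
pruned, idx = topk_token_prune(x, importance, keep_ratio=0.5)
```

With `keep_ratio=0.5`, half the tokens are dropped before the next layer, which is where the latency and energy savings of token sparsity come from.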


          Published In

          DAC '24: Proceedings of the 61st ACM/IEEE Design Automation Conference
          June 2024
          2159 pages
          ISBN:9798400706011
          DOI:10.1145/3649329

Publisher: Association for Computing Machinery, New York, NY, United States



Conference

DAC '24: 61st ACM/IEEE Design Automation Conference
June 23-27, 2024, San Francisco, CA, USA

          Acceptance Rates

          Overall Acceptance Rate 1,770 of 5,499 submissions, 32%


