DOI: 10.1145/3649329.3657325
Research Article

CSTrans-OPU: An FPGA-based Overlay Processor with Full Compilation for Transformer Networks via Sparsity Exploration

Published: 07 November 2024

Abstract

A few overlay processors for transformer networks have emerged, offering reconfigurable architectures and dynamic instructions. However, these processors consistently neglect network sparsity, while existing sparse accelerators use resources inefficiently because of their separate computation units. Furthermore, mainstream compilers for instruction generation are intricate and demand significant engineering effort. In this work, we propose CSTrans-OPU, an FPGA-based overlay processor with full compilation for transformer networks via sparsity exploration. Specifically, we customize a multi-precision processing element (PE) array with DSP packing for a unified computation format and full resource utilization. Additionally, the introduced sorting and computation-mode-selection modules make it possible to exploit token sparsity. Moreover, equipped with a user-friendly compiler, CSTrans-OPU performs model parsing, operation fusion, model quantization, and instruction generation and reordering directly from model files. Experimental results show that CSTrans-OPU achieves 6.92-20.06× speedup and 182.48× higher energy efficiency than a CPU, and 1.47-3.85× latency reduction with 4.63-52.53× better energy efficiency than a GPU. Furthermore, it delivers up to 4.28× better latency and 4.94× higher energy efficiency than previously customized accelerators, and is up to 1.93× faster and 4.39× more energy efficient than FPGA processors. To the best of our knowledge, CSTrans-OPU is the first overlay processor for transformer networks that considers sparsity.
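The token-sparsity exploration mentioned above follows a general pattern used by sparse attention accelerators: score each token's importance (e.g., by the attention it receives), sort or partially sort the scores, and keep only the top-k tokens for subsequent layers. The sketch below illustrates that generic pattern in NumPy; the function name, the keep ratio, and the mean-attention scoring heuristic are illustrative assumptions, not CSTrans-OPU's actual hardware algorithm.

```python
import numpy as np

def topk_token_prune(tokens, scores, keep_ratio=0.5):
    """Generic sketch of top-k token pruning: keep the highest-scoring
    tokens, preserving their original sequence order. Not the paper's
    exact sorting/mode-selection hardware, just the underlying idea."""
    n = tokens.shape[0]
    k = max(1, int(n * keep_ratio))
    # argpartition finds the k largest scores without a full sort;
    # np.sort restores the surviving tokens' original order
    keep = np.sort(np.argpartition(scores, -k)[-k:])
    return tokens[keep], keep

# Toy example: 8 tokens of dimension 4; importance of a token is taken
# here as the mean attention it receives (column mean of the attention map).
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4))
attn = rng.random((8, 8))
importance = attn.mean(axis=0)
pruned, idx = topk_token_prune(x, importance, keep_ratio=0.5)
```

With `keep_ratio=0.5`, half the tokens are dropped before the next layer, which is where the latency and energy savings of token sparsity come from.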


          Published In

          DAC '24: Proceedings of the 61st ACM/IEEE Design Automation Conference
          June 2024
          2159 pages
          ISBN:9798400706011
          DOI:10.1145/3649329

Publisher: Association for Computing Machinery, New York, NY, United States



Conference

DAC '24: 61st ACM/IEEE Design Automation Conference
June 23-27, 2024, San Francisco, CA, USA

          Acceptance Rates

          Overall Acceptance Rate 1,770 of 5,499 submissions, 32%


