DOI: 10.1145/3649329.3657317

Enabling Multiple Tensor-wise Operator Fusion for Transformer Models on Spatial Accelerators

Published: 07 November 2024

Abstract

In transformer models, data reuse within a single operator is insufficient, which motivates more aggressive fusion of multiple tensor-wise operators (multi-tensor fusion). Because tensor-wise operator dataflows are complex, conventional fusion techniques often fall short, offering limited dataflow options and short fusion lengths. In this study, we first identify three challenges in multi-tensor fusion that lead to inferior fusions. We then propose dataflow adaptive tiling (DAT), a novel inter-operator dataflow that enables efficient fusion of multiple operators connected in any form and chained to any length. We further broaden dataflow exploration from intra-operator to inter-operator and develop an exploration framework that quickly finds the best dataflow on spatial accelerators under a given on-chip buffer size. Experimental results show that DAT delivers 2.24× and 1.74× speedups and 35.5% and 15.5% energy savings on average for edge and cloud accelerators, respectively, compared with the state-of-the-art dataflow explorer FLAT. DAT is open-sourced at https://github.com/lxu28973/DAT.git.
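To make multi-tensor fusion concrete, here is a minimal NumPy sketch (ours, not the paper's DAT implementation, which explores far more dataflow and tiling options): the three chained attention operators Q·Kᵀ → softmax → ·V are fused at row-tile granularity, so the full L×L score matrix is never materialized in off-chip memory. The tile size T is a hypothetical stand-in for the on-chip buffer constraint that the paper's exploration framework optimizes.

```python
# Hypothetical sketch of multi-tensor operator fusion (not the paper's DAT):
# each row tile of Q flows through all three chained operators before the
# next tile is loaded, so intermediates stay within the "on-chip" tile.
import numpy as np

def fused_attention_tiled(Q, K, V, T=64):
    """Fuse Q@K^T -> softmax -> @V per (T x d) row tile of Q."""
    L, _ = Q.shape
    out = np.empty((L, V.shape[1]))
    for i in range(0, L, T):
        q = Q[i:i + T]                        # tile stays "on chip"
        s = q @ K.T                           # operator 1: matmul (T x L)
        s -= s.max(axis=1, keepdims=True)     # stabilize the softmax
        p = np.exp(s)
        p /= p.sum(axis=1, keepdims=True)     # operator 2: softmax
        out[i:i + T] = p @ V                  # operator 3: matmul
    return out

# Sanity check against the unfused reference computation.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((256, 32)) for _ in range(3))
S = Q @ K.T
P = np.exp(S - S.max(axis=1, keepdims=True))
P /= P.sum(axis=1, keepdims=True)
assert np.allclose(fused_attention_tiled(Q, K, V), P @ V)
```

Row tiling is exact here only because each row's softmax depends solely on that row; choosing how to tile chains whose operators have other dependence patterns is precisely the kind of inter-operator dataflow decision DAT is designed to explore.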

References

[1] Manoj Alwani et al. 2016. Fused-Layer CNN Accelerators. In IEEE Micro.
[2] Yu-Hsin Chen et al. 2019. Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices. In JETCAS.
[3] Tri Dao et al. 2022. FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness. In NeurIPS.
[4] Jacob Devlin et al. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL-HLT.
[5] Mingyu Gao et al. 2019. TANGRAM: Optimized Coarse-Grained Dataflow for Scalable NN Accelerators. In ASPLOS.
[6] Gurobi Optimization, LLC. 2023. Gurobi Optimizer Reference Manual. https://www.gurobi.com
[7] Qijing Huang et al. 2021. CoSA: Scheduling by Constrained Optimization for Spatial Accelerators. In ISCA.
[8] Norman P. Jouppi et al. 2017. In-Datacenter Performance Analysis of a Tensor Processing Unit. In ISCA.
[9] Norman P. Jouppi et al. 2021. Ten Lessons From Three Generations Shaped Google's TPUv4i: Industrial Product. In ISCA.
[10] Sheng-Chun Kao et al. 2023. FLAT: An Optimized Dataflow for Mitigating Attention Bottlenecks. In ASPLOS.
[11] Sheng-Chun Kao and Tushar Krishna. 2020. GAMMA: Automating the HW Mapping of DNN Models on Accelerators via Genetic Algorithm. In ICCAD.
[12] Hyoukjun Kwon et al. 2020. MAESTRO: A Data-Centric Approach to Understand Reuse, Performance, and Hardware Cost of DNN Mappings. In IEEE Micro.
[13] Liqiang Lu et al. 2021. TENET: A Framework for Modeling Tensor Dataflow Based on Relation-Centric Notation. In ISCA.
[14] Wei Niu et al. 2021. DNNFusion: Accelerating Deep Neural Networks Execution with Advanced Operator Fusion. In PLDI.
[15] Kiran Seshadri et al. 2022. An Evaluation of Edge TPU Accelerators for Convolutional Neural Networks. In IISWC.
[16] Yining Shi et al. 2023. Welder: Scheduling Deep Learning Memory Access via Tile-graph. In OSDI.
[17] Ashish Vaswani et al. 2017. Attention Is All You Need. In NeurIPS.
[18] Xuan Yang et al. 2020. Interstellar: Using Halide's Scheduling Language to Analyze DNN Accelerators. In ASPLOS.
[19] Shixuan Zheng et al. 2022. Atomic Dataflow Based Graph-Level Workload Orchestration for Scalable DNN Accelerators. In HPCA.
[20] Size Zheng et al. 2023. Chimera: An Analytical Optimizing Framework for Effective Compute-Intensive Operators Fusion. In HPCA.
[21] Haoyi Zhou et al. 2021. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. In AAAI.

Published In

DAC '24: Proceedings of the 61st ACM/IEEE Design Automation Conference
June 2024
2159 pages
ISBN:9798400706011
DOI:10.1145/3649329

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. dataflow
  2. inter-operator
  3. fusion
  4. accelerator
  5. transformer

Qualifiers

  • Research-article

Conference

DAC '24: 61st ACM/IEEE Design Automation Conference
June 23-27, 2024
San Francisco, CA, USA

Acceptance Rates

Overall Acceptance Rate: 1,770 of 5,499 submissions (32%)
