DOI: 10.1145/3649329.3657317

Enabling Multiple Tensor-wise Operator Fusion for Transformer Models on Spatial Accelerators

Published: 07 November 2024

Abstract

In transformer models, data reuse within a single operator is insufficient, which motivates more aggressive fusion of multiple tensor-wise operators (multi-tensor fusion). Because tensor-wise operator dataflows are complex, conventional fusion techniques often fall short, offering limited dataflow options and short fusion lengths. In this study, we first identify three challenges in multi-tensor fusion that lead to inferior fusions. We then propose dataflow adaptive tiling (DAT), a novel inter-operator dataflow that enables efficient fusion of multiple operators connected in any form and chained to any length. We further broaden dataflow exploration from intra-operator to inter-operator and develop an exploration framework that quickly finds the best dataflow on spatial accelerators under a given on-chip buffer size. Experimental results show that DAT delivers 2.24× and 1.74× speedups and 35.5% and 15.5% energy savings on average for edge and cloud accelerators, respectively, compared with the state-of-the-art dataflow explorer FLAT. DAT is open-sourced at https://github.com/lxu28973/DAT.git.
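To make multi-tensor fusion concrete, here is a minimal NumPy sketch (ours, not the paper's DAT implementation, which explores far more dataflow and tiling options): the three chained attention operators Q·Kᵀ → softmax → ·V are fused at row-tile granularity, so the full L×L score matrix is never materialized in off-chip memory. The tile size T is a hypothetical stand-in for the on-chip buffer constraint that the paper's exploration framework optimizes.

```python
# Hypothetical sketch of multi-tensor operator fusion (not the paper's DAT):
# each row tile of Q flows through all three chained operators before the
# next tile is loaded, so intermediates stay within the "on-chip" tile.
import numpy as np

def fused_attention_tiled(Q, K, V, T=64):
    """Fuse Q@K^T -> softmax -> @V per (T x d) row tile of Q."""
    L, _ = Q.shape
    out = np.empty((L, V.shape[1]))
    for i in range(0, L, T):
        q = Q[i:i + T]                        # tile stays "on chip"
        s = q @ K.T                           # operator 1: matmul (T x L)
        s -= s.max(axis=1, keepdims=True)     # stabilize the softmax
        p = np.exp(s)
        p /= p.sum(axis=1, keepdims=True)     # operator 2: softmax
        out[i:i + T] = p @ V                  # operator 3: matmul
    return out

# Sanity check against the unfused reference computation.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((256, 32)) for _ in range(3))
S = Q @ K.T
P = np.exp(S - S.max(axis=1, keepdims=True))
P /= P.sum(axis=1, keepdims=True)
assert np.allclose(fused_attention_tiled(Q, K, V), P @ V)
```

Row tiling is exact here only because each row's softmax depends solely on that row; choosing how to tile chains whose operators have other dependence patterns is precisely the kind of inter-operator dataflow decision DAT is designed to explore.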

References

[1] Manoj Alwani et al. 2016. Fused-Layer CNN Accelerators. In IEEE Micro.
[2] Yu-Hsin Chen et al. 2019. Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices. In JETCAS.
[3] Tri Dao et al. 2022. FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness. In NeurIPS.
[4] Jacob Devlin et al. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL-HLT.
[5] Mingyu Gao et al. 2019. TANGRAM: Optimized Coarse-Grained Dataflow for Scalable NN Accelerators. In ASPLOS.
[6] Gurobi Optimization, LLC. 2023. Gurobi Optimizer Reference Manual. https://www.gurobi.com
[7] Qijing Huang et al. 2021. CoSA: Scheduling by Constrained Optimization for Spatial Accelerators. In ISCA.
[8] Norman P. Jouppi et al. 2017. In-Datacenter Performance Analysis of a Tensor Processing Unit. In ISCA.
[9] Norman P. Jouppi et al. 2021. Ten Lessons From Three Generations Shaped Google's TPUv4i: Industrial Product. In ISCA.
[10] Sheng-Chun Kao et al. 2023. FLAT: An Optimized Dataflow for Mitigating Attention Bottlenecks. In ASPLOS.
[11] Sheng-Chun Kao and Tushar Krishna. 2020. GAMMA: Automating the HW Mapping of DNN Models on Accelerators via Genetic Algorithm. In ICCAD.
[12] Hyoukjun Kwon et al. 2020. MAESTRO: A Data-Centric Approach to Understand Reuse, Performance, and Hardware Cost of DNN Mappings. In IEEE Micro.
[13] Liqiang Lu et al. 2021. TENET: A Framework for Modeling Tensor Dataflow Based on Relation-Centric Notation. In ISCA.
[14] Wei Niu et al. 2021. DNNFusion: Accelerating Deep Neural Networks Execution with Advanced Operator Fusion. In PLDI.
[15] Kiran Seshadri et al. 2022. An Evaluation of Edge TPU Accelerators for Convolutional Neural Networks. In IISWC.
[16] Yining Shi et al. 2023. Welder: Scheduling Deep Learning Memory Access via Tile-graph. In OSDI.
[17] Ashish Vaswani et al. 2017. Attention Is All You Need. In NeurIPS.
[18] Xuan Yang et al. 2020. Interstellar: Using Halide's Scheduling Language to Analyze DNN Accelerators. In ASPLOS.
[19] Shixuan Zheng et al. 2022. Atomic Dataflow Based Graph-Level Workload Orchestration for Scalable DNN Accelerators. In HPCA.
[20] Size Zheng et al. 2023. Chimera: An Analytical Optimizing Framework for Effective Compute-Intensive Operators Fusion. In HPCA.
[21] Haoyi Zhou et al. 2021. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. In AAAI.

Published In

DAC '24: Proceedings of the 61st ACM/IEEE Design Automation Conference
June 2024
2159 pages
ISBN:9798400706011
DOI:10.1145/3649329

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. dataflow
  2. inter-operator
  3. fusion
  4. accelerator
  5. transformer

Qualifiers

  • Research-article

Conference

DAC '24: 61st ACM/IEEE Design Automation Conference
June 23-27, 2024
San Francisco, CA, USA

Acceptance Rates

Overall Acceptance Rate: 1,770 of 5,499 submissions (32%)
