DOI: 10.1145/3489517.3530584

GTuner: tuning DNN computations on GPU via graph attention network

Published: 23 August 2022

Abstract

Compiling DNN models for GPUs and improving their performance remains an open problem. We propose GTuner, a novel framework that jointly learns from the structures of computational graphs and the statistical features of the codes to find optimal code implementations. A Graph ATtention network (GAT) is designed as the performance estimator in GTuner: graph neural layers propagate information through the computational graph, and a multi-head self-attention module learns the complicated relationships between the features. Under the guidance of GAT, GPU codes are generated through auto-tuning. Experimental results demonstrate that our method remarkably outperforms previous approaches.
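
To make the described architecture concrete, the following is a minimal, hypothetical PyTorch sketch of a GAT-style performance estimator of the kind the abstract outlines: graph neural layers propagate node information over the computational graph, a multi-head self-attention module models interactions between node features, and the pooled embedding is mapped to a scalar performance score used to rank candidate GPU code configurations. This is not the authors' implementation; the layer sizes, the use of nn.MultiheadAttention, the mean pooling, and all class and parameter names are assumptions for illustration only.

```python
# Hypothetical sketch (not the GTuner implementation) of a GAT-style
# performance estimator: graph layers + multi-head self-attention.
import torch
import torch.nn as nn


class GraphLayer(nn.Module):
    """One message-passing step: aggregate neighbor features through the
    (row-normalized) adjacency matrix, then apply a linear transform."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x:   (batch, nodes, in_dim) node features
        # adj: (batch, nodes, nodes) adjacency, assumed row-normalized
        return torch.relu(self.linear(adj @ x))


class PerformanceEstimator(nn.Module):
    """Predicts a scalar performance score for a candidate configuration
    from the features of its computational graph (illustrative only)."""

    def __init__(self, feat_dim, hidden_dim=64, heads=4):
        super().__init__()
        self.gnn = nn.ModuleList([
            GraphLayer(feat_dim, hidden_dim),
            GraphLayer(hidden_dim, hidden_dim),
        ])
        # Multi-head self-attention over node embeddings, modeling
        # relationships between features beyond graph neighborhoods.
        self.attn = nn.MultiheadAttention(hidden_dim, heads, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, x, adj):
        for layer in self.gnn:
            x = layer(x, adj)          # propagate information in the graph
        x, _ = self.attn(x, x, x)      # self-attention across nodes
        x = x.mean(dim=1)              # pool node embeddings to one vector
        return self.head(x).squeeze(-1)  # predicted performance score


# Usage sketch: score a batch of candidate configurations during tuning.
model = PerformanceEstimator(feat_dim=16)
x = torch.randn(8, 32, 16)                  # 8 candidates, 32 nodes each
adj = torch.rand(8, 32, 32)
adj = adj / adj.sum(dim=-1, keepdim=True)   # row-normalize adjacency
scores = model(x, adj)                      # shape: (8,)
```

In an auto-tuning loop of this kind, such an estimator would be trained on measured runtimes and then used to rank unmeasured candidates so that only the most promising ones are compiled and profiled on the GPU.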




Published In

DAC '22: Proceedings of the 59th ACM/IEEE Design Automation Conference
July 2022, 1462 pages
ISBN: 9781450391429
DOI: 10.1145/3489517

Publisher

Association for Computing Machinery, New York, NY, United States


Qualifiers

  • Research-article

Conference

DAC '22: 59th ACM/IEEE Design Automation Conference
July 10-14, 2022
San Francisco, California

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%


Article Metrics

  • Downloads (last 12 months): 51
  • Downloads (last 6 weeks): 3
Reflects downloads up to 05 Mar 2025


Cited By

View all
  • (2024)Adaptive Auto-Tuning Framework for Global Exploration of Stencil Optimization on GPUsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2023.332563035:1(20-33)Online publication date: 1-Jan-2024
  • (2024)GPU Performance Optimization via Intergroup Cache CooperationIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2024.344370743:11(4142-4153)Online publication date: Nov-2024
  • (2024)ATA-Cache: Contention Mitigation for GPU Shared L1 Cache With Aggregated Tag ArrayIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2023.333719243:5(1429-1441)Online publication date: May-2024
  • (2024)Convergence-aware operator-wise mixed-precision trainingCCF Transactions on High Performance Computing10.1007/s42514-024-00208-97:1(43-57)Online publication date: 31-Dec-2024
  • (2023)GTCO: Graph and Tensor Co-Design for Transformer-Based Image Recognition on Tensor CoresIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2023.331716943:2(586-599)Online publication date: 19-Sep-2023
