ABSTRACT
Non-linear operations such as GELU, layer normalization, and Softmax are essential yet costly building blocks of Transformer models. Several prior works have simplified these operations with look-up tables or integer computations, but such approximations suffer from inferior accuracy or considerable hardware cost and long latency. This paper proposes an accurate and hardware-friendly approximation framework for efficient Transformer inference. Our framework employs a simple neural network as a universal approximator, with its structure equivalently transformed into a look-up table (LUT). The proposed framework, called neural network generated LUT (NN-LUT), can accurately replace all the non-linear operations in popular BERT models with significant reductions in area, power consumption, and latency.
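The core observation behind the network-to-LUT transformation is that a one-hidden-layer ReLU network is exactly a piecewise-linear function, so it can be rewritten as a table of breakpoints with per-segment slopes and intercepts. The sketch below illustrates this idea for GELU; it is not the authors' implementation. The knot positions, the segment count, and the least-squares fit of the output layer (in place of the paper's training procedure) are all illustrative assumptions.

```python
# Minimal sketch of the NN-LUT idea (not the paper's code): fit a tiny
# one-hidden-layer ReLU network to GELU, then rewrite it exactly as a
# piecewise-linear look-up table.
import numpy as np

def gelu(x):
    # tanh approximation of GELU, used here as the reference function
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

# --- 1. Fit f(x) = c + sum_i v_i * relu(x - t_i) by least squares. ---
# Fixing the hidden weights to 1 and the biases to -t_i makes the model
# linear in (v, c), so a single lstsq call fits the output layer.
knots = np.linspace(-4.0, 4.0, 16)                 # hidden-unit breakpoints (assumed)
xs = np.linspace(-6.0, 6.0, 2001)
A = np.maximum(xs[:, None] - knots[None, :], 0.0)  # ReLU features, shape (2001, 16)
A = np.hstack([A, np.ones((xs.size, 1))])          # bias column for c
coef, *_ = np.linalg.lstsq(A, gelu(xs), rcond=None)
v, c = coef[:-1], coef[-1]

# --- 2. Equivalent LUT: per-segment slope and intercept. ---
# Left of the first knot every unit is inactive; each crossed knot t_i
# adds v_i to the slope and subtracts v_i * t_i from the intercept.
slopes = np.concatenate([[0.0], np.cumsum(v)])
intercepts = np.concatenate([[c], c - np.cumsum(v * knots)])

def lut_gelu(x):
    seg = np.searchsorted(knots, x)                # segment index for each input
    return slopes[seg] * x + intercepts[seg]

# The LUT reproduces the fitted network exactly; the approximation error
# versus true GELU shrinks as segments are added.
err = np.max(np.abs(lut_gelu(xs) - gelu(xs)))
print(f"max |LUT - GELU| on [-6, 6]: {err:.4e}")
```

At inference time, only step 2's tables are needed: one comparison tree (or search) to find the segment, one multiply, and one add per evaluation, which is what makes the LUT form hardware-friendly.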