ABSTRACT
Quantifying the uncertainty of neural networks (NNs) is required in many safety-critical applications such as autonomous driving and medical diagnosis. Recently, Bayesian transformers have demonstrated high-quality uncertainty estimates paired with excellent accuracy. However, their real-time deployment is limited by the compute-intensive attention mechanism at the core of the transformer architecture, and by the repeated Monte Carlo sampling required to quantify predictive uncertainty. To address these limitations, this paper accelerates Bayesian transformers via both algorithmic and hardware optimizations. On the algorithmic level, an evolutionary algorithm (EA)-based framework is proposed to exploit the sparsity in Bayesian transformers and ease their computational workload. On the hardware level, we demonstrate that this sparsity brings performance improvements on our optimized CPU and GPU implementations. An adaptable hardware architecture is also proposed to accelerate Bayesian transformers on an FPGA. Extensive experiments demonstrate that the EA-based framework, together with the hardware optimizations, reduces the latency of Bayesian transformers by up to 13, 12 and 20 times on CPU, GPU and FPGA platforms respectively, while achieving higher algorithmic performance.
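The repeated Monte Carlo sampling mentioned above is the main cost that separates Bayesian from deterministic inference: each prediction requires several stochastic forward passes whose outputs are aggregated, so latency scales linearly with the sample count. The sketch below is a minimal, hedged illustration of MC-dropout-style sampling in PyTorch (in the spirit of Gal and Ghahramani's dropout-as-Bayesian-approximation), not the paper's actual implementation; the toy model, the `mc_dropout_predict` helper, and the sample count are illustrative assumptions.

```python
import torch
import torch.nn as nn

def mc_dropout_predict(model: nn.Module, x: torch.Tensor, num_samples: int = 10):
    """Run `num_samples` stochastic forward passes with dropout kept active,
    returning the mean class probabilities and a per-class variance as a
    simple predictive-uncertainty estimate."""
    model.train()  # keep dropout layers stochastic at inference time
    with torch.no_grad():
        # Each forward pass samples a different dropout mask; this repeated
        # computation is exactly the latency bottleneck the paper targets.
        samples = torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(num_samples)]
        )  # shape: (num_samples, batch, num_classes)
    return samples.mean(dim=0), samples.var(dim=0)

# Toy classifier standing in for a Bayesian transformer head (hypothetical).
toy = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Dropout(0.1), nn.Linear(32, 4))
mean_pred, uncertainty = mc_dropout_predict(toy, torch.randn(8, 16), num_samples=10)
```

With 10 samples, inference costs roughly 10 deterministic forward passes, which is why reducing per-pass work (e.g., via the sparsity exploited by the EA-based framework) translates directly into end-to-end latency reduction.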