
Enabling fast uncertainty estimation: accelerating Bayesian transformers via algorithmic and hardware optimizations

Published: 23 August 2022


Quantifying the uncertainty of neural networks (NNs) is required by many safety-critical applications, such as autonomous driving and medical diagnosis. Recently, Bayesian transformers have demonstrated their capability to provide high-quality uncertainty estimates paired with excellent accuracy. However, their real-time deployment is limited by the compute-intensive attention mechanism at the core of the transformer architecture and by the repeated Monte Carlo sampling needed to quantify predictive uncertainty. To address these limitations, this paper accelerates Bayesian transformers via both algorithmic and hardware optimizations. On the algorithmic level, an evolutionary algorithm (EA)-based framework is proposed to exploit the sparsity in Bayesian transformers and ease their computational workload. On the hardware level, we demonstrate that this sparsity improves hardware performance on our optimized CPU and GPU implementations. An adaptable hardware architecture is also proposed to accelerate Bayesian transformers on an FPGA. Extensive experiments demonstrate that the EA-based framework, together with the hardware optimizations, reduces the latency of Bayesian transformers by up to 13, 12, and 20 times on CPU, GPU, and FPGA platforms, respectively, while achieving higher algorithmic performance.
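To make the cost of repeated Monte Carlo sampling concrete, the following is a minimal sketch of sampling-based uncertainty estimation. It assumes Monte Carlo dropout as the sampling scheme, which the abstract does not confirm, and it stands in a toy linear classifier for a transformer. All names here (`stochastic_forward`, `mc_predict`, `toy_weights`) are hypothetical and chosen for illustration only.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def stochastic_forward(x, weights, drop_prob, rng):
    """One forward pass of a toy linear classifier with a fresh dropout mask.

    Assumption: dropout is applied to the input features; a real Bayesian
    transformer would apply it inside attention and feed-forward layers.
    """
    keep = 1.0 - drop_prob
    # Inverted dropout: zero each feature with probability drop_prob,
    # and rescale the surviving features by 1 / keep.
    masked = [xi / keep if rng.random() < keep else 0.0 for xi in x]
    logits = [sum(w * xi for w, xi in zip(row, masked)) for row in weights]
    return softmax(logits)


def mc_predict(x, weights, num_samples=50, drop_prob=0.2, seed=0):
    """Average num_samples stochastic passes; return (mean probs, entropy)."""
    rng = random.Random(seed)
    samples = [
        stochastic_forward(x, weights, drop_prob, rng)
        for _ in range(num_samples)
    ]
    num_classes = len(weights)
    mean = [sum(s[c] for s in samples) / num_samples for c in range(num_classes)]
    # Predictive entropy of the averaged distribution, used as the uncertainty.
    entropy = -sum(p * math.log(p) for p in mean if p > 0.0)
    return mean, entropy


# Hypothetical two-class, three-feature example.
toy_weights = [[1.0, -0.5, 0.3], [-0.8, 0.9, 0.1]]
mean_probs, uncertainty = mc_predict([0.5, 1.2, -0.7], toy_weights)
```

Each prediction costs `num_samples` full forward passes, so latency grows linearly with the number of samples. This is why reducing the work done per pass, for example through sparsity, matters for real-time use.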



Cited By

  • (2024) On the Design of Novel Attention Mechanism for Enhanced Efficiency of Transformers. Proceedings of the 61st ACM/IEEE Design Automation Conference, 1-6. DOI: 10.1145/3649329.3658253. Online publication date: 23-Jun-2024
  • (2024) Hardware-Aware Neural Dropout Search for Reliable Uncertainty Prediction on FPGA. Proceedings of the 61st ACM/IEEE Design Automation Conference, 1-6. DOI: 10.1145/3649329.3656528. Online publication date: 23-Jun-2024
  • (2024) Accelerating MRI Uncertainty Estimation with Mask-Based Bayesian Neural Network. 2024 IEEE 35th International Conference on Application-specific Systems, Architectures and Processors (ASAP), 107-115. DOI: 10.1109/ASAP61560.2024.00030. Online publication date: 24-Jul-2024




Published In

DAC '22: Proceedings of the 59th ACM/IEEE Design Automation Conference
July 2022
1462 pages



Association for Computing Machinery

New York, NY, United States



Author Tags

  1. adaptable micro-architecture
  2. bayesian transformer
  3. sparsity


  • Research-article


DAC '22: 59th ACM/IEEE Design Automation Conference
July 10 - 14, 2022
San Francisco, California

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%




