DOI: 10.1145/3489517.3530451

Enabling fast uncertainty estimation: accelerating Bayesian transformers via algorithmic and hardware optimizations

Published: 23 August 2022

ABSTRACT

Quantifying the uncertainty of neural networks (NNs) is required in many safety-critical applications such as autonomous driving and medical diagnosis. Recently, Bayesian transformers have demonstrated their ability to provide high-quality uncertainty estimates paired with excellent accuracy. However, their real-time deployment is limited by the compute-intensive attention mechanism at the core of the transformer architecture, and by the repeated Monte Carlo sampling needed to quantify the predictive uncertainty. To address these limitations, this paper accelerates Bayesian transformers via both algorithmic and hardware optimizations. On the algorithmic level, an evolutionary algorithm (EA)-based framework is proposed to exploit the sparsity in Bayesian transformers and ease their computational workload. On the hardware level, we demonstrate that this sparsity brings performance improvements to our optimized CPU and GPU implementations. An adaptable hardware architecture is also proposed to accelerate Bayesian transformers on an FPGA. Extensive experiments demonstrate that the EA-based framework, together with the hardware optimizations, reduces the latency of Bayesian transformers by up to 13, 12, and 20 times on CPU, GPU, and FPGA platforms respectively, while achieving higher algorithmic performance.
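The repeated Monte Carlo sampling referred to above is typically implemented by keeping the model's dropout layers stochastic at inference time and aggregating several forward passes, following the dropout-as-Bayesian-approximation idea. The sketch below is a minimal illustration of that sampling loop, not the paper's implementation; the checkpoint name, sample count, and helper function are assumptions made for the example.

# Illustrative sketch only: Monte Carlo sampling over a dropout-based
# Bayesian transformer to obtain a predictive mean and an uncertainty score.
# The checkpoint and num_samples are assumptions, not the authors' setup.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

def mc_dropout_predict(text, num_samples=20):
    """Run num_samples stochastic forward passes and aggregate them."""
    inputs = tokenizer(text, return_tensors="pt")
    model.train()  # keep dropout active so each forward pass is a Monte Carlo sample
    samples = []
    with torch.no_grad():
        for _ in range(num_samples):
            logits = model(**inputs).logits
            samples.append(torch.softmax(logits, dim=-1))
    probs = torch.stack(samples)                 # [num_samples, batch, num_classes]
    mean = probs.mean(dim=0)                     # predictive distribution
    uncertainty = probs.var(dim=0).sum(dim=-1)   # simple variance-based uncertainty
    return mean, uncertainty

mean, uncertainty = mc_dropout_predict("A short example sentence.")
print(mean, uncertainty)

Because each of the num_samples passes executes the full attention stack, the sampling loop multiplies the transformer's already heavy inference cost; this is precisely the bottleneck that the algorithmic sparsity and the CPU/GPU/FPGA optimizations described above aim to reduce.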


  • Published in

    DAC '22: Proceedings of the 59th ACM/IEEE Design Automation Conference
    July 2022
    1462 pages
    ISBN: 9781450391429
    DOI: 10.1145/3489517

    Copyright © 2022 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 23 August 2022


    Qualifiers

    • research-article

    Acceptance Rates

    Overall Acceptance Rate: 1,770 of 5,499 submissions (32%)

