DOI: 10.1145/3649329.3656502
Research article · Open access

FQP: A Fibonacci Quantization Processor with Multiplication-Free Computing and Topological-Order Routing

Published: 07 November 2024

Abstract

With the continuous advancement of artificial intelligence, neural networks exhibit ever-growing parameter sizes, demanding increased computational power and extensive memory access. Low bit-width quantization is a viable way to address this challenge. However, conventional low bit-width uniform quantization suffers from a mismatch with the distribution of weights and activations in neural networks, resulting in accuracy degradation.
We propose Fibonacci Quantization, which matches the distribution of weights and activations by using Fibonacci numbers as quantization levels. It achieves negligible accuracy loss for ResNet50 on ImageNet1k with both activations and weights quantized to 4 bits. Building on Fibonacci Quantization, we present the Fibonacci Quantization Processor (FQP). It comprises two types of multiplication-free computing units, the Dualistic-Transformation Adder (DTA) and the Bit-Exclusive Adder (BEA), both of which transform the multiplication of Fibonacci numbers into simple addition. In addition, to map multiplications of small and large Fibonacci numbers onto the BEA and DTA effectively, we propose Topological-Order Routing (TOR), which routes data to either the previous or the current position. Our 4-bit Fibonacci quantization achieves 0.98% higher accuracy than 4-bit uniform quantization for ResNet50 on ImageNet1k. At equivalent accuracy, the proposed processor delivers 2.17× higher energy efficiency than uniform quantization.
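
To make the core idea concrete, the minimal Python sketch below illustrates the general principle behind Fibonacci quantization and multiplication-free computation: values are rounded to signed Fibonacci levels, and multiplying by a level F(n) is unrolled into additions via the recurrence F(n) = F(n-1) + F(n-2). The level table, scale handling, and the names fib_quantize, mul_by_fib, and fib_dot are assumptions for illustration only; the sketch codes only the weights and does not model the paper's DTA/BEA circuits or the TOR dataflow, in which both weights and activations are Fibonacci-quantized.

# Minimal sketch of Fibonacci-based quantization and multiplication-free
# computation. Level table, scale handling, and function names are
# illustrative assumptions, not the paper's DTA/BEA hardware or TOR routing.

FIB = [1, 2, 3, 5, 8, 13, 21, 34]  # assumed positive quantization levels

def fib_quantize(x, scale=1.0):
    """Map a real value to (level index, sign) of the nearest Fibonacci level."""
    v = abs(x) / scale
    idx = min(range(len(FIB)), key=lambda i: abs(FIB[i] - v))
    return idx, (1 if x >= 0 else -1)

def mul_by_fib(x, idx):
    """Compute x * FIB[idx] with additions only, using F(n) = F(n-1) + F(n-2)."""
    p_prev, p_curr = x, x + x        # x*F(0) = x*1, x*F(1) = x*2
    if idx == 0:
        return p_prev
    for _ in range(idx - 1):         # walk the recurrence up to the target level
        p_prev, p_curr = p_curr, p_prev + p_curr
    return p_curr

def fib_dot(acts, weight_codes, scale=1.0):
    """Dot product with Fibonacci-coded weights; every multiply becomes adds."""
    total = 0
    for x, (idx, sign) in zip(acts, weight_codes):
        total += sign * mul_by_fib(x, idx)
    return total * scale

# Example: weights [0.9, -2.7, 4.8] quantize to levels 1, -3, 5, so
# fib_dot([3, 1, 2], codes) = 3*1 - 1*3 + 2*5 = 10, computed without multipliers.
codes = [fib_quantize(w) for w in [0.9, -2.7, 4.8]]
assert fib_dot([3, 1, 2], codes) == 10

In hardware terms, this is why a Fibonacci-level multiply can be realized with adders alone: each step of the recurrence is a single addition of the two running partial products.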

Published In

DAC '24: Proceedings of the 61st ACM/IEEE Design Automation Conference
June 2024, 2159 pages
ISBN: 9798400706011
DOI: 10.1145/3649329

This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. quantization
  2. Fibonacci numbers
  3. DNN accelerator
  4. BERT
  5. ResNet

Qualifiers

  • Research-article

Conference

DAC '24: 61st ACM/IEEE Design Automation Conference
June 23-27, 2024
San Francisco, CA, USA

Acceptance Rates

Overall acceptance rate: 1,770 of 5,499 submissions (32%)
