DOI: 10.1145/3649329.3656502
Research article · Open access

FQP: A Fibonacci Quantization Processor with Multiplication-Free Computing and Topological-Order Routing

Published: 07 November 2024

Abstract

With the continuous advancement of artificial intelligence, neural networks exhibit ever-growing parameter sizes, demanding increased computational power and extensive memory access. Low bit-width quantization is a viable way to address this challenge. However, conventional low bit-width uniform quantization suffers from a mismatch with the distribution of weights and activations in neural networks, resulting in accuracy degradation.
We propose Fibonacci Quantization, which matches the distribution of weights and activations by using Fibonacci numbers as quantization levels. It achieves negligible accuracy loss for ResNet50 on ImageNet1k with both activations and weights quantized to 4 bits. Building on Fibonacci Quantization, we present the Fibonacci Quantization Processor (FQP). It comprises two types of multiplication-free computing units, the Dualistic-Transformation Adder (DTA) and the Bit-Exclusive Adder (BEA), both of which transform the multiplication of Fibonacci numbers into simple addition. In addition, to map multiplications of small and large Fibonacci numbers onto the BEA and DTA effectively, we propose Topological-Order Routing (TOR), which routes data to either the previous or the current position. Our 4-bit Fibonacci quantization achieves 0.98% higher accuracy than 4-bit uniform quantization for ResNet50 on ImageNet1k. At equivalent accuracy, the proposed processor delivers 2.17× higher energy efficiency than uniform quantization.
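
To make the core idea concrete, the minimal Python sketch below illustrates the general principle behind Fibonacci quantization and multiplication-free computation: values are rounded to signed Fibonacci levels, and multiplying by a level F(n) is unrolled into additions via the recurrence F(n) = F(n-1) + F(n-2). The level table, scale handling, and the names fib_quantize, mul_by_fib, and fib_dot are assumptions for illustration only; the sketch codes only the weights and does not model the paper's DTA/BEA circuits or the TOR dataflow, in which both weights and activations are Fibonacci-quantized.

# Minimal sketch of Fibonacci-based quantization and multiplication-free
# computation. Level table, scale handling, and function names are
# illustrative assumptions, not the paper's DTA/BEA hardware or TOR routing.

FIB = [1, 2, 3, 5, 8, 13, 21, 34]  # assumed positive quantization levels

def fib_quantize(x, scale=1.0):
    """Map a real value to (level index, sign) of the nearest Fibonacci level."""
    v = abs(x) / scale
    idx = min(range(len(FIB)), key=lambda i: abs(FIB[i] - v))
    return idx, (1 if x >= 0 else -1)

def mul_by_fib(x, idx):
    """Compute x * FIB[idx] with additions only, using F(n) = F(n-1) + F(n-2)."""
    p_prev, p_curr = x, x + x        # x*F(0) = x*1, x*F(1) = x*2
    if idx == 0:
        return p_prev
    for _ in range(idx - 1):         # walk the recurrence up to the target level
        p_prev, p_curr = p_curr, p_prev + p_curr
    return p_curr

def fib_dot(acts, weight_codes, scale=1.0):
    """Dot product with Fibonacci-coded weights; every multiply becomes adds."""
    total = 0
    for x, (idx, sign) in zip(acts, weight_codes):
        total += sign * mul_by_fib(x, idx)
    return total * scale

# Example: weights [0.9, -2.7, 4.8] quantize to levels 1, -3, 5, so
# fib_dot([3, 1, 2], codes) = 3*1 - 1*3 + 2*5 = 10, computed without multipliers.
codes = [fib_quantize(w) for w in [0.9, -2.7, 4.8]]
assert fib_dot([3, 1, 2], codes) == 10

In hardware terms, this is why a Fibonacci-level multiply can be realized with adders alone: each step of the recurrence is a single addition of the two running partial products.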

Published In

DAC '24: Proceedings of the 61st ACM/IEEE Design Automation Conference
June 2024, 2159 pages
ISBN: 9798400706011
DOI: 10.1145/3649329

This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. quantization
  2. Fibonacci numbers
  3. DNN accelerator
  4. BERT
  5. ResNet

Qualifiers

  • Research-article

Conference

DAC '24: 61st ACM/IEEE Design Automation Conference
June 23-27, 2024
San Francisco, CA, USA

Acceptance Rates

Overall acceptance rate: 1,770 of 5,499 submissions (32%)
