Design Space Exploration of Layer-Wise Mixed-Precision Quantization with Tightly Integrated Edge Inference Units

ABSTRACT
Layer-wise mixed-precision quantization (MPQ) has become prevalent for edge inference since it strikes a better balance between accuracy and efficiency than uniform quantization. However, existing MPQ strategies either lack hardware awareness or incur prohibitive computation costs, which hinders their deployment at the edge. In this work, we propose a novel MPQ search algorithm that obtains an optimal scheme by "sampling" layer-wise sensitivity with respect to a newly proposed metric that incorporates both accuracy and a proxy for hardware cost. To further deploy post-training MPQ efficiently on edge chips, we propose to tightly integrate the quantized inference units into the processor pipeline through micro-architecture and Instruction Set Architecture (ISA) co-design. Evaluation results show that the proposed search algorithm achieves 3%~11% higher inference accuracy at similar hardware cost compared to state-of-the-art MPQ strategies. In addition, the tightly integrated MPQ units achieve speedups of 15.13x~29.65x over a baseline RISC-V processor.
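The abstract does not spell out the sensitivity metric or the search procedure, so the following is a minimal sketch, in PyTorch, of what "sampling" layer-wise sensitivity against a combined accuracy/hardware-cost metric could look like. All names (`quantize_weight`, `hardware_cost_proxy`, `sample_sensitivity`, `search_mpq_scheme`), the candidate bit-widths, and the linear form of the combined metric are illustrative assumptions, not the authors' method.

```python
# Illustrative sketch of layer-wise sensitivity sampling for MPQ search.
# The bit-width candidates, cost proxy, and trade-off weight below are
# assumptions for demonstration, not values from the paper.

import torch

CANDIDATE_BITS = [2, 4, 8]   # assumed per-layer bit-width choices
LAMBDA = 0.1                 # assumed accuracy/cost trade-off weight


def quantize_weight(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric uniform fake-quantization of a weight tensor."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    return (w / scale).round().clamp(-qmax, qmax) * scale


def hardware_cost_proxy(layer: torch.nn.Module, bits: int) -> float:
    """Hypothetical proxy: weight-memory footprint in bits."""
    return layer.weight.numel() * bits


@torch.no_grad()
def sample_sensitivity(model, layer_name, bits, eval_fn):
    """Quantize one layer to `bits`, measure accuracy, restore the
    weights, and return a combined metric (higher is better)."""
    layer = dict(model.named_modules())[layer_name]
    saved = layer.weight.data.clone()
    layer.weight.data = quantize_weight(layer.weight.data, bits)
    acc = eval_fn(model)              # accuracy on a held-out set
    layer.weight.data = saved         # restore full precision
    cost = hardware_cost_proxy(layer, bits)
    return acc - LAMBDA * cost / 1e6  # assumed linear combination


def search_mpq_scheme(model, layer_names, eval_fn):
    """Greedy search: for each layer independently, pick the bit-width
    that maximizes the combined accuracy/cost metric."""
    scheme = {}
    for name in layer_names:
        scores = {b: sample_sensitivity(model, name, b, eval_fn)
                  for b in CANDIDATE_BITS}
        scheme[name] = max(scores, key=scores.get)
    return scheme
```

In practice, layers interact, so independent greedy selection is only a starting point; the paper's actual metric and search algorithm may differ substantially from this sketch.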