short-paper

Design Space Exploration of Layer-Wise Mixed-Precision Quantization with Tightly Integrated Edge Inference Units

Published: 05 June 2023

Abstract

Layer-wise mixed-precision quantization (MPQ) has become prevalent for edge inference because it strikes a better balance between accuracy and efficiency than uniform quantization. Existing MPQ strategies either lack hardware awareness or incur high computation costs, which limits their deployment at the edge. In this work, we propose a novel MPQ search algorithm that obtains an optimal scheme by "sampling" layer-wise sensitivity with respect to a newly proposed metric that incorporates both accuracy and a proxy for hardware cost. To deploy post-training MPQ efficiently on edge chips, we further propose tightly integrating the quantized inference units into the processor pipeline through micro-architecture and Instruction Set Architecture (ISA) co-design. Evaluation results show that the proposed search algorithm achieves 3% to 11% higher inference accuracy at similar hardware cost compared with state-of-the-art MPQ strategies. In addition, the tightly integrated MPQ units achieve a speedup of 15.13x to 29.65x over a baseline RISC-V processor.
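
To make the idea of a sensitivity-driven, hardware-aware bit-width search concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes symmetric uniform quantization, uses the mean squared quantization error of a layer's weights as a stand-in for accuracy sensitivity, and uses total weight storage in bits as a hypothetical hardware-cost proxy. The function names, the candidate bit-widths, and the trade-off weight `lam` are all illustrative assumptions.

```python
import random

def quantize(weights, bits):
    """Symmetric uniform quantization of a list of floats (assumes bits >= 2)."""
    levels = 2 ** (bits - 1) - 1                    # 1, 7, 127 for 2, 4, 8 bits
    max_abs = max(abs(w) for w in weights) or 1.0   # avoid a zero scale for all-zero layers
    scale = max_abs / levels
    return [round(w / scale) * scale for w in weights]

def sensitivity(weights, bits):
    """Mean squared quantization error: an assumed stand-in for accuracy loss."""
    q = quantize(weights, bits)
    return sum((w - x) ** 2 for w, x in zip(weights, q)) / len(weights)

def hardware_cost(weights, bits):
    """Hypothetical cost proxy: weight storage of the layer in bits."""
    return bits * len(weights)

def choose_bit_widths(layers, candidates=(2, 4, 8), lam=1e-6):
    """Pick, per layer, the candidate bit-width minimizing error + lam * cost."""
    scheme = {}
    for name, weights in layers.items():
        scheme[name] = min(
            candidates,
            key=lambda b: sensitivity(weights, b) + lam * hardware_cost(weights, b),
        )
    return scheme

if __name__ == "__main__":
    random.seed(0)
    # Two toy "layers" with different weight distributions (illustrative data only).
    layers = {
        "conv1": [random.gauss(0.0, 1.0) for _ in range(64)],
        "fc": [random.gauss(0.0, 0.1) for _ in range(128)],
    }
    print(choose_bit_widths(layers))
```

In this sketch each layer is scored independently, so the search is a simple per-layer minimization. Setting `lam` to zero ignores cost and favors the widest candidate, while a very large `lam` pushes every layer to the narrowest one.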

        Published In

        GLSVLSI '23: Proceedings of the Great Lakes Symposium on VLSI 2023
        June 2023
        731 pages
        ISBN:9798400701252
        DOI:10.1145/3583781
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. edge ai
        2. mixed-precision quantization
        3. neural networks
        4. post-training quantization
        5. scalable architectures

        Qualifiers

        • Short-paper

        Funding Sources

        • UM-SJTU Startup Fund
        • SRC AI Hardware program
        • CRISP, one of six centers in JUMP, a Semiconductor Research Corporation (SRC) program sponsored by DARPA
        • SJTU Explore-X Research Grant
        • CCF-Tencent Open Fund

        Conference

        GLSVLSI '23
        GLSVLSI '23: Great Lakes Symposium on VLSI 2023
        June 5 - 7, 2023
        Knoxville, TN, USA

        Acceptance Rates

        Overall Acceptance Rate 263 of 977 submissions, 27%

        Article Metrics

        • Downloads (last 12 months): 76
        • Downloads (last 6 weeks): 10
        Reflects downloads up to 15 Feb 2025.

        Cited By

        • (2024) CIM²PQ: An Arraywise and Hardware-Friendly Mixed Precision Quantization Method for Analog Computing-In-Memory. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 43:7 (2084-2097). DOI: 10.1109/TCAD.2024.3358609. Online publication date: Jul-2024
        • (2024) Edge-MPQ: Layer-Wise Mixed-Precision Quantization With Tightly Integrated Versatile Inference Units for Edge Computing. IEEE Transactions on Computers 73:11 (2504-2519). DOI: 10.1109/TC.2024.3441860. Online publication date: Nov-2024
        • (2024) CINEMA: A Configurable Binary Segmentation Based Arithmetic Module for Mixed-Precision In-Memory Acceleration. 2024 IEEE International Symposium on Circuits and Systems (ISCAS) (1-5). DOI: 10.1109/ISCAS58744.2024.10557983. Online publication date: 19-May-2024
        • (2024) ILD-MPQ: Learning-Free Mixed-Precision Quantization with Inter-Layer Dependency Awareness. 2024 IEEE 6th International Conference on AI Circuits and Systems (AICAS) (512-516). DOI: 10.1109/AICAS59952.2024.10595945. Online publication date: 22-Apr-2024
