DOI: 10.1145/3583781.3590292
short-paper

Design Space Exploration of Layer-Wise Mixed-Precision Quantization with Tightly Integrated Edge Inference Units

Published: 05 June 2023

ABSTRACT

Layer-wise mixed-precision quantization (MPQ) has become prevalent for edge inference because it strikes a better balance between accuracy and efficiency than uniform quantization. Existing MPQ strategies either lack hardware awareness or incur huge computation costs, which limits their deployment at the edge. In this work, we propose a novel MPQ search algorithm that obtains an optimal scheme by "sampling" layer-wise sensitivity with respect to a newly proposed metric that incorporates both accuracy and a proxy of hardware cost. To deploy post-training MPQ efficiently on edge chips, we further propose to tightly integrate the quantized inference units into the processor pipeline through micro-architecture and Instruction Set Architecture (ISA) co-design. Evaluation results show that the proposed search algorithm achieves 3% to 11% higher inference accuracy at similar hardware cost compared with state-of-the-art MPQ strategies. In addition, the tightly integrated MPQ units achieve a speedup of 15.13x to 29.65x over a baseline RISC-V processor.
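
To make the idea of layer-wise mixed precision concrete, the following is a minimal sketch of choosing a bit-width per layer by scoring each candidate against a combined error-plus-cost metric. It is not the search algorithm proposed in the paper: the score in `layer_score`, the weight `lam`, the candidate bit-widths, the use of storage bits as a hardware cost proxy, and the toy layers are all assumptions made for illustration.

```python
import random


def quantize(weights, bits):
    """Symmetric uniform quantization of a list of floats to `bits` bits (bits >= 2)."""
    levels = 2 ** (bits - 1) - 1              # e.g. 127 for 8 bits
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / levels
    return [round(w / scale) * scale for w in weights]


def mse(a, b):
    """Mean squared error between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def layer_score(weights, bits, lam):
    """Hypothetical metric: quantization error plus a weighted cost proxy."""
    error = mse(weights, quantize(weights, bits))
    cost = bits * len(weights)                # storage bits, an assumed stand-in for hardware cost
    return error + lam * cost


def choose_bitwidths(layers, candidates=(2, 4, 8), lam=1e-6):
    """Pick, per layer, the candidate bit-width with the lowest combined score."""
    return {
        name: min(candidates, key=lambda b: layer_score(w, b, lam))
        for name, w in layers.items()
    }


# Toy "layers": random weights standing in for a real network (assumption).
random.seed(0)
layers = {
    "conv1": [random.gauss(0.0, 1.0) for _ in range(64)],
    "fc": [random.gauss(0.0, 0.05) for _ in range(4096)],
}
scheme = choose_bitwidths(layers)
```

Raising `lam` pushes every layer toward the narrowest candidate, while `lam=0` ignores cost and favours the widest one. A real search would measure task accuracy rather than weight error and would use a cost model of the target hardware.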

Published in

GLSVLSI '23: Proceedings of the Great Lakes Symposium on VLSI 2023
June 2023
731 pages
ISBN: 9798400701252
DOI: 10.1145/3583781

Copyright © 2023 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States
