Abstract
The size and complexity of neural network models have grown rapidly in recent years, so inference requires increasingly more computational and memory resources. Quantization is a promising way to reduce these requirements. Logarithmic quantization can reduce both the model size and the computational complexity, because the time-consuming multiplications can be replaced with additions in the logarithmic domain. However, previous logarithmic quantization methods use a fixed logarithm base, so they cannot adapt to the data distribution or the bit-width budget, which degrades accuracy. To address this problem, we propose an adaptive quantization method that optimizes the quantization function. Our method first finds an optimized weight quantization function by minimizing the quantization loss of the model's weights under a given bit-width budget, and then finds an optimized quantization function for the activations in a zero-shot manner. Because finding the optimized parameters is time-consuming, we also propose a heuristic algorithm that solves the optimization problem quickly. Compared with previous logarithmic quantization methods, our method achieves up to 72.53% higher Top-1 accuracy under the same bit-width constraint.
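To make the idea of base-adaptive logarithmic quantization concrete, the sketch below shows a minimal Python/NumPy illustration: weights are rounded to signed powers of a base, and the base is chosen by minimizing the L2 quantization error over a candidate grid. The function names (log_quantize, search_base), the exponent range, and the grid search (standing in for the paper's heuristic optimizer) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def log_quantize(x, base, bits):
    """Quantize |x| to signed powers of `base` (illustrative sketch).

    Exponents are rounded in the log domain and clipped so that the
    2**bits - 1 usable codes cover magnitudes up to 1 (one code is
    reserved for zero). The exponent range is an assumption.
    """
    sign = np.sign(x)
    mag = np.abs(x)
    nonzero = mag > 0
    exp = np.zeros_like(mag)
    # round the log-domain value to the nearest integer exponent
    exp[nonzero] = np.round(np.log(mag[nonzero]) / np.log(base))
    max_exp = 0                          # base**0 = 1, assumed magnitude ceiling
    min_exp = max_exp - (2**bits - 2)    # smallest representable exponent
    exp = np.clip(exp, min_exp, max_exp)
    q = sign * np.power(base, exp)
    q[~nonzero] = 0.0
    return q

def search_base(weights, bits, candidates=np.linspace(1.1, 2.0, 91)):
    """Pick the base minimizing the L2 quantization error.

    A plain grid search is used here as a stand-in for the paper's
    faster heuristic optimization.
    """
    errors = [np.sum((weights - log_quantize(weights, b, bits)) ** 2)
              for b in candidates]
    return candidates[int(np.argmin(errors))]

# Toy usage: quantize a Gaussian weight tensor to 4 bits.
w = np.random.randn(1000) * 0.1
best_base = search_base(w, bits=4)
w_q = log_quantize(w, best_base, bits=4)
```

In hardware, the benefit comes from the log-domain representation itself: a multiplication by a quantized weight base**e reduces to adding e in the exponent (a shift when the base is 2), which is why choosing the base per layer and per bit-width matters for accuracy.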
Acknowledgements
This work is supported in part by NSFC (Grant No. 61872215), and Shenzhen Science and Technology Program (Grant No. RCYX20200714114523079).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, Y., He, Z., Tang, C., Wang, Z., Zhu, W. (2021). An Adaptive Logarithm Quantization Method for DNN Compression. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds) Neural Information Processing. ICONIP 2021. Communications in Computer and Information Science, vol 1516. Springer, Cham. https://doi.org/10.1007/978-3-030-92307-5_41
DOI: https://doi.org/10.1007/978-3-030-92307-5_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-92306-8
Online ISBN: 978-3-030-92307-5