Base-Reconfigurable Segmented Logarithmic Quantization and Hardware Design for Deep Neural Networks


Abstract

The growth in the size of deep neural network (DNN) models poses both computational and memory challenges to the efficient and effective implementation of DNNs on platforms with limited hardware resources. Our earlier work on segmented logarithmic (SegLog) quantization, which adopts both base-2 and base-\(\sqrt{2}\) logarithmic encoding, reduces inference cost with only a small accuracy penalty. However, the weight distribution varies across layers and across DNN models, so different base-2 : base-\(\sqrt{2}\) ratios are needed to reach the best accuracy, which in turn requires different hardware designs for the decoding and computing parts. This paper extends SegLog quantization by applying a layer-wise base-2 : base-\(\sqrt{2}\) ratio to weight quantization. The proposed base-reconfigurable segmented logarithmic (BRSLog) quantization achieves 6.4x weight compression with a 1.66% Top-5 accuracy drop on AlexNet at 5-bit resolution. An arithmetic element supporting BRSLog-quantized DNN inference is proposed that adapts to different base-2 : base-\(\sqrt{2}\) ratios. With a \(\sqrt{2}\) approximation, the resource-consuming multipliers can be replaced by shifters and adders at only a 0.54% accuracy penalty. The proposed arithmetic element, simulated in a UMC 55 nm Low Power process, is 50.42% smaller in area and 55.60% lower in power consumption than the widely used 16-bit fixed-point multiplier. Compared with an equivalent SegLog arithmetic element designed for a fixed base-2 : base-\(\sqrt{2}\) ratio, the base-reconfigurable part increases the area by only 22.96 μm² and the power consumption by only 2.6 μW.
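The abstract describes the scheme only at a high level, so the following sketch (Python/NumPy) illustrates the two ideas it relies on: quantizing weights onto a segmented logarithmic grid whose base-2 : base-\(\sqrt{2}\) split is a per-layer parameter, and multiplying by a base-\(\sqrt{2}\) code using only shifts and a single add. This is a minimal illustration under stated assumptions: the code-allocation rule, the parameter `n_base2`, the \(\sqrt{2} \approx 1.5\) approximation constant, and all function names are hypothetical and are not taken from the paper's exact encoding or circuit.

```python
import numpy as np

def brslog_quantize(w, bits=5, n_base2=8):
    """Quantize weights to sign * 2**(e/2) on a segmented logarithmic grid.

    Of the 2**(bits - 1) magnitude codes (one bit is kept for the sign),
    the first codes descend in sqrt(2) steps and the last `n_base2` codes
    descend in base-2 steps; `n_base2` stands in for the paper's layer-wise
    base-2 : base-sqrt(2) ratio.  Zero handling and the exact code layout
    are simplifications.
    """
    w = np.asarray(w, dtype=np.float64)
    n_codes = 2 ** (bits - 1)
    n_sqrt2 = n_codes - n_base2

    # Allowed "half-exponents" e (magnitude = 2**(e/2)), largest first.
    steps = np.concatenate([np.ones(n_sqrt2), 2.0 * np.ones(n_base2)])
    e_top = np.floor(2.0 * np.log2(np.abs(w).max()))
    table = e_top - np.concatenate([[0.0], np.cumsum(steps[:-1])])

    # Snap each weight's half-exponent to the nearest allowed code.
    e = 2.0 * np.log2(np.maximum(np.abs(w), 1e-12))
    idx = np.abs(e[..., None] - table).argmin(axis=-1)
    return np.sign(w) * 2.0 ** (table[idx] / 2.0), table[idx].astype(int)

def shift_add_mult(x, e):
    """Compute x * 2**(e/2) for an integer activation x without a multiplier.

    An odd half-exponent contributes a factor sqrt(2), approximated here as
    1.5 = 1 + 2**-1 (one shift plus one add); the remaining even part is a
    plain barrel shift.  The exact approximation used on chip is assumed.
    """
    if e % 2:                  # base-sqrt(2) code: one extra shift-add
        x = x + (x >> 1)       # x * 1.5 ~= x * sqrt(2)
        e -= 1
    k = e // 2                 # remaining power-of-2 factor
    return x << k if k >= 0 else x >> (-k)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.05, size=1000)              # toy layer weights
    wq, half_exp = brslog_quantize(w, bits=5, n_base2=8)
    print("mean |w - wq|:", np.abs(w - wq).mean())
    print("100 * 2**(-3/2):", shift_add_mult(100, -3), "vs", 100 * 2 ** -1.5)
```

Sweeping `n_base2` per layer corresponds to the layer-wise ratio described in the abstract; the reported 22.96 μm² area and 2.6 μW power overhead is the hardware cost of making that ratio reconfigurable in the arithmetic element rather than fixing it at design time.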



Acknowledgments

This work was supported in part by NSFC grants No. 61876039 and No. 62011530132, the Shanghai Municipal Science and Technology Major Project (No. 2018SHZDZX01) and ZJ Lab, and the Shanghai Platform for Neuromorphic and AI Chip (No. 17DZ2260900).

Author information

Corresponding authors

Correspondence to Li-Rong Zheng or Zhuo Zou.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Jiawei Xu and Yuxiang Huan contributed equally to this work.


About this article


Cite this article

Xu, J., Huan, Y., Jin, Y. et al. Base-Reconfigurable Segmented Logarithmic Quantization and Hardware Design for Deep Neural Networks. J Sign Process Syst 92, 1263–1276 (2020). https://doi.org/10.1007/s11265-020-01557-8

