Abstract
Memory- and power-sensitive edge devices benefit from quantized models running on precision-scalable CNN accelerators. These accelerators can process CNN models with different data precisions, relying on precision-scalable multiply-accumulate (MAC) units. Among the various MAC designs, the spatial precision-scalable MAC (SPM) unit is attractive because it is flexible and converts a decrease in data width into an increase in throughput. However, it becomes energy-inefficient as the operand bit width grows, since it requires more shifters and wider adders. Taking advantage of the limited number of unique products of 2-bit unsigned multiplication in the existing SPM, this paper proposes a new MAC method based on a histogram of unique products, which is orthogonal to existing methods. Building on this MAC method, the paper also proposes BitHist, an efficient DNN accelerator that exploits both bit-level and data-level sparsity. The evaluation results show that BitHist saves 57% of the area compared to BitFusion while providing up to 4.60× the throughput per area and 17.4× the energy efficiency. Additionally, BitHist achieves a 2.28× performance gain from sparsity exploitation.
Supported by the National Science and Technology Major Project from the Ministry of Science and Technology, China (Grant No. 2018AAA0103100), the National Natural Science Foundation of China under Grant 62236007, and the Guangzhou Basic Research Program under Grant 202201011389.
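The key observation behind the proposed MAC method can be sketched in a few lines: a 2-bit × 2-bit unsigned multiplication can only yield seven distinct products (0, 1, 2, 3, 4, 6, and 9), so a dot product over 2-bit operand slices can be computed by counting how often each unique product occurs at each shift amount and resolving each constant product once at the end, rather than multiplying every operand pair. The Python sketch below illustrates this histogram idea under assumed 8-bit operands and 2-bit slices; it is an illustration of the arithmetic, not the authors' accelerator datapath, and all names in it are hypothetical.

```python
# A minimal sketch of the unique-product-histogram idea from the abstract,
# NOT the authors' hardware design. Operand width (8-bit), slice width
# (2-bit), and all identifiers are illustrative assumptions.
from collections import Counter

# A 2-bit x 2-bit unsigned multiply can only produce these nonzero products.
UNIQUE_PRODUCTS = (1, 2, 3, 4, 6, 9)

def slices(x, width=8, slice_bits=2):
    """Split an unsigned integer into 2-bit slices, LSB first."""
    return [(x >> s) & ((1 << slice_bits) - 1) for s in range(0, width, slice_bits)]

def histogram_dot(acts, wgts, width=8, slice_bits=2):
    """Dot product of two unsigned-integer vectors via a product histogram."""
    hist = Counter()  # key: (shift amount, unique product) -> occurrence count
    for a, w in zip(acts, wgts):
        if a == 0 or w == 0:        # data-level sparsity: skip zero operands
            continue
        for i, a_s in enumerate(slices(a, width, slice_bits)):
            if a_s == 0:            # bit-level sparsity: zero slice, no work
                continue
            for j, w_s in enumerate(slices(w, width, slice_bits)):
                if w_s:
                    hist[(slice_bits * (i + j), a_s * w_s)] += 1
    # Each bin is resolved once: a constant product (a few shift-adds in
    # hardware) scaled by its count, instead of one multiply per slice pair.
    return sum((p * n) << shift for (shift, p), n in hist.items())

# Sanity check against an ordinary multiply-accumulate.
acts, wgts = [17, 0, 200, 3], [5, 99, 12, 0]
assert histogram_dot(acts, wgts) == sum(a * w for a, w in zip(acts, wgts))
```

Because the histogram bins are indexed by constants, the final reduction needs no general-purpose multipliers, which is how counting can replace the wide shifters and adders that make conventional SPM units energy-inefficient at higher bit widths.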
References
Agrawal, A., et al.: 9.1 A 7nm 4-core AI chip with 25.6TFLOPS hybrid FP8 training, 102.4TOPS INT4 inference and workload-aware throttling. In: 2021 IEEE International Solid-State Circuits Conference (ISSCC), vol. 64, pp. 144–146 (2021). https://doi.org/10.1109/ISSCC42613.2021.9365791
Chen, Y.H., et al.: Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks. ACM SIGARCH Comput. Archit. News 44(3), 367–379 (2016). https://doi.org/10.1145/3007787.3001177
Courbariaux, M., et al.: BinaryConnect: training deep neural networks with binary weights during propagations. In: Cortes, C., Lawrence, N., Lee, D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 28. Curran Associates, Inc. (2015). https://proceedings.neurips.cc/paper/2015/file/3e15cc11f979ed25912dff5b0669f2cd-Paper.pdf
Delmas, A., Sharify, S., Judd, P., Nikolic, M., Moshovos, A.: DPRed: making typical activation values matter in deep learning computing. arXiv preprint arXiv:1804.06732 (2018). https://doi.org/10.48550/arXiv.1804.06732
Gou, J., et al.: Knowledge distillation: a survey. Int. J. Comput. Vis. 129, 1789–1819 (2021). https://doi.org/10.1007/s11263-021-01453-z
Han, S., et al.: Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding. arXiv preprint arXiv:1510.00149 (2015). https://doi.org/10.48550/arXiv.1510.00149
Hu, D.: An introductory survey on attention mechanisms in NLP problems. In: Bi, Y., Bhatia, R., Kapoor, S. (eds.) IntelliSys 2019. AISC, vol. 1038, pp. 432–448. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-29513-4_31
Idelbayev, Y., et al.: Low-rank compression of neural nets: learning the rank of each layer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020). https://doi.org/10.1109/CVPR42600.2020.00807
Im, D., et al.: DSPU: a 281.6mW real-time depth signal processing unit for deep learning-based dense RGB-D data acquisition with depth fusion and 3D bounding box extraction in mobile platforms. In: 2022 IEEE International Solid-State Circuits Conference (ISSCC), vol. 65, pp. 510–512. IEEE (2022). https://doi.org/10.1109/ISSCC42614.2022.9731699
NVIDIA: NVDLA open source project. https://nvdla.org
Jiao, Y., et al.: 7.2 A 12nm programmable convolution-efficient neural-processing-unit chip achieving 825TOPS. In: 2020 IEEE International Solid-State Circuits Conference (ISSCC), pp. 136–140 (2020). https://doi.org/10.1109/ISSCC19947.2020.9062984
Judd, P., et al.: Stripes: bit-serial deep neural network computing. IEEE Comput. Archit. Lett. 16(1), 80–83 (2017). https://doi.org/10.1109/LCA.2016.2597140
Kang, S., et al.: 7.4 GANPU: a 135TFLOPS/W multi-DNN training processor for GANs with speculative dual-sparsity exploitation. In: 2020 IEEE International Solid-State Circuits Conference (ISSCC), pp. 140–142 (2020). https://doi.org/10.1109/ISSCC19947.2020.9062989
Kapur, S., et al.: Low precision RNNs: quantizing RNNs without losing accuracy. arXiv preprint arXiv:1710.07706 (2017). https://doi.org/10.48550/arXiv.1710.07706
Kim, M., Ham, Y., Koo, C., Kim, T.W.: Simulating travel paths of construction site workers via deep reinforcement learning considering their spatial cognition and wayfinding behavior. Autom. Constr. 147, 104715 (2023)
Li, F., et al.: Ternary weight networks. arXiv preprint arXiv:1605.04711 (2016). https://doi.org/10.48550/arXiv.1605.04711
Lin, D.D., Talathi, S.S., Annapureddy, V.S.: Fixed point quantization of deep convolutional networks. arXiv preprint arXiv:1511.06393 (2016). https://doi.org/10.48550/arXiv.1511.06393
Lu, H., et al.: Distilling bit-level sparsity parallelism for general purpose deep learning acceleration. In: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 963–976 (2021). https://doi.org/10.1145/3466752.3480123
Mei, L., et al.: Sub-word parallel precision-scalable MAC engines for efficient embedded DNN inference. In: 2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS), pp. 6–10 (2019). https://doi.org/10.1109/AICAS.2019.8771481
Nagel, M., et al.: A white paper on neural network quantization. arXiv preprint arXiv:2106.08295 (2021). https://doi.org/10.48550/arXiv.2106.08295
Sharify, S., Lascorz, A.D., Siu, K., Judd, P., Moshovos, A.: Loom: exploiting weight and activation precisions to accelerate convolutional neural networks. In: 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC), pp. 1–6 (2018). https://doi.org/10.1109/DAC.2018.8465915
Sharma, H., et al.: Bit fusion: bit-level dynamically composable architecture for accelerating deep neural networks. In: 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA), pp. 764–775 (2018). https://doi.org/10.1109/ISCA.2018.00069
Tropea, M., et al.: Classifiers comparison for convolutional neural networks (CNNs) in image classification. In: 2019 IEEE/ACM 23rd International Symposium on Distributed Simulation and Real Time Applications (DS-RT), pp. 1–4 (2019). https://doi.org/10.1109/DS-RT47707.2019.8958662
Yuan, Z., et al.: A sparse-adaptive CNN processor with area/performance balanced n-way set-associate PE arrays assisted by a collision-aware scheduler. In: 2019 IEEE Asian Solid-State Circuits Conference (A-SSCC), pp. 61–64 (2019). https://doi.org/10.1109/A-SSCC47793.2019.9056918
Zhu, C., et al.: Trained ternary quantization. arXiv preprint arXiv:1612.01064 (2016). https://doi.org/10.48550/arXiv.1612.01064