Abstract
Memory- and power-sensitive edge devices benefit from quantized models running on precision-scalable CNN accelerators. These accelerators can process CNN models with different data precisions, relying on precision-scalable multiply-accumulate (MAC) units. Among the various MAC designs, the spatial precision-scalable MAC (SPM) unit is attractive because it is flexible and converts a decrease in data width into an increase in throughput. However, it becomes energy-inefficient as the operand bit width grows, since it requires more shifters and wider adders. Taking advantage of the limited number of unique products of 2-bit unsigned multiplication in the existing SPM, this paper proposes a new MAC method based on a histogram of unique products, which is orthogonal to existing methods. Building on this MAC method, the paper also proposes BitHist, an efficient DNN accelerator that exploits both bit-level and data-level sparsity. The evaluation results show that BitHist saves 57% of the area compared to BitFusion while providing up to 4.60× the throughput per area and 17.4× the energy efficiency. Additionally, BitHist achieves a 2.28× performance gain from sparsity exploitation.
Supported by the National Science and Technology Major Project from the Ministry of Science and Technology, China (Grant No. 2018AAA0103100), the National Natural Science Foundation of China under Grant 62236007, and the Guangzhou Basic Research Program under Grant 202201011389.
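The key observation behind the proposed MAC method can be sketched in a few lines: a 2-bit × 2-bit unsigned multiplication can only yield seven distinct products (0, 1, 2, 3, 4, 6, and 9), so a dot product over 2-bit operand slices can be computed by counting how often each unique product occurs at each shift amount and resolving each constant product once at the end, rather than multiplying every operand pair. The Python sketch below illustrates this histogram idea under assumed 8-bit operands and 2-bit slices; it is an illustration of the arithmetic, not the authors' accelerator datapath, and all names in it are hypothetical.

```python
# A minimal sketch of the unique-product-histogram idea from the abstract,
# NOT the authors' hardware design. Operand width (8-bit), slice width
# (2-bit), and all identifiers are illustrative assumptions.
from collections import Counter

# A 2-bit x 2-bit unsigned multiply can only produce these nonzero products.
UNIQUE_PRODUCTS = (1, 2, 3, 4, 6, 9)

def slices(x, width=8, slice_bits=2):
    """Split an unsigned integer into 2-bit slices, LSB first."""
    return [(x >> s) & ((1 << slice_bits) - 1) for s in range(0, width, slice_bits)]

def histogram_dot(acts, wgts, width=8, slice_bits=2):
    """Dot product of two unsigned-integer vectors via a product histogram."""
    hist = Counter()  # key: (shift amount, unique product) -> occurrence count
    for a, w in zip(acts, wgts):
        if a == 0 or w == 0:        # data-level sparsity: skip zero operands
            continue
        for i, a_s in enumerate(slices(a, width, slice_bits)):
            if a_s == 0:            # bit-level sparsity: zero slice, no work
                continue
            for j, w_s in enumerate(slices(w, width, slice_bits)):
                if w_s:
                    hist[(slice_bits * (i + j), a_s * w_s)] += 1
    # Each bin is resolved once: a constant product (a few shift-adds in
    # hardware) scaled by its count, instead of one multiply per slice pair.
    return sum((p * n) << shift for (shift, p), n in hist.items())

# Sanity check against an ordinary multiply-accumulate.
acts, wgts = [17, 0, 200, 3], [5, 99, 12, 0]
assert histogram_dot(acts, wgts) == sum(a * w for a, w in zip(acts, wgts))
```

Because the histogram bins are indexed by constants, the final reduction needs no general-purpose multipliers, which is how counting can replace the wide shifters and adders that make conventional SPM units energy-inefficient at higher bit widths.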
References
Agrawal, A., et al.: 9.1 A 7nm 4-core AI chip with 25.6TFLOPS hybrid FP8 training, 102.4TOPS INT4 inference and workload-aware throttling. In: 2021 IEEE International Solid-State Circuits Conference (ISSCC), vol. 64, pp. 144–146 (2021). https://doi.org/10.1109/ISSCC42613.2021.9365791
Chen, Y.H., et al.: Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks. ACM SIGARCH Comput. Archit. News 44(3), 367–379 (2016). https://doi.org/10.1145/3007787.3001177
Courbariaux, M., et al.: BinaryConnect: training deep neural networks with binary weights during propagations. In: Cortes, C., Lawrence, N., Lee, D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 28. Curran Associates, Inc. (2015). https://proceedings.neurips.cc/paper/2015/file/3e15cc11f979ed25912dff5b0669f2cd-Paper.pdf
Delmas, A., Sharify, S., Judd, P., Nikolic, M., Moshovos, A.: DPRed: making typical activation values matter in deep learning computing. arXiv preprint arXiv:1804.06732 (2018). https://doi.org/10.48550/arXiv.1804.06732
Gou, J., et al.: Knowledge distillation: a survey. Int. J. Comput. Vis. 129, 1789–1819 (2021). https://doi.org/10.1007/s11263-021-01453-z
Han, S., et al.: Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding. arXiv preprint arXiv:1510.00149 (2015). https://doi.org/10.48550/arXiv.1510.00149
Hu, D.: An introductory survey on attention mechanisms in NLP problems. In: Bi, Y., Bhatia, R., Kapoor, S. (eds.) IntelliSys 2019. AISC, vol. 1038, pp. 432–448. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-29513-4_31
Idelbayev, Y., et al.: Low-rank compression of neural nets: learning the rank of each layer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020). https://doi.org/10.1109/CVPR42600.2020.00807
Im, D., et al.: DSPU: a 281.6mW real-time depth signal processing unit for deep learning-based dense RGB-D data acquisition with depth fusion and 3D bounding box extraction in mobile platforms. In: 2022 IEEE International Solid-State Circuits Conference (ISSCC), vol. 65, pp. 510–512. IEEE (2022). https://doi.org/10.1109/ISSCC42614.2022.9731699
NVIDIA: NVDLA open source project. https://nvdla.org
Jiao, Y., et al.: 7.2 A 12nm programmable convolution-efficient neural-processing-unit chip achieving 825TOPS. In: 2020 IEEE International Solid-State Circuits Conference (ISSCC), pp. 136–140 (2020). https://doi.org/10.1109/ISSCC19947.2020.9062984
Judd, P., et al.: Stripes: bit-serial deep neural network computing. IEEE Comput. Archit. Lett. 16(1), 80–83 (2017). https://doi.org/10.1109/LCA.2016.2597140
Kang, S., et al.: 7.4 GANPU: a 135TFLOPS/W multi-DNN training processor for GANs with speculative dual-sparsity exploitation. In: 2020 IEEE International Solid-State Circuits Conference (ISSCC), pp. 140–142 (2020). https://doi.org/10.1109/ISSCC19947.2020.9062989
Kapur, S., et al.: Low precision RNNs: quantizing RNNs without losing accuracy. arXiv preprint arXiv:1710.07706 (2017). https://doi.org/10.48550/arXiv.1710.07706
Kim, M., Ham, Y., Koo, C., Kim, T.W.: Simulating travel paths of construction site workers via deep reinforcement learning considering their spatial cognition and wayfinding behavior. Autom. Constr. 147, 104715 (2023)
Li, F., et al.: Ternary weight networks. arXiv preprint arXiv:1605.04711 (2016). https://doi.org/10.48550/arXiv.1605.04711
Lin, D.D., Talathi, S.S., Annapureddy, V.S.: Fixed point quantization of deep convolutional networks. arXiv preprint arXiv:1511.06393 (2016). https://doi.org/10.48550/arXiv.1511.06393
Lu, H., et al.: Distilling bit-level sparsity parallelism for general purpose deep learning acceleration. In: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 963–976 (2021). https://doi.org/10.1145/3466752.3480123
Mei, L., et al.: Sub-word parallel precision-scalable MAC engines for efficient embedded DNN inference. In: 2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS), pp. 6–10 (2019). https://doi.org/10.1109/AICAS.2019.8771481
Nagel, M., et al.: A white paper on neural network quantization. arXiv preprint arXiv:2106.08295 (2021). https://doi.org/10.48550/arXiv.2106.08295
Sharify, S., Lascorz, A.D., Siu, K., Judd, P., Moshovos, A.: Loom: exploiting weight and activation precisions to accelerate convolutional neural networks. In: 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC), pp. 1–6 (2018). https://doi.org/10.1109/DAC.2018.8465915
Sharma, H., et al.: Bit fusion: bit-level dynamically composable architecture for accelerating deep neural networks. In: 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA), pp. 764–775 (2018). https://doi.org/10.1109/ISCA.2018.00069
Tropea, M., et al.: Classifiers comparison for convolutional neural networks (CNNs) in image classification. In: 2019 IEEE/ACM 23rd International Symposium on Distributed Simulation and Real Time Applications (DS-RT), pp. 1–4 (2019). https://doi.org/10.1109/DS-RT47707.2019.8958662
Yuan, Z., et al.: A sparse-adaptive CNN processor with area/performance balanced n-way set-associate PE arrays assisted by a collision-aware scheduler. In: 2019 IEEE Asian Solid-State Circuits Conference (A-SSCC), pp. 61–64 (2019). https://doi.org/10.1109/A-SSCC47793.2019.9056918
Zhu, C., et al.: Trained ternary quantization. arXiv preprint arXiv:1612.01064 (2016). https://doi.org/10.48550/arXiv.1612.01064