BitHist: A Precision-Scalable Sparse-Awareness DNN Accelerator Based on Bit Slices Products Histogram

  • Conference paper
Euro-Par 2023: Parallel Processing (Euro-Par 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14100)

Abstract

Memory- and power-sensitive edge devices benefit from quantized models running on precision-scalable CNN accelerators. These accelerators can process CNN models with different data precisions, relying on precision-scalable multiply-accumulate (MAC) units. Among the various MAC designs, the spatial precision-scalable MAC (SPM) unit is attractive because it is flexible and converts a decrease in data width into an increase in throughput. However, it becomes energy-inefficient as operand bit width grows, since it requires more shifters and wider adders. Taking advantage of the limited number of unique products of 2-bit unsigned multiplication in the existing SPM, this paper proposes a new MAC method based on a histogram of unique products, which is orthogonal to existing methods. Building on this MAC method, the paper also proposes BitHist, an efficient DNN accelerator that exploits both bit-level and data-level sparsity. Evaluation results show that BitHist saves 57% of the area compared with BitFusion and provides up to 4.60× throughput per area and 17.4× energy efficiency. Additionally, BitHist achieves a 2.28× performance gain from sparsity exploitation.

Supported by the National Science and Technology Major Project from the Ministry of Science and Technology, China (Grant No. 2018AAA0103100), the National Natural Science Foundation of China under Grant 62236007, and the Guangzhou Basic Research Program under Grant 202201011389.
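
To make the idea in the abstract concrete, below is a minimal Python sketch of a histogram-based MAC over 2-bit unsigned operands; it is an illustrative assumption, not the paper's hardware design, and the function name histogram_mac and the sample data are hypothetical. Because 2-bit unsigned values range over 0–3, every pairwise product falls into the small set {0, 1, 2, 3, 4, 6, 9}; a dot product can therefore be computed by counting how often each unique product occurs and accumulating count × value once per bin, with the zero bin (bit-level sparsity) skipped entirely.

# Minimal sketch (assumption, not the authors' implementation): a dot product of
# 2-bit unsigned operands computed via a histogram of the unique products.
from collections import Counter

UNIQUE_PRODUCTS = (0, 1, 2, 3, 4, 6, 9)  # all possible products of two 2-bit values

def histogram_mac(a_slices, w_slices):
    """Dot product of two equal-length sequences of 2-bit values (0..3),
    accumulated per unique-product bin instead of per operand pair."""
    assert len(a_slices) == len(w_slices)
    hist = Counter(a * w for a, w in zip(a_slices, w_slices))
    # The zero bin contributes nothing, so it is skipped (sparsity exploitation).
    return sum(p * hist[p] for p in UNIQUE_PRODUCTS if p != 0)

# The histogram-based result matches the naive multiply-accumulate.
acts = [3, 0, 2, 1, 3, 2]
weights = [1, 2, 3, 0, 3, 2]
assert histogram_mac(acts, weights) == sum(a * w for a, w in zip(acts, weights))

The intuition, under these assumptions, is that binning replaces per-pair shift-and-add work: occurrence counting needs only small counters, and the final accumulation reduces to a few constant-weighted sums.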

References

  1. Agrawal, A., et al.: 9.1 A 7nm 4-core AI chip with 25.6TFLOPS hybrid FP8 training, 102.4TOPS INT4 inference and workload-aware throttling. In: 2021 IEEE International Solid-State Circuits Conference (ISSCC), vol. 64, pp. 144–146 (2021). https://doi.org/10.1109/ISSCC42613.2021.9365791

  2. Chen, Y.H., et al.: Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks. ACM SIGARCH Comput. Archit. News 44(3), 367–379 (2016). https://doi.org/10.1145/3007787.3001177

  3. Courbariaux, M., et al.: BinaryConnect: training deep neural networks with binary weights during propagations. In: Cortes, C., Lawrence, N., Lee, D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 28. Curran Associates, Inc. (2015). https://proceedings.neurips.cc/paper/2015/file/3e15cc11f979ed25912dff5b0669f2cd-Paper.pdf

  4. Delmas, A., Sharify, S., Judd, P., Nikolic, M., Moshovos, A.: DPRed: making typical activation values matter in deep learning computing. CoRR abs/1804.06732 (2018). https://doi.org/10.48550/arXiv.1804.06732

  5. Gou, J., et al.: Knowledge distillation: a survey. Int. J. Comput. Vis. 129, 1789–1819 (2021). https://doi.org/10.1007/s11263-021-01453-z

  6. Han, S., et al.: Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding. arXiv preprint arXiv:1510.00149 (2015). https://doi.org/10.48550/arXiv.1510.00149

  7. Hu, D.: An introductory survey on attention mechanisms in NLP problems. In: Bi, Y., Bhatia, R., Kapoor, S. (eds.) IntelliSys 2019. AISC, vol. 1038, pp. 432–448. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-29513-4_31

  8. Idelbayev, Y., et al.: Low-rank compression of neural nets: learning the rank of each layer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020). https://doi.org/10.1109/CVPR42600.2020.00807

  9. Im, D., et al.: DSPU: a 281.6 mw real-time depth signal processing unit for deep learning-based dense RGB-D data acquisition with depth fusion and 3d bounding box extraction in mobile platforms. In: 2022 IEEE International Solid-State Circuits Conference (ISSCC), vol. 65, pp. 510–512. IEEE (2022). https://doi.org/10.1109/ISSCC42614.2022.9731699

  10. NVDLA open source project. https://nvdla.org

  11. Jiao, Y., et al.: 7.2 A 12nm programmable convolution-efficient neural-processing-unit chip achieving 825TOPS. In: 2020 IEEE International Solid-State Circuits Conference (ISSCC), pp. 136–140 (2020). https://doi.org/10.1109/ISSCC19947.2020.9062984

  12. Judd, P., et al.: Stripes: bit-serial deep neural network computing. IEEE Comput. Archit. Lett. 16(1), 80–83 (2017). https://doi.org/10.1109/LCA.2016.2597140

  13. Kang, S., et al.: 7.4 GANPU: a 135TFLOPS/W multi-DNN training processor for GANs with speculative dual-sparsity exploitation. In: 2020 IEEE International Solid-State Circuits Conference (ISSCC), pp. 140–142 (2020). https://doi.org/10.1109/ISSCC19947.2020.9062989

  14. Kapur, S., et al.: Low precision RNNs: quantizing RNNs without losing accuracy. arXiv preprint arXiv:1710.07706 (2017). https://doi.org/10.48550/arXiv.1710.07706

  15. Kim, M., Ham, Y., Koo, C., Kim, T.W.: Simulating travel paths of construction site workers via deep reinforcement learning considering their spatial cognition and wayfinding behavior. Autom. Constr. 147, 104715 (2023)

  16. Li, F., et al.: Ternary weight networks. arXiv preprint arXiv:1605.04711 (2016). https://doi.org/10.48550/arXiv.1605.04711

  17. Lin, D.D., Talathi, S.S., Annapureddy, V.S.: Fixed point quantization of deep convolutional networks. In: Proceedings of the 33rd International Conference on Machine Learning (ICML) (2016). https://doi.org/10.48550/arXiv.1511.06393

  18. Lu, H., et al.: Distilling bit-level sparsity parallelism for general purpose deep learning acceleration. In: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 963–976 (2021). https://doi.org/10.1145/3466752.3480123

  19. Mei, L., et al.: Sub-word parallel precision-scalable MAC engines for efficient embedded DNN inference. In: 2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS), pp. 6–10 (2019). https://doi.org/10.1109/AICAS.2019.8771481

  20. Nagel, M., et al.: A white paper on neural network quantization. arXiv preprint arXiv:2106.08295 (2021). https://doi.org/10.48550/arXiv.2106.08295

  21. Sharify, S., Lascorz, A.D., Siu, K., Judd, P., Moshovos, A.: Loom: exploiting weight and activation precisions to accelerate convolutional neural networks. In: 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC), pp. 1–6 (2018). https://doi.org/10.1109/DAC.2018.8465915

  22. Sharma, H., et al.: Bit fusion: bit-level dynamically composable architecture for accelerating deep neural network. In: 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA), pp. 764–775 (2018). https://doi.org/10.1109/ISCA.2018.00069

  23. Tropea, M., et al.: Classifiers comparison for convolutional neural networks (CNNs) in image classification. In: 2019 IEEE/ACM 23rd International Symposium on Distributed Simulation and Real Time Applications (DS-RT), pp. 1–4 (2019). https://doi.org/10.1109/DS-RT47707.2019.8958662

  24. Yuan, Z., et al.: A sparse-adaptive CNN processor with area/performance balanced n-way set-associate PE arrays assisted by a collision-aware scheduler. In: 2019 IEEE Asian Solid-State Circuits Conference (A-SSCC), pp. 61–64 (2019). https://doi.org/10.1109/A-SSCC47793.2019.9056918

  25. Zhu, C., et al.: Trained ternary quantization. arXiv preprint arXiv:1612.01064 (2016). https://doi.org/10.48550/arXiv.1612.01064

Author information

Corresponding author

Correspondence to Jie Hao.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Meng, Z., Xiao, L., Gao, X., Li, Z., Shu, L., Hao, J. (2023). BitHist: A Precision-Scalable Sparse-Awareness DNN Accelerator Based on Bit Slices Products Histogram. In: Cano, J., Dikaiakos, M.D., Papadopoulos, G.A., Pericàs, M., Sakellariou, R. (eds) Euro-Par 2023: Parallel Processing. Euro-Par 2023. Lecture Notes in Computer Science, vol 14100. Springer, Cham. https://doi.org/10.1007/978-3-031-39698-4_20

  • DOI: https://doi.org/10.1007/978-3-031-39698-4_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-39697-7

  • Online ISBN: 978-3-031-39698-4

  • eBook Packages: Computer Science, Computer Science (R0)
