
MCPS: a mapping method for MAERI accelerator base on Cartesian Product based Convolution for DNN layers with sparse input feature map

Published in: Cluster Computing

Abstract

Today, the high accuracy of deep learning has led to its use in domains such as image and voice classification. However, the vast computation required by deep neural networks (DNNs) makes traditional processors inefficient, which has driven the emergence of hardware accelerators. DNN accelerators increase performance by exploiting opportunities such as data reuse and sparsity. In these accelerators, the dataflow is an essential factor, so some of them provide a reconfigurable architecture that supports different mappings and dataflows. However, accelerators designed explicitly to exploit data sparsity are usually non-reconfigurable and have a fixed dataflow. This paper presents a new dataflow called Channel Dimension Stationary (CDS) for MAERI (a reconfigurable neural network accelerator). It can be used for convolutional layers with sparse input feature maps (ifmaps). In the proposed dataflow, computations are based on the Cartesian-product method, but multiplications that would lead to useless results are avoided. To analyze mappings based on the CDS dataflow, we extended the mRNA tool (mapper for Reconfigurable Neural Accelerators), which includes an energy and performance analyzer for mapping strategies on MAERI. Our evaluation shows that, for ifmaps with 50%, 70%, and 90% sparsity, the proposed mapping increases energy efficiency by 3x, 6x, and 13x on average, respectively, without a noticeable reduction in utilization.
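The abstract describes Cartesian-product convolution in which each nonzero input activation is multiplied by every filter weight and the products are scattered to their output coordinates, so multiplications against zero activations never happen. The paper gives no code; the sketch below is our own minimal single-channel illustration of that general idea (the function name, the dense NumPy layout, and the valid-convolution boundary handling are our assumptions, not the paper's CDS mapping).

```python
import numpy as np

def cartesian_product_conv(ifmap, weights):
    """Single-channel valid convolution driven by nonzero activations only.

    ifmap:   (H, W) input feature map, possibly sparse
    weights: (R, S) filter
    returns: (H-R+1, W-S+1) output feature map
    """
    H, W = ifmap.shape
    R, S = weights.shape
    out = np.zeros((H - R + 1, W - S + 1))
    # Iterate only over nonzero activations: zero inputs contribute
    # nothing, so their multiplications are skipped entirely.
    ys, xs = np.nonzero(ifmap)
    for y, x in zip(ys, xs):
        a = ifmap[y, x]
        # Cartesian product: pair this activation with every weight,
        # and scatter each product to the output it contributes to.
        for r in range(R):
            for s in range(S):
                oy, ox = y - r, x - s
                if 0 <= oy < H - R + 1 and 0 <= ox < W - S + 1:
                    out[oy, ox] += a * weights[r, s]
    return out
```

With 90% of the ifmap zero, the loop body runs for roughly 10% of the activations, which is the intuition behind the energy-efficiency gains the abstract reports; the paper's contribution is realizing this on MAERI's reconfigurable interconnect rather than in software.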


Data availability

The original data and analyzer used and analyzed during the current study are available in the GitHub repository, https://github.com/georgia-tech-synergy-lab/mRNA. In addition, part of the data generated during the present study is included in this manuscript, and the generated code is available from the corresponding author on request.

Notes

  1. Input data (e.g., images) is transformed into a feature map by passing through CNN layers.

  2. ScratchPad Memory (Prefetch Buffer).


Funding

This work has received no funding.

Author information


Contributions

Midia Reshadi introduced the subject and tools. Babak NarimanJahan then proposed the idea of the work and, with the approval of Midia Reshadi and Ahmad KhademZadeh, began analyzing, experimenting, and modifying mRNA based on the proposed idea. With the guidance of Midia Reshadi, the advice of Akram Reza, and the supervision of Ahmad KhademZadeh, Babak NarimanJahan analyzed the experimental results and wrote the manuscript. All authors read and approved the final version.

Corresponding author

Correspondence to Ahmad Khademzadeh.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Narimanjahan, B., Reshadi, M., Khademzadeh, A. et al. MCPS: a mapping method for MAERI accelerator base on Cartesian Product based Convolution for DNN layers with sparse input feature map. Cluster Comput 25, 3213–3230 (2022). https://doi.org/10.1007/s10586-021-03527-6

