
MCPS: a mapping method for MAERI accelerator base on Cartesian Product based Convolution for DNN layers with sparse input feature map

Published in: Cluster Computing

Abstract

Today, the high accuracy of deep learning has led to its use in domains such as image and voice classification. However, the vast computation required by deep neural networks (DNNs) makes traditional processors inefficient, which has driven the emergence of hardware accelerators. DNN accelerators increase performance by exploiting opportunities such as data reuse and sparsity. In these accelerators, the dataflow is an essential factor, so some of them provide a reconfigurable architecture that supports different mappings and dataflows. However, accelerators designed explicitly to exploit data sparsity are usually non-reconfigurable and have a fixed dataflow. This paper presents a new dataflow called Channel Dimension Stationary (CDS) for MAERI (a reconfigurable neural network accelerator). It can be used for convolutional layers with sparse input feature maps (ifmaps). In the proposed dataflow, computations are based on the Cartesian-product method, but multiplications that would lead to useless results are avoided. To analyze mappings based on the CDS dataflow, we extended the mRNA tool (mapper for Reconfigurable Neural Accelerators), which includes an energy and performance analyzer for mapping strategies on MAERI. Our evaluation shows that, for ifmaps with 50%, 70%, and 90% sparsity, the proposed mapping increases energy efficiency by 3x, 6x, and 13x on average, respectively, without a noticeable reduction in utilization.
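The abstract describes Cartesian-product convolution in which each nonzero input activation is multiplied by every filter weight and the products are scattered to their output coordinates, so multiplications against zero activations never happen. The paper gives no code; the sketch below is our own minimal single-channel illustration of that general idea (the function name, the dense NumPy layout, and the valid-convolution boundary handling are our assumptions, not the paper's CDS mapping).

```python
import numpy as np

def cartesian_product_conv(ifmap, weights):
    """Single-channel valid convolution driven by nonzero activations only.

    ifmap:   (H, W) input feature map, possibly sparse
    weights: (R, S) filter
    returns: (H-R+1, W-S+1) output feature map
    """
    H, W = ifmap.shape
    R, S = weights.shape
    out = np.zeros((H - R + 1, W - S + 1))
    # Iterate only over nonzero activations: zero inputs contribute
    # nothing, so their multiplications are skipped entirely.
    ys, xs = np.nonzero(ifmap)
    for y, x in zip(ys, xs):
        a = ifmap[y, x]
        # Cartesian product: pair this activation with every weight,
        # and scatter each product to the output it contributes to.
        for r in range(R):
            for s in range(S):
                oy, ox = y - r, x - s
                if 0 <= oy < H - R + 1 and 0 <= ox < W - S + 1:
                    out[oy, ox] += a * weights[r, s]
    return out
```

With 90% of the ifmap zero, the loop body runs for roughly 10% of the activations, which is the intuition behind the energy-efficiency gains the abstract reports; the paper's contribution is realizing this on MAERI's reconfigurable interconnect rather than in software.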


Data availability

The original data and analyzer used and analyzed during the current study are available in the GitHub repository, https://github.com/georgia-tech-synergy-lab/mRNA. In addition, part of the data generated during the present study is included in this manuscript, and the generated code is available from the corresponding author on request.

Notes

  1. Input data (e.g., images) is transformed into a feature map by passing through CNN layers.

  2. ScratchPad Memory (Prefetch Buffer).


Funding

This work has received no funding.

Author information


Contributions

Midia Reshadi introduced the subject and tools. Babak NarimanJahan then proposed the idea of the work and, with the approval of Midia Reshadi and Ahmad KhademZadeh, began analyzing, experimenting, and modifying mRNA based on the proposed idea. With the guidance of Midia Reshadi, the advice of Akram Reza, and the supervision of Ahmad KhademZadeh, Babak NarimanJahan analyzed the experimental results and wrote the manuscript. All authors read and approved the final version.

Corresponding author

Correspondence to Ahmad Khademzadeh.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Narimanjahan, B., Reshadi, M., Khademzadeh, A. et al. MCPS: a mapping method for MAERI accelerator base on Cartesian Product based Convolution for DNN layers with sparse input feature map. Cluster Comput 25, 3213–3230 (2022). https://doi.org/10.1007/s10586-021-03527-6

