Abstract
The benefits of deep neural networks (DNNs) and other big-data algorithms have led to their use in almost every modern application. The growing adoption of DNNs in diverse domains, including computer vision, speech recognition, image classification, and prediction, has increased the demand for energy-efficient hardware architectures. The massive parallelism of large-scale DNN algorithms has made communication and storage major bottlenecks for DNN power and performance. DNNs have achieved great success by exploiting the inherent parallelism of GPU architectures; however, recent research shows that integrating CPUs and GPUs offers a more efficient platform for the next generation of machine learning (ML) chips. Designing an interconnection network for such a heterogeneous CPU-GPU platform is challenging, especially for DNN workloads, because the network must be both scalable and efficient. A study in this work shows that the majority of traffic in DNN workloads is associated with the last-level caches (LLCs). There is therefore a need for a low-overhead interconnect fabric that minimizes the energy and access time of the LLC banks. To address this issue, this article proposes Godiva, a low-overhead on-chip interconnect for energy-efficient DNN execution. Godiva delivers low LLC access latency using small, low-cost hardware in a heterogeneous CPU-GPU platform. An experimental evaluation targeting a 16-CPU/48-GPU system and a set of popular DNN workloads reveals that the proposed heterogeneous architecture improves system energy by about 21.7× and reduces interconnection network area by about 51% compared to a mesh-based CPU design.
Data availability
We confirm that all relevant data and results are included within the article.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
About this article
Cite this article
Asad, A., Mohammadi, F. Godiva: green on-chip interconnection for DNNs. J Supercomput 79, 2404–2430 (2023). https://doi.org/10.1007/s11227-022-04749-0