Abstract
The multi-layer perceptron (MLP) is a class of artificial neural networks widely used for regression, classification, and prediction. MLP training can be accelerated by using more cores for parallel computing on many-core systems. However, as the number of cores integrated on a chip increases, the communication bottleneck of MLP training on an electrical network-on-chip (ENoC) becomes severe and degrades training performance. Replacing the ENoC with an optical network-on-chip (ONoC) can break this communication bottleneck. To guide the development of ONoCs for MLP training, it is necessary to model and compare the MLP training performance of ONoC and ENoC in advance. This paper first analyzes and compares the differences between ONoC and ENoC. We then formulate performance and energy models of MLP training on ONoC and ENoC by analyzing the communication time, computation time, static energy, and dynamic energy consumption, respectively. Furthermore, we conduct extensive simulations with our simulation infrastructure to compare their MLP training performance and energy consumption. The experimental results show that, compared with ENoC, ONoC reduces MLP training time by 65.16% and 52.51% on average across different numbers of cores and batch sizes, respectively. The results also show that ONoC achieves overall energy reductions of 54.86% and 43.13% on average across different numbers of cores and batch sizes compared with ENoC. However, with a small number of cores (e.g., fewer than 50), ENoC consumes less energy than ONoC for MLP training. These experiments confirm that ONoC is generally a good replacement for ENoC in terms of performance and energy consumption when MLP training uses a large number of cores.
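The abstract describes decomposing MLP training time into computation and communication terms. The following toy sketch illustrates that decomposition only; the function names, latency curves, and every constant below are hypothetical assumptions for illustration, not the paper's actual model or measured numbers:

```python
# Toy decomposition: per-iteration time T = T_compute + T_comm.
# All constants are hypothetical; this is not the paper's model.

def training_time(cores, batch, flops_per_sample, core_flops, comm_latency):
    """Compute time shrinks as cores are added; communication does not."""
    t_compute = batch * flops_per_sample / (cores * core_flops)
    t_comm = comm_latency(cores)          # network-dependent term
    return t_compute + t_comm

# Hypothetical latency curves: an electrical mesh's latency grows with
# network diameter (~sqrt(cores)); an optical network is modeled here
# as near-constant. Both curves are illustrative assumptions.
enoc_latency = lambda n: 1e-6 * (n ** 0.5)
onoc_latency = lambda n: 2e-6

t_enoc = training_time(256, 64, 1e6, 1e9, enoc_latency)
t_onoc = training_time(256, 64, 1e6, 1e9, onoc_latency)
reduction_pct = (t_enoc - t_onoc) / t_enoc * 100  # toy numbers only
```

Under these made-up curves, the communication term dominates the gap at high core counts, which is the qualitative effect the paper quantifies with its simulations.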
Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
References
Dai F, Chen Y, Huang Z, Zhang H (2021) Performance comparison of multi-layer perceptron training on electrical and optical network-on-chips. In: International Conference on Parallel and Distributed Computing: Applications and Technologies, Springer, pp 129–141
Nabavinejad SM, Baharloo M, Chen K-C, Palesi M, Kogel T (2020) An overview of efficient interconnection networks for deep neural network accelerators. IEEE J Emerg Sel Topics Circuits Syst 10(3):268–282
Liu F, Zhang H, Chen Y, Huang Z, Gu H (2017) Wavelength-reused hierarchical optical network on chip architecture for manycore processors. IEEE Trans Sustain Comput 4(2):231–244
Yang W, Chen Y, Huang Z, Zhang H (2017) Rwadmm: routing and wavelength assignment for distribution-based multiple multicasts in onoc. In: 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), IEEE, pp 550–557
Dai F, Chen Y, Zhang H, Huang Z (2021) Accelerating fully connected neural network on optical network-on-chip (onoc). arXiv preprint arXiv:2109.14878
Zhao Y, Ge F, Cui C, Zhou F, Wu N (2020) A mapping method for convolutional neural networks on network-on-chip. In: 2020 IEEE 20th International Conference on Communication Technology (ICCT), IEEE, pp 916–920
Khan ZA, Abbasi U, Kim SW (2022) An efficient algorithm for mapping deep learning applications on the noc architecture. Appl Sci 12(6):3136
Mirmahaleh SYH, Rahmani AM (2019) Dnn pruning and mapping on noc-based communication infrastructure. Microelectron J 94:104655
Chen Y-H, Krishna T, Emer JS, Sze V (2016) Eyeriss: an energy-efficient reconfigurable accelerator for deep convolutional neural networks. IEEE J Solid-State Circuits 52(1):127–138
Lu W, Yan G, Li J, Gong S, Han Y, Li X (2017) Flexflow: a flexible dataflow accelerator architecture for convolutional neural networks. In: 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA), IEEE, pp 553–564
Kwon H, Samajdar A, Krishna T (2018) Maeri: enabling flexible dataflow mapping over dnn accelerators via reconfigurable interconnects. ACM SIGPLAN Not 53(2):461–475
Yasoubi A, Hojabr R, Takshi H, Modarressi M, Daneshtalab M (2015) Cupan–high throughput on-chip interconnection for neural networks. In: International Conference on Neural Information Processing, Springer, pp 559–566
Liu X, Wen W, Qian X, Li H, Chen Y (2018) Neu-noc: a high-efficient interconnection network for accelerated neuromorphic systems. In: 2018 23rd Asia and South Pacific Design Automation Conference (ASP-DAC), IEEE, pp 141–146
Firuzan A, Modarressi M, Daneshtalab M, Reshadi M (2018) Reconfigurable network-on-chip for 3d neural network accelerators. In: 2018 Twelfth IEEE/ACM International Symposium on Networks-on-Chip (NOCS), IEEE, pp 1–8
Dong Y, Kumai K, Lin Z, Li Y, Watanabe T (2009) High dependable implementation of neural networks with networks on chip architecture and a backtracking routing algorithm. In: 2009 Asia Pacific Conference on Postgraduate Research in Microelectronics & Electronics (PrimeAsia), IEEE, pp 404–407
Akopyan F, Sawada J, Cassidy A, Alvarez-Icaza R, Arthur J, Merolla P, Imam N, Nakamura Y, Datta P, Nam G-J et al (2015) Truenorth: design and tool flow of a 65 mw 1 million neuron programmable neurosynaptic chip. IEEE Trans Comput Aided Des Integr Circuits Syst 34(10):1537–1557
Kim J-Y, Park J, Lee S, Kim M, Oh J, Yoo H-J (2010) A 118.4 gb/s multi-casting network-on-chip with hierarchical star-ring combined topology for real-time object recognition. IEEE J Solid-State Circuits 45(7):1399–1409
Pan Y, Kumar P, Kim J, Memik G, Zhang Y, Choudhary A (2009) Firefly: illuminating future network-on-chip with nanophotonics. In: Proceedings of the 36th Annual International Symposium on Computer Architecture, pp 429–440
Pan Y, Kim J, Memik G (2010) Flexishare: channel sharing for an energy-efficient nanophotonic crossbar. In: IEEE International Symposium on High Performance Computer Architecture (HPCA), pp 1–12
Vantrease D, Schreiber R, Monchiero M, McLaren M, Jouppi N, Fiorentino M, Davis A, Binkert N, Beausoleil R, Ahn J (2008) Corona: system implications of emerging nanophotonic technology. In: ACM/IEEE Proc. ISCA, pp 153–164
Kurian G, Miller J, Psota J, Eastep J, Liu J, Michel J, Kimerling L, Agarwal A (2010) ATAC: a 1000-core cache-coherent processor with on-chip optical network. In: ACM Intl. Conf. Parallel Architectures and Compilation Techniques (PACT), pp 153–164
Bashir J, Peter E, Sarangi SR (2019) Bigbus: a scalable optical interconnect. ACM J Emerg Technol Comput Syst (JETC) 15(1):1–24
Bashir J, Sarangi SR (2017) Nuplet: a photonic based multi-chip nuca architecture. In: 2017 IEEE International Conference on Computer Design (ICCD), IEEE, pp 617–624
Ziabari AK, Abellán JL, Ubal R, Chen C, Joshi A, Kaeli D (2015) Leveraging silicon-photonic noc for designing scalable gpus. In: Proceedings of the 29th ACM on International Conference on Supercomputing, pp 273–282
Bashir J, Sarangi SR (2020) Gpuopt: power-efficient photonic network-on-chip for a scalable gpu. ACM J Emerg Technol Comput Syst (JETC) 17(1):1–26
Yahya MR, Wu N, Ali ZA, Khizar Y (2021) Optical versus electrical: performance evaluation of network on-chip topologies for uwasn manycore processors. Wireless Pers Commun 116(2):963–991
Okada R, Power and performance comparison of electronic 2d-noc and opto-electronic 2d-noc
Touza R, Martínez J, Álvarez M, Roca J (2022) Obtaining anti-missile decoy launch solution from a ship using machine learning techniques. Int J Interact Multimed Artif Intell 7(4)
Bashir J, Goodchild C, Sarangi SR (2020) Seconet: a security framework for a photonic network-on-chip. In: 2020 14th IEEE/ACM International Symposium on Networks-on-Chip (NOCS), IEEE, pp 1–8
Liu F, Zhang H, Chen Y, Huang Z, Gu H (2016) Dynamic ring-based multicast with wavelength reuse for optical network on chips. In: 2016 IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSOC), pp 153–160
Bashir J, Peter E, Sarangi SR (2019) A survey of on-chip optical interconnects. ACM Comput Surv (CSUR) 51(6):1–34
Peter E, Sarangi SR (2015) Optimal power efficient photonic swmr buses. In: 2015 Workshop on Exploiting Silicon Photonics for Energy-Efficient High Performance Computing, pp 25–32
Gibbons PB (1989) A more practical pram model. In: Proceedings of the First Annual ACM Symposium on Parallel Algorithms and Architectures, pp 158–168
Valiant LG (1990) A bridging model for parallel computation. Commun ACM 33(8):103–111
Culler D, Karp R, Patterson D, Sahay A, Schauser KE, Santos E, Subramonian R, von Eicken T (1993) Logp: towards a realistic model of parallel computation. In: Proceedings of the Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp 1–12
Bilardi G, Herley KT, Pietracaprina A, Pucci G, Spirakis P (1996) Bsp vs logp. In: Proceedings of the Eighth Annual ACM Symposium on Parallel Algorithms and Architectures, pp 25–32
Kiasari AE, Rahmati D, Sarbazi-Azad H, Hessabi S (2008) A markovian performance model for networks-on-chip. In: 16th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP 2008), pp 157–164
Tikir MM, Carrington L, Strohmaier E, Snavely A (2007) A genetic algorithms approach to modeling the performance of memory-bound computations. In: SC’07: Proceedings of the 2007 ACM/IEEE Conference on Supercomputing, IEEE, pp 1–12
Zhuang X, Liberatore V (2005) A recursion-based broadcast paradigm in wormhole routed networks. IEEE Trans Parallel Distrib Syst 16(11):1034–1052
Grani P, Bartolini S (2014) Design options for optical ring interconnect in future client devices. ACM J Emerg Technol Comput Syst (JETC) 10(4):1–25
Lin Z, Memisevic R, Konda K (2015) How far can we go without convolution: improving fully-connected networks. arXiv preprint arXiv:1511.02580
Cireşan DC, Meier U, Gambardella LM, Schmidhuber J (2010) Deep, big, simple neural nets for handwritten digit recognition. Neural Comput 22(12):3207–3220
Kadam SS, Adamuthe AC, Patil AB (2020) Cnn model for image classification on mnist and fashion-mnist dataset. J Sci Res 64(2):374–384
Abouelnaga Y, Ali OS, Rady H, Moustafa M (2016) Cifar-10: Knn-based ensemble of classifiers. In: 2016 International Conference on Computational Science and Computational Intelligence (CSCI), IEEE, pp 1192–1195
Kågström B, Ling P, Van Loan C (1998) Gemm-based level 3 blas: high-performance model implementations and performance evaluation benchmark. ACM Trans Math Softw (TOMS) 24(3):268–302
Li RM, King CT, Das B (2016) Extending gem5-garnet for efficient and accurate trace-driven noc simulation. In: Proceedings of the 9th International Workshop on Network on Chip Architectures, pp 3–8
Sun C, Chen CHO, Kurian G, Wei L, Miller J, Agarwal A, Peh LS, Stojanovic V (2012) Dsent: a tool connecting emerging photonics with electronics for opto-electronic networks-on-chip modeling. In: 2012 IEEE/ACM Sixth International Symposium on Networks-on-Chip, IEEE, pp 201–210
van Laer A (2018) The effect of an optical network on-chip on the performance of chip multiprocessors. PhD thesis, UCL (University College London)
Zhang X, Louri A (2010) A multilayer nanophotonic interconnection network for on-chip many-core communications. In: ACM/IEEE Proc. DAC, pp 156–161
Vlasov Y, Green WMJ, Xia F (2008) High-throughput silicon nanophotonic wavelength-insensitive switch for on-chip optical networks. Nat Photonics 2(4):242–246
Deng L, Li G, Han S, Shi L, Xie Y (2020) Model compression and hardware acceleration for neural networks: A comprehensive survey. Proc IEEE 108(4):485–532
Acknowledgements
The authors thank the reviewers for the time and effort they devoted to reviewing the manuscript. We also acknowledge the use of New Zealand eScience Infrastructure (NeSI) high-performance computing facilities as part of this research (Project code: uoo03633).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
This study extends the work in [1], which was published at the 22nd International Conference on Parallel and Distributed Computing, Applications and Technologies in 2022.
Cite this article
Dai, F., Chen, Y., Huang, Z. et al. Comparing the performance of multi-layer perceptron training on electrical and optical network-on-chips. J Supercomput 79, 10725–10746 (2023). https://doi.org/10.1007/s11227-022-04945-y