Abstract
The advancement of networks-on-chip (NoCs) is noteworthy as the number of cores increases. Bandwidth demand has grown steadily as high-workload applications increase network traffic. NoC traffic is broadly divided into control messages and data messages, of which data messages are larger. Because NoC channel bandwidth is set in proportion to the size of the data messages, the bandwidth remains underutilized during the transmission of control messages. This adversely affects NoC power and performance efficiency. In modern NoC architectures, multiple NoC is a popular way to utilize bandwidth efficiently because it offers more than one physical channel for traffic communication. Conventional multiple-NoC architectures distribute traffic statically between the NoCs, which significantly affects the power-performance metrics. We have observed up to fivefold variation in energy efficiency in our analysis of static traffic distribution for multiple NoC. In this paper, we propose an adaptive distribution of control messages for multiple NoC to improve bandwidth utilization. Control messages switch between the NoCs according to the runtime utilization of the networks. The proposed adaptive distribution of control messages improves energy efficiency by up to \(72.7\%\) and \(66.9\%\) on average over single-NoC and static traffic distribution in multiple NoC, respectively. Link utilization also improves by \(1.37\times\) and \(40\%\) on average over single-NoC and conventional static traffic distribution, respectively. Thus, the proposed adaptive distribution overcomes the drawbacks of static traffic distribution.
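As a rough illustration of the idea of steering control messages by runtime utilization, the following Python sketch picks the less-utilized network for the next control message. It is a minimal sketch under stated assumptions, not the selection logic of the paper: the `NocState` class, its counter fields, the observation window, and the lowest-utilization rule are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class NocState:
    """Hypothetical per-network counters sampled over one observation window."""
    name: str
    busy_link_cycles: int   # link-cycles in which a flit was transmitted
    total_link_cycles: int  # link-cycles available in the window

    @property
    def utilization(self) -> float:
        # Fraction of available link-cycles that carried traffic.
        if self.total_link_cycles == 0:
            return 0.0
        return self.busy_link_cycles / self.total_link_cycles


def pick_network_for_control(networks: list[NocState]) -> NocState:
    """Assumed policy: send the control message on the least-utilized network.

    Ties are resolved in favour of the network listed first.
    """
    return min(networks, key=lambda n: n.utilization)


# Illustrative counters only; these are not measurements from the paper.
noc0 = NocState("NoC-0", busy_link_cycles=700, total_link_cycles=1000)
noc1 = NocState("NoC-1", busy_link_cycles=250, total_link_cycles=1000)
chosen = pick_network_for_control([noc0, noc1])
print(chosen.name)  # NoC-1
```

A static distribution would fix the network for each message type in advance; here the choice is re-evaluated whenever the counters are refreshed.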
Data Availability Statement
Not applicable.
Notes
Single-Chip Cloud Computer.
Bits per second.
Gigabytes per second.
Terabits per second.
The number of messages.
The dotted red box shows the volume of control messages, whereas the blue-lined box (the rightmost column of the table) highlights the percentage variation in the result metric.
We use the terms percentage, fraction, ratio, and proportion interchangeably.
Least Recently Used (LRU) replacement policy.
The names of fine-grained messages indicate message class, message type, and cache association.
Less-frequent messages.
The gem5 simulator provides various CPU models, instruction set architectures (ISAs), caches, and cache-coherence protocol models.
In full-system simulation, the hardware of a computer system is simulated in enough detail that the complete software stack of a real system can run on the simulator.
Each router connects to the other routers via four links {N, S, E, W} in the north, south, east, and west directions, as is specific to the mesh topology.
RMS: recognition, mining, and synthesis.
The data patterns further depend on the number of cores, the cache size, and the cache line size.
FLoating pOint oPerations: a unit of computation used in high-performance computing to improve the accuracy of results.
These are the synchronization primitives.
The geometric mean is the standard way to represent the results for averaging normalized results [61].
For the classification of the benchmarks, please refer to Table 10.
One message transmits through more than one NoC.
Each NoC carries different types of messages.
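The note on averaging normalized results with the geometric mean can be illustrated with a short Python sketch. The function name and the sample values are assumptions chosen for illustration; they are not results from the paper.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive values, computed via logarithms."""
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires strictly positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical normalized results (each is a ratio to some baseline).
normalized = [0.5, 2.0, 1.0]

gm = geometric_mean(normalized)
am = sum(normalized) / len(normalized)

# A halving and a doubling cancel in the geometric mean (close to 1.0),
# while the arithmetic mean of the same ratios is larger than 1.0.
print(f"geometric mean:  {gm:.4f}")
print(f"arithmetic mean: {am:.4f}")
```

The Python standard library also offers `statistics.geometric_mean` (Python 3.8 and later), which could be used in place of the hand-written function.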
References
Yadav S (2022) Interconnect paradigm shift towards networks-on-chip in manycore processors: a review on challenges. In: Kumar R, Ahn CW, Sharma TK, Verma OP, Agarwal A (eds) Soft computing: theories and applications. Lecture notes in networks and systems. Springer, Cham, p 425
Morgan AA, Hassan AS, Watheq El-Kharashi M, Tawfik A (2020) NoC\(^2\): an efficient interfacing approach for heavily-communicating NoC-based systems. IEEE Access 8(2020):185992–186011
Zhang C, Zhao C, He J, Chen S, Zheng L, Huang K, Han W, Zhai J (2021) Critique of planetary normal mode computation: parallel algorithms, performance, and reproducibility by SCC Team From Tsinghua University. IEEE Trans Parallel Distribut Syst 32(11):2631–2634
Kang J-H, Hwang J, Hyung JS, Ryu H (2021) High-performance simulations of turbulent boundary layer flow using Intel Xeon Phi many-core processors. J Supercomput 77(9):9597–9614
Ginosar R (2021) The plural many-core architecture-high performance at low power. In: Multi-processor system-on-chip 1: architectures, pp. 53–68
Das R, Narayanasamy S, Satpathy SK, Dreslinski RG (2013) Catnap: energy proportional multiple network-on-chip. ACM SIGARCH Comput Archit News 41(2013):320–331
Zhou W, Ouyang Y, Li J, Dongyu X (2023) A transparent virtual channel power gating method for on-chip network routers. Integration 88:286–297
Yadav S, Laxmi V, Gaur MS, Kapoor HK (2019) Improving static power efficiency via placement of network demultiplexer over control plane of router in multi-NoCs. In: Proceedings of 56th ACM/IEEE Design Automation Conference (DAC). IEEE, pp. 1–2
Yadav S, Raj R (2022) Power efficient network selector placement in control plane of multiple networks-on-chip. J Supercomput 78(2022):6664–6695
Zhou W, Ouyang Y, Xu D, Huang Z, Liang H, Wen X (2023) Energy-efficient multiple network-on-chip architecture with bandwidth expansion. IEEE Trans Very Large Scale Integr (VLSI) Syst Preprint 1–14
Yoon YJ, Concer N, Petracca M, Carloni L (2010) Virtual channels vs. multiple physical networks: a comparative analysis. In: Proceedings of 47th Conference on Design Automation Conference (DAC), ACM/EDAC/IEEE, pp. 162–165
Yadav S, Laxmi V, Gaur MS (2020) Multiple-NoC exploration and customization for energy efficient traffic distribution. In: IFIP/IEEE 28th International Conference on Very Large Scale Integration (VLSI-SOC). IEEE, pp. 200–201
Hesham S, Goehringer D, Abd MA, Ghany E (2020) HPPT-NoC: a dark-silicon inspired hierarchical TDM NoC with efficient power-performance trading. IEEE Trans Parallel Distrib Syst 31(3):675–694
Shafique M, Garg S (2017) Computing in the dark silicon era: current trends and research challenges. IEEE Des Test 34(2017):8–23
Yao Y (2023) Game-of-life temperature-aware DVFS strategy for tile-based chip many-core processor. IEEE J Emerging Sel Top Circuits Syst
Li Z, Miguel JS, Jerger NE (2016) The runahead network-on-chip. In: 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, pp. 333–344
Liu Z, Li G, Cheng J (2023) Efficient accelerator/network co-search with circular greedy reinforcement learning. IEEE Trans Circuits Syst II
Lu H, Yan G, Han Y, Wang Y, Li X (2015) ShuttleNoC: boosting on-chip communication efficiency by enabling localized power adaptation. In: Proceedings of 20th Asia and South Pacific Design Automation Conference, IEEE, pp. 142–147
Lu H, Chang Y, Yan G, Lin N, Wei X, Li X (2019) ShuttleNoC: power-adaptable communication infrastructure for many-core processors. IEEE Trans Comput Aided Des Integr Circuits Syst 38:1438–1451
Asadi B, Zia SM, Al-Khafaji HMR, Mohamadian A (2023) Network-on-chip and photonic network-on-chip basic concepts: a survey. J Electron Test
Yoon YJ, Concer N, Petracca M, Carloni LP (2013) Virtual channels and multiple physical networks: two alternatives to improve NoC performance. IEEE Trans Comput Aided Des Integr Circuits Syst 32:1906–1919
Li X, Yan G, Liu C (2023) Fault-tolerant network-on-chip. In: Built-in fault-tolerant computing paradigm for resilient large-scale chip design. Springer: Singapore
Kadri N, Koudil M (2019) A survey on fault-tolerant application mapping techniques for Network-on-Chip. J Syst Archit 92:39–52
Sepúlveda J, Flórez D, Gogniat G (2015) Reconfigurable security architecture for disrupted protection zones in NoC-based MPSoCs. In: Proceedings of 10th International Symposium on Reconfigurable Communication-Centric Systems-on-Chip (ReCoSoC). IEEE, pp. 1–8
Ali U, Sahni SAR, Khan O (2023) Characterization of timing-based software side-channel attacks and mitigations on network-on-chip hardware. J Emerg Technol Comput Syst
Baharloo M, Aligholipour R, Abdollahi M, Khonsari A (2020) ChangeSUB: a power efficient multiple network-on-chip architecture. Comput Electr Eng 83(2020):106578
Aligholipour R, Baharloo M, Farzaneh B, Abdollahi M, Khonsari A (2021) TAMA: turn-aware mapping and architecture: a power-efficient network-on-chip approach. ACM Trans Embed Comput Syst 20(2021):1–24
Rovinski A (2022) Towards free, open, and ubiquitous hardware design. University of Michigan, PhD dissertation
Alimi I, Aboderin O, Muga NJ, Teixeira AL (eds) (2022). IntechOpen, England
Alagarsamy A, Mahilmaran S, Gopalakrishnan L, Ko S-B (2023) SaHNoC: an optimal energy efficient hybrid networks-on-chip architecture. J Supercomp 79:6538–6559
Yadav S, Laxmi V, Gaur MS (2016) A power efficient dual link mesh NOC architecture to support nonuniform traffic arbitration at routing logic. In: Proceedings of the 29th International Conference on VLSI Design (VLSID). IEEE, pp. 69–74
Yadav S (2022) A study on requests serialization in directory-based protocol for MESI cache coherence protocol. In: Soft Computing: Theories and Applications: Proceedings of SoCTA 2021. Springer, pp. 761–768
Yadav S, Laxmi V, Kapoor HK, Gaur MS, Zwolinski M (2018) A power efficient crossbar arbitration in multi-NoC for multicast and broadcast traffic. In: Proceedings of International Conference on IEEE International Symposium on Smart Electronic Systems (IEEE-iSES). IEEE
Yadav S, Laxmi V, Gaur MS, Bhargava M (2015) C\(^2\) -DLM: cache coherence aware dual link mesh for on-chip interconnect. In: Proceedings 19th IEEE International Symposium on VLSI Design and Test, IEEE
Zhou W, Ouyang Y, Lu Y, Liang H (2022) A router architecture with dual input and dual output channels for Networks-on-Chip. Microprocess Microsyst 90:104464
Yoon YJ (2017) Design and optimization of Networks-on-Chip for future heterogeneous systems-on-chip. Thesis of Columbia University
Volos S, Seiculescu C, Grot B, Pour NK, Falsafi B, Micheli G de (2012) CCNoC: specializing on-chip interconnects for energy efficiency in cache-coherent servers. In: IEEE/ACM Sixth International Symposium on Networks-on-Chip. IEEE, pp. 67–74
Mirhosseini A, Sadrosadati M, Soltani B, Sarbazi-Azad H (2022) A power-performance balanced network-on-chip for mixed CPU-GPU systems. Adv Comput 2022:45
Balfour J, Dally WJ (2006) Design tradeoffs for tiled CMP on-chip networks. In: ACM International Conference on Supercomputing 25th Anniversary Volume. ACM, pp. 390–401
Kunthara RG, James RK, Sleeba SZ, Jose J (2022) DAReS: deflection aware rerouting between subnetworks in bufferless on-chip networks. In: Proceedings of the Great Lakes Symposium on VLSI, pp. 211–216
Miguel JS, Jerger NE (2015) Data criticality in network on chip design. In: Proceedings of the 9th International Symposium on Networks-on-Chip (NOCS). ACM, pp. 1–8
Mishra AK, Mutlu O, Das CR (2013) A heterogeneous multiple network-on-chip design: an application-aware approach. In: 50th ACM/EDAC/IEEE Design Automation Conference (DAC). IEEE, pp. 1–10
Mandal SK, Ayoub R, Kishinevsky M, Ogras UY (2019) Analytical performance models for NoCs with multiple priority traffic classes. ACM Trans Embed Comput Syst (TECS) 18(5s):1–21
Buckler M, Burleson W, Sadowski G (2013) Low-power networks-on-chip: progress and remaining challenges. In: International Symposium on Low Power Electronics and Design (ISLPED). IEEE, pp. 132–134
Trik M, Akhavan H, Bidgoli AM, Molk AMNG, Vashani H, Mozaffari SP (2023) A new adaptive selection strategy for reducing latency in networks on chip. Integration 89:9–24
Trik M, Molk AMNG, Ghasemi F, Pouryeganeh P (2022) A hybrid selection strategy based on traffic analysis for improving performance in networks on chip. J Sensors, 3112170
Ofori-Attah E, Agyeman MO (2017) A survey of low power NoC design techniques. In: Proceedings of the 2nd International Workshop on Advanced Interconnect Solutions and Technologies for Emerging Computing Systems (AISTECS ’17). Association for Computing Machinery, New York, NY, USA, pp. 22–27
Singh R, Bohra M, Hemrajani P, Kalla A, Bhatt DP, Purohit N, Daneshtalab M (2022) Review, analysis, and implementation of path selection strategies for 2D NoCs. IEEE Access 10:129245–129268
Rad F, Reshadi M, Khademzadeh A (2020) A survey and taxonomy of congestion control mechanisms in wireless network on chip. J Syst Archit 108:101807
Fang Z, Cheng L, Vangal SR (2009) Using criticality information to route cache coherency communications. U.S. Patent US20090300292 A1
Nicopoulos CA, Park D, Kim J, Vijaykrishnan N, Yousif MS, Das CR (2006) ViChaR: a dynamic virtual channel regulator for network-on-chip routers. In: Proceedings of the Thirty-ninth IEEE/ACM International Symposium on Microarchitecture (MICRO’06), Orlando, FL, pp. 333–346
Lai M, Wang Z, Gao L, Lu H, Dai K (2008) A dynamically-allocated virtual channel architecture with congestion awareness for on-chip routers. In: Proceedings of the Forty-fifth ACM/IEEE Design Automation Conference, Anaheim, CA, pp. 630–633
Baharloo M, Khonsari A, Dolati M, Shiri P, Ebrahimi M, Rahmati D (2020) Traffic-aware performance optimization in Real-time wireless network on chip. Nano Commun Netw 26:100321
Gogte V, Kolli A, Wenisch TF (2022) A primer on memory persistency. Synth Lect Comput Architect 1(2022):1–115
Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S et al (2011) The gem5 simulator. ACM SIGARCH Comput Archit News 39(2011):1–7
Semakin A (2021) Simulation of a multi-core computer system in the gem5 simulator. In: AIP Conference Proceedings. https://doi.org/10.1063/5.0035841
Maron CAF, Vogel A, Griebler D, Fernandes LG (2019) Should PARSEC benchmarks be more parametric? a case study with Dedup. In: 2019 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 217–221. IEEE
Krishna T (2017) A detailed on-chip network model inside a full-system simulator. In: Gem5 Workshop. ARM Research Summit
Zhang H, Chen Y, Huang Z, Xia C, Liang J, Gu H (2021) Comparative analysis of simulators for optical network-on-chip (ONoC). In: 12th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP), IEEE, pp. 19–23
Sethi MAJ, Hussin FA, Hamid NH (2017) Network-on-Chip (NoC) topologies and performance: a review. Rev Netw Chip Archit 10(1):4–29
Vogel RM (2022) The geometric mean? Commun Stat Theory Methods 51(1):82–94
Xiang X, Sigdel P, Tzeng N-F (2020) Bufferless network-on-chips with bridged multiple subnetworks for deflection reduction and energy savings. IEEE Trans Comput 69(2020):577–590
Baharloo M, Khonsari A (2018) A low-power wireless-assisted multiple network-on-chip. Microprocess Microsyst 63(2018):104–115
Duraisamy K, Hao L, Pande PP, Kalyanaraman A (2016) High-performance and energy-efficient network-on-chip architectures for graph analytics. ACM Trans Embed Comput Syst 15(2016):1–26
Dasari UK, Temam O, Narayanaswami R, Woo DH (2021) Apparatus and mechanism for processing neural network tasks using a single chip package with multiple identical dies. U.S. Patent 10, 15/819,753
Funding
Not applicable.
Author information
Contributions
The first author originated the idea, implemented it, and wrote the paper. The second, third, and fourth authors helped to conceptualize the idea, organize the manuscript, and analyze the results. The fifth author contributed to the revision of the manuscript.
Ethics declarations
Conflict of interest
The authors declare no competing interests regarding this work.
Ethical approval
Not applicable.
Appendix A: Message distribution and cache state transitions
Table 15 in this appendix summarizes 41 alternative message distributions on dual-NoC that are executed for each PARSEC benchmark application described in Table 3 (Sect. 3). Figures 9 and 10 are tabular representations of cache state transition graphs, as discussed in Sect. 4, Figs. 1 and 2, respectively.
About this article
Cite this article
Yadav, S., Laxmi, V., Kapoor, H. et al. Adaptive distribution of control messages for improving bandwidth utilization in multiple NoC. J Supercomput 79, 17208–17246 (2023). https://doi.org/10.1007/s11227-023-05208-0
DOI: https://doi.org/10.1007/s11227-023-05208-0