Abstract
This paper proposes an all-to-all broadcasting algorithm for a 2D torus Network on Chip (NoC) and evaluates its performance. The algorithm uses special spanning trees called NEWS spanning trees. These trees are link-conflict-free, which implies that the communication steps of the all-to-all algorithm are contention-free. The proposed algorithm is optimal in terms of transmission time and, unlike the all-to-all algorithm for the 2D torus in (IEEE Trans Comput 50:1029–1032, 2001), does not need any additional buffer memory. Reducing the amount of buffer space is an important issue in NoC architectures. Our algorithm is therefore a more efficient solution for all-to-all broadcasting in 2D torus NoC multi-core systems than previously proposed algorithms.
References
Ackland B et al (2000) A single chip, 1.6-Billion, 16-b MAC/s Multiprocessor DSP. IEEE J Solid State Circuits:412–424
Benini L, De Micheli G (2002) Networks on chips: a new SoC paradigm. IEEE Comput 35:70–78
Benini L, De Micheli G (2000) System-level power optimization: techniques and tools. ACM Trans Design Autom Electr Syst:115–192
Dally WJ, Towles B (2001) Route packets, not wires: on-chip interconnection networks. In: Proc. Design Automation Conf. (DAC). pp 684–689
Marculescu R, Ogras UY, Peh L, Jerger NE, Hoskote Y (2009) Outstanding research problems in NoC design: system, microarchitecture, and circuit perspectives. IEEE Trans Comput Aided Design Integr Circuits Syst 28(1):3–21
Bertozzi D et al (2005) NoC synthesis flow for customized domain specific multiprocessor systems-on-chip. IEEE Trans Parallel Distrib Syst 16(2):113–129
Benini L, De Micheli G (2006) Networks on chips: technology and tools. Morgan Kaufmann
Guerrier P, Greiner A (2000) A generic architecture for on-chip packet-switched interconnections. In: Proc. Design and Test in Europe (DATE). pp 250–256
Kumar S et al (2002) A network on chip architecture and design methodology. Proc Intl Symposium VLSI (ISVLSI):117–124
Bjerregaard T, Mahadevan S (2006) A survey of research and practices of network-on-chip. ACM Comput Surv 38(1) (article 1)
Ogras UY, Hu J, Marculescu R (2005) Key research problems in NoC design: a holistic perspective. In: CODES. pp 69–75
Dally WJ (1990) Performance analysis of k-ary n-cube interconnection networks. IEEE Trans Comput 39(6):775–785
Dally WJ, Seitz CL (1986) The torus routing chip. J Distrib Comput 1(4):187–196
Zhang Z, Guo Z, Yang Y (2012) Efficient all-to-all broadcast in Gaussian On-Chip-Networks. IEEE Trans Comput 62(10):1959–1971
Saad Y, Schultz MH (1989) Data communication in parallel architectures. Parallel Comput 11:131–150
Johnsson SL, Ho CT (1989) Optimum broadcasting and personalized communication in hypercubes. IEEE Trans Comput 38(9):1249–1268
Bruck J, Ho CT, Kipnis S, Weathersby D (1994) Efficient algorithms for all-to-all communications in multi-port message-passing systems. In: ACM Symposium on Parallel Algorithms and Architectures. pp 298–309
Calvin C, Perennes S, Trystram D (1995) All-to-all broadcast in torus with wormhole-like routing. In: Proc. of 7th IEEE Symposium on Parallel and Distributed Processing. pp 130–137
Yang Y, Wang J (1999) Efficient all-to-all broadcast in all-port mesh and torus networks. In: Proceedings of the Fifth International Symposium on High-Performance Computer Architecture. Orlando, pp 290–299
Yang Y, Wang J (2001) Pipelined all-to-all broadcast in all-port meshes and tori. IEEE Trans Comput 50(10):1029–1032
Yang Y, Wang J (2002) Near-optimal all-to-all broadcast in multidimensional all-port meshes and tori. IEEE Trans Parallel Distrib Syst 13(2):128–141
Huang H (2010) Efficient all-to-all broadcast algorithm in torus networks. IEEE Int Conf Intell Comput Intell Syst:911–916
Touzene A, Plateau B (1991) Optimal multinode broadcast on a mesh connected graph with reduced bufferization. Distrib Memory Comput Lect Notes Comput Sci 487:143–152
Hassoun S, Alpert CJ, Thiagarajan M (2002) Optimal buffered routing path construction for single and multiple clock domain systems. In: Proceedings of the 2002 IEEE/ACM International Conference on Computer-Aided Design. pp 247–253
Ogras UY, Marculescu R (2006) It’s a small world after all: NoC performance optimization via long-range link insertion. IEEE Trans Very Large Scale Integr Syst 14(7):693–706
Bhandarkar SM, Arabnia HR (1995) The REFINE multiprocessor: theoretical properties and algorithms. Parallel Comput 21(11):1783–1806
Arabnia HR, Smith JW (1993) A reconfigurable interconnection network for imaging operations and its implementation using a multi-stage switching box. In: Proceedings of the 7th Annual International High Performance Computing Conference. The 1993 High Performance Computing: New Horizons Supercomputing Symposium. Calgary, Alberta, Canada, pp 349–357
Wani MA, Arabnia HR (2003) Parallel edge-region-based segmentation algorithm targeted at reconfigurable multi-ring network. J Supercomput 25(1):43–63
Arabnia HR (1990) A parallel algorithm for the arbitrary rotation of digitized images using process-and-data-decomposition approach. J Parallel Distrib Comput 10(2):188–193
Arabnia HR, Oliver MA (1989) A transputer network for fast operations on digitized images. Int J Eurographics Assoc 8(1):3–12
Bhandarkar SM, Arabnia HR (1995) The Hough transform on a reconfigurable multi-ring network. J Parallel Distrib Comput 24(1):107–114
Arabnia HR, Oliver MA (1987) A transputer network for the arbitrary rotation of digitised images. Comput J 30(5):425–433
Arabnia HR, Bhandarkar SM (1996) Parallel stereocorrelation on a reconfigurable multi-ring network. J Supercomput 10(3):243–270
Arabnia HR, Oliver MA (1987) Arbitrary rotation of raster images with SIMD machine architectures. Int J Eurographics Assoc 6(1):3–12
Bhandarkar SM, Arabnia HR, Smith JW (1995) A reconfigurable architecture for image processing and computer vision. Int J Pattern Recogn Artif Intell 9(2):201–229
Arabnia HR (1996) Distributed stereocorrelation algorithm. Int J Comput Commun 707–712
Touzene A (2014) On all-to-all broadcast in dense gaussian network on-chip. IEEE Trans Parallel Distrib Syst 99:1
Touzene A (2014) All-to-all broadcast in hexagonal torus networks on-chip. IEEE Trans Parallel Distrib Syst 99:1 (no. preprints)
Cite this article
Touzene, A., Day, K. All-to-all broadcasting in torus Network on Chip. J Supercomput 71, 2585–2596 (2015). https://doi.org/10.1007/s11227-015-1406-z
DOI: https://doi.org/10.1007/s11227-015-1406-z