Skip to main content
Log in

Scheduling array redistribution with virtual channel support

  • Published:
The Journal of Supercomputing Aims and scope Submit manuscript

Abstract

This paper addresses the well-known problem of redistributing data arrays over a multiprocessor network. The block-cyclic redistribution techniques found in the literature deal effectively with this problem, based on the assumption that there is a direct link between all the processors of the network. Most of these techniques aim at reducing the number of messages and the total redistribution cost. However, an application of these techniques on non-all-to-all communication networks, like tori, shows that these techniques suffer long delays. In this work, we try to solve the general block-cyclic redistribution problem on non-all-to-all networks, by grouping the messages into well-defined classes, and transferring them with the support of a well-specified number of virtual channels.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6

Similar content being viewed by others

References

  1. Arabnia HR, Oliver MA (1987) Arbitrary rotation of raster images with SIMD machine architectures. Int J Eurogr Assoc (Comput Gr Forum) 6(1):3–12

    Google Scholar 

  2. Arabnia HR, Oliver MA (1987) A transputer network for the arbitrary rotation of digitised images. Comput J 30(5):425–433

    Article  Google Scholar 

  3. Arabnia HR, Oliver MA (1989) A transputer network for fast operations on digitised images. Int J Eurogr Assoc (Comput Gr Forum) 8(1):3–12

    Google Scholar 

  4. Arabnia HR (1990) A parallel algorithm for the arbitrary rotation of digitized images using process-and-data-decomposition approach. J Parallel Distrib Comput 10(2):188–193

    Article  Google Scholar 

  5. Arabnia HR, Smith JW (1993) A Reconfigurable interconnection network for imaging operations and its implementation using a multi-stage switching box. In: Proceedings of the 7th annual international high performance computing conference. The 1993 high performance computing: new horizons supercomputing symposium. Calgary, Alberta, Canada, June, pp 349–357

  6. Arabnia HR (1995) A distributed stereocorrelation algorithm. In: Proceedings of computer communications and networks (ICCCN’95), IEEE, pp 479–482

  7. Arabnia HR, Bhandarkar SM (1996) Parallel stereocorrelation on a reconfigurable multi-ring network. J Supercomput 10(3):243–270

    Article  MATH  Google Scholar 

  8. Bhandarkar SM, Arabnia HR, Smith JW (1995) A reconfigurable architecture for image processing and computer vision. Int J Pattern Recogn Artif Intell (IJPRAI) (special issue on VLSI algorithms and architectures for computer vision, image processing, pattern recognition and AI) 9(2):201–229

  9. Bhandarkar SM, Arabnia HR (1995) The REFINE multiprocessor: theoretical properties and algorithms. Parallel Comput 21(11):1783–1806

    Article  Google Scholar 

  10. Bhandarkar SM, Arabnia HR (1995) The Hough transform on a reconfigurable multi-ring network. J Parallel Distrib Comput 24(1):107–114

    Article  Google Scholar 

  11. Chung YC, Hsu CH, Bai SW (1998) A basic-cycle calculation technique for efficient dynamic data redistribution. IEEE Trans Parallel Distrib Syst 32:359–377

    Article  Google Scholar 

  12. Desprez F, Dongarra J, Petitet A, Randriamaro C, Robert Yves (1998) Scheduling block-cyclic array redistribution. IEEE Trans Parallel Distrib Syst 9(2):192–205

    Article  Google Scholar 

  13. Jun D, Negishi Y (2010) Overlapping methods of all-to-all communication and FFT algorithms for torus-connected massively parallel supercomputers. International conference on high performance computing, networking, storage and analysis (SC). New Orleans, Louisiana, pp 1–9

  14. Faraj A, Xin Y, Patarasuk P (2007) A message scheduling scheme for all-to-all personalized communication on ethernet switched clusters. IEEE Trans Parallel Distrib Syst 18(2):264–276

    Article  Google Scholar 

  15. Feng R, Zhang P, Deng Y (2013) Deadlock-free routing algorithms for 6D Mesh/iBT interconnection networks. In: Proceedings of the 5th IEEE international conference on information technology: new generations. Las Vegas, Nevada, pp 253–258

  16. Kalns ET, Ni LM (1995) Processor mapping techniques toward efficient data redistribution. IEEE Trans Parallel Distrib Syst 6:1234–1247

    Article  Google Scholar 

  17. Hsu C-H, Chung Y-C, Yang D-L, Dow C-R (2001) A generalized processor mapping technique for array redistribution. IEEE Trans Parallel Distrib Syst 12(7):743–757

    Article  Google Scholar 

  18. Huang JW, Chu CP (2006) An efficient communication scheduling method for the processor mapping technique applied data redistribution. J Supercomput 37:297–318

    Article  Google Scholar 

  19. Kumar S, Sabharwal Y, Garg R, Heidelberger P (2008) Optimization of all-to-all communication on the blue Gene/L supercomputer. In: Proceedings of the 37th international conference on parallel processing. Washington, DC, USA, pp 320–329

  20. Lim YW, Bhat PB, Prasanna VK (1998) Efficient algorithms for block cyclic redistribution of arrays. Algorithmica 24:298–330

    Article  MathSciNet  Google Scholar 

  21. Liu G, Gu N, Ren K, Tao Y (2006) Optimal all-to-all personalized communication in All-Port Tori. First international multi-symposiums on computer and computational sciences, IMSCCS ’06. Hanzhou, Zhejiang, pp 20–24

  22. Luo W, Xiang D (2012) Deadlock-free routing algorithm for torus networks. IEEE Trans Parallel Distrib Syst 23(5):800–808

    Article  Google Scholar 

  23. Ramanujam RS, Lin B (2013) Randomized throughput-optimal oblivious routing for torus networks. IEEE Trans Comput 62(6):561–574

    Article  MathSciNet  Google Scholar 

  24. Ramaswamy S, Simons P, Banerjee P (1996) Optimization for efficient array redistribution on distributed memory multicomputers. J Parallel Distrib Comput 38:217–228

    Article  MATH  Google Scholar 

  25. Souravlas SI, Roumeliotis M (2004) A pipeline technique for dynamic data transfer on a multiprocessor grid. Int J Parallel Progr 32(5):361–388

    Article  MATH  Google Scholar 

  26. Thakur R, Choudhary A, Ramanujam J (1996) Efficient algorithms for array redistribution. IEEE Trans Parallel Distrib Syst 7:587–594

    Article  Google Scholar 

  27. Wani MA, Arabnia HR (2003) Parallel edge-region-based segmentation algorithm targeted at reconfigurable multi-ring network. J Supercomput 25(1):43–63

    Article  MATH  Google Scholar 

  28. Yang Y, Wang J (2002) Near-optimal all-to-all broadcast in multidimensional all-port meshes and tori. IEEE Comput Trans Parallel Distrib Syst 13(2):128–141

    Article  Google Scholar 

  29. Zhang P, Deng Y (2012) Design and analysis of pipelined broadcast algorithms for the all-port interlaced bypass torus networks. IEEE Comput Trans Parallel Distrib Syst 23(12):2245–2253

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Stavros Souravlas.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Souravlas, S., Roumeliotis, M. Scheduling array redistribution with virtual channel support. J Supercomput 71, 4215–4234 (2015). https://doi.org/10.1007/s11227-015-1519-4

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11227-015-1519-4

Keywords

Navigation