A fault-tolerant routing algorithm in HyperX topology based on unsafety vectors

The Journal of Supercomputing

Abstract

HyperX is a promising high-radix topology proposed by a group of researchers at HP Laboratories. The topology offers the advantages of high-radix routers, including a very low diameter and a low average distance. As router degree and network size grow, the probability of router failure increases, so a routing algorithm with fault-tolerance capability becomes essential. In this paper, a fault-tolerant routing algorithm for the HyperX topology is proposed for the first time. The proposed algorithm is based on the concept of unsafety vectors, by which the unsafety degree of each node is calculated from its faulty neighbors. At each routing step, the node with the lowest number of faulty neighbors is selected from among the neighbors located along the message's path from source to destination. Furthermore, we analytically derive some properties of the proposed algorithm. A worked example then illustrates the algorithm step by step, showing that it performs efficiently even in the presence of catastrophic failures. The performance of the proposed routing algorithm is evaluated through simulations of various workloads, whose results indicate the accuracy and integrity of the suggested algorithm.
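
The selection rule described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the full unsafety-vector definition is not given in this preview, so the sketch assumes a regular HyperX in which each router links to every router that differs from it in exactly one coordinate, and it approximates a node's unsafety by a plain count of its faulty neighbors. The names `neighbors`, `faulty_neighbor_count`, `next_hop`, and `route` are hypothetical.

```python
def neighbors(node, shape):
    """Yield every router that differs from `node` in exactly one coordinate.

    Assumption: a regular HyperX, fully connected within each dimension.
    """
    for dim, size in enumerate(shape):
        for value in range(size):
            if value != node[dim]:
                yield node[:dim] + (value,) + node[dim + 1:]


def faulty_neighbor_count(node, shape, faulty):
    """Simplified stand-in for an unsafety degree: count of faulty neighbors."""
    return sum(1 for n in neighbors(node, shape) if n in faulty)


def next_hop(current, dest, shape, faulty):
    """Pick the healthy neighbor on a minimal path with the fewest faulty neighbors."""
    candidates = []
    for dim in range(len(shape)):
        if current[dim] != dest[dim]:
            # Correcting one differing coordinate moves one hop closer to dest.
            hop = current[:dim] + (dest[dim],) + current[dim + 1:]
            if hop not in faulty:
                candidates.append(hop)
    if not candidates:
        return None  # every minimal-path neighbor is faulty
    return min(candidates, key=lambda n: faulty_neighbor_count(n, shape, faulty))


def route(source, dest, shape, faulty):
    """Greedy minimal routing; returns the path, or None if the sketch gets stuck."""
    path = [source]
    while path[-1] != dest:
        hop = next_hop(path[-1], dest, shape, faulty)
        if hop is None:
            return None
        path.append(hop)
    return path


if __name__ == "__main__":
    # 3x3 HyperX with one faulty router at (1, 0).
    print(route((0, 0), (2, 2), (3, 3), {(1, 0)}))
```

In this 3x3 example, the candidate `(2, 0)` has one faulty neighbor and `(0, 2)` has none, so the sketch routes through `(0, 2)`. Because it only considers minimal paths, it gives up where a real fault-tolerant scheme would presumably detour.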



Author information

Correspondence to Farshad Safaei.

Cite this article

Azizi, S., Safaei, F. & Roozikhar, M. A fault-tolerant routing algorithm in HyperX topology based on unsafety vectors. J Supercomput 71, 1224–1248 (2015). https://doi.org/10.1007/s11227-014-1355-y

  • DOI: https://doi.org/10.1007/s11227-014-1355-y
