Abstract
With the current popularity of large-scale parallel computers, Multiprocessor Systems-on-Chip (MP-SoCs), multicomputers, cluster computers and peer-to-peer communication networks, fault-tolerant routing has become an important issue in the development of these systems. Fault-tolerant routing algorithms in such systems aim to provide continuous operation in the presence of one or more failures by allowing the system to degrade gracefully. The Software-Based fault-tolerant routing scheme has been suggested as an efficient routing algorithm that satisfies both the communication-performance and the fault-tolerance demands of parallel computer systems. To study network performance, a number of analytical models for fault-free routing algorithms have been proposed in the literature. However, no similar analytical model of fault-tolerant routing in the presence of faulty components has been reported. This paper presents a new analytical modeling approach for determining the effects of failures in wormhole-switched 2-D tori that use the fault-tolerant Software-Based scheme. More specifically, we describe a general model that derives mathematical expressions for the performance behavior of routing algorithms confronting convex (|-shaped, □-shaped) or concave (U-shaped, +-shaped, T-shaped, H-shaped) faulty regions. The model is validated through comprehensive simulation experiments for different types of failures.
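The convex/concave split of faulty regions can be made concrete with a small classifier. The sketch below is a hypothetical illustration, not the paper's model. It assumes a fault region is given as a set of faulty node coordinates that does not wrap around the torus edges, and it treats a region as convex when the faulty nodes completely fill their bounding rectangle. This rule happens to reproduce the grouping listed above (|- and □-shaped regions convex; U-, +-, T- and H-shaped regions concave), but it is an assumption, and the names `classify`, `bounding_box` and `SHAPES` are illustrative only.

```python
from typing import Dict, Set, Tuple

Node = Tuple[int, int]  # (x, y) router coordinates in a 2-D torus


def bounding_box(region: Set[Node]) -> Tuple[int, int, int, int]:
    """Return (x_min, x_max, y_min, y_max) of a non-wrapping fault region."""
    xs = [x for x, _ in region]
    ys = [y for _, y in region]
    return min(xs), max(xs), min(ys), max(ys)


def classify(region: Set[Node]) -> str:
    """Label a fault region 'convex' if it fills its bounding rectangle, else 'concave'.

    Assumption: a convex region is a solid rectangle of faulty nodes
    (a rectangle one node wide gives the |-shape). This is an illustrative
    rule, not necessarily the paper's formal definition.
    """
    if not region:
        raise ValueError("a fault region must contain at least one node")
    x0, x1, y0, y1 = bounding_box(region)
    area = (x1 - x0 + 1) * (y1 - y0 + 1)
    # The set holds distinct nodes inside the box, so equality means
    # the region has no holes or notches.
    return "convex" if len(region) == area else "concave"


# Hypothetical example regions, each placed near the origin of a large torus
# so that no region wraps around an edge.
SHAPES: Dict[str, Set[Node]] = {
    "|": {(0, 0), (0, 1), (0, 2)},
    "□": {(x, y) for x in range(2) for y in range(3)},
    "U": {(0, 0), (1, 0), (2, 0), (0, 1), (0, 2), (2, 1), (2, 2)},
    "+": {(1, 0), (0, 1), (1, 1), (2, 1), (1, 2)},
    "T": {(0, 2), (1, 2), (2, 2), (1, 1), (1, 0)},
    "H": {(0, 0), (0, 1), (0, 2), (1, 1), (2, 0), (2, 1), (2, 2)},
}

if __name__ == "__main__":
    for name, region in SHAPES.items():
        print(f"{name}-shaped: {classify(region)}")
```

Running the module prints one line per shape, labeling the |- and □-shaped regions convex and the other four concave. A region that wraps around the torus would need its coordinates unwrapped first, because the bounding box here is computed without modular arithmetic.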
Cite this article
Safaei, F., Khonsari, A., Fathy, M. et al. Performance analysis of fault-tolerant routing algorithm in wormhole-switched interconnections. J Supercomput 41, 215–245 (2007). https://doi.org/10.1007/s11227-007-0114-8
DOI: https://doi.org/10.1007/s11227-007-0114-8