Abstract
A fault-tolerant routing method that tolerates solid faults using only two virtual channels is presented. The proposed routing algorithm, called FT-Ecube, not only uses fewer virtual channels but also tolerates f-chains in meshes. Furthermore, the proposed scheme misroutes messages in both the clockwise and counterclockwise directions to reduce channel contention on f-rings. The algorithm is shown to be deadlock-free and livelock-free in meshes that contain multiple nonoverlapping f-regions. We also conducted flit-level simulations to evaluate its performance. The simulation results show that FT-Ecube tolerates multiple faulty blocks using only two virtual channels per physical channel and achieves good average latency.
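For orientation, the following is a minimal sketch of plain, fault-free e-cube (dimension-order) routing in a 2D mesh, the baseline that the name FT-Ecube suggests the algorithm extends. It is not the FT-Ecube algorithm: the abstract does not give the misrouting rules or the virtual-channel assignment, so neither is modeled here. The function names, the (x, y) node encoding, and the rectangular faulty-block tuple are assumptions made for illustration only.

```python
from typing import List, Optional, Tuple

Node = Tuple[int, int]              # (x, y) coordinates in a 2D mesh (assumed encoding)
Block = Tuple[int, int, int, int]   # hypothetical faulty block: (x_min, y_min, x_max, y_max), inclusive


def ecube_next_hop(current: Node, dest: Node) -> Node:
    """Next node under fault-free e-cube routing: correct x first, then y."""
    x, y = current
    dx, dy = dest
    if x != dx:
        return (x + (1 if dx > x else -1), y)
    if y != dy:
        return (x, y + (1 if dy > y else -1))
    return current  # already at the destination


def ecube_path(src: Node, dest: Node) -> List[Node]:
    """Full e-cube path from src to dest, including both endpoints."""
    path = [src]
    while path[-1] != dest:
        path.append(ecube_next_hop(path[-1], dest))
    return path


def in_block(node: Node, block: Block) -> bool:
    """True if the node lies inside the rectangular faulty block."""
    x_min, y_min, x_max, y_max = block
    return x_min <= node[0] <= x_max and y_min <= node[1] <= y_max


def last_node_before_block(path: List[Node], block: Block) -> Optional[Node]:
    """Last fault-free node before the e-cube path would enter the block.

    A fault-tolerant scheme would have to start misrouting around the
    block from this node. Returns None if the path never enters the block.
    Assumes the source itself is fault-free.
    """
    for prev, nxt in zip(path, path[1:]):
        if in_block(nxt, block):
            return prev
    return None


if __name__ == "__main__":
    route = ecube_path((0, 0), (2, 1))
    print(route)
    print(last_node_before_block(route, (1, 0, 1, 1)))
```

The sketch only identifies where the e-cube path is blocked. Choosing the clockwise or counterclockwise direction around the f-ring, and doing so without deadlock, is the contribution described in the abstract and is left out.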
Additional information
This work is supported by NSF grant MIP-9705738.
About this article
Cite this article
Youn, JH., Bose, B. & Park, S. Fault-Tolerant Routing Algorithm in Meshes with Solid Faults. J Supercomput 37, 161–177 (2006). https://doi.org/10.1007/s11227-006-5530-7
DOI: https://doi.org/10.1007/s11227-006-5530-7