Fault-tolerant message routing for multiprocessors

  • Workshop on Fault-Tolerant Parallel and Distributed Systems (Dimiter Avresky, Boston University; David B. Kaeli, Northeastern University)
  • Conference paper
Parallel and Distributed Processing (IPPS 1998)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1388)

Abstract

This paper investigates fault-tolerant message routing in two-dimensional meshes in which each inner node has 4 neighbors. Some nodes or links may be faulty, so messages must be routed using only local information at each step. A new and efficient algorithm is proposed to solve this problem. The algorithm is local and consists of a pre-routing stage and a routing stage. The pre-routing algorithm is implemented off-line. The complexity of the pre-routing stage is O(W), where N is the number of nodes in the system, and t is the number of faulty nodes. The complexity of the online routing stage (the size of the routing table stored in the local memory) is O(t). The pre-routing algorithm is performed only once, after a new fault is detected. The algorithm allows 100% of deliverable messages to be delivered in the presence of faulty nodes, with no deadlocks or livelocks. No nodes are declared unsafe. The main idea is to construct fault-free rectangular clusters during the pre-routing stage and to store the information about their boundaries in local memories. At the routing stage, the direction for sending a message at any node is determined by the cluster to which the destination node belongs. The algorithm is generalized to the case of multidimensional meshes.
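
The cluster idea can be illustrated with a small sketch. The code below is not the authors' pre-routing algorithm, whose details the abstract does not give. It is a simple greedy decomposition, chosen here only for illustration, that splits the fault-free nodes of a mesh into disjoint rectangles and then looks up the rectangle containing a destination. The names `build_clusters` and `cluster_of`, the `(row, col)` node encoding, and the `(top, left, bottom, right)` boundary format are all assumptions. How a direction is derived from the destination's cluster is left out, since the abstract does not specify it.

```python
from typing import List, Optional, Set, Tuple

Node = Tuple[int, int]            # (row, col); hypothetical encoding
Rect = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def build_clusters(rows: int, cols: int, faulty: Set[Node]) -> List[Rect]:
    """Greedily cover all fault-free nodes with disjoint rectangles.

    Illustrative stand-in for a pre-routing stage, not the paper's method.
    """
    covered: Set[Node] = set()
    clusters: List[Rect] = []

    def free(r: int, c: int) -> bool:
        # A node can join a new rectangle if it is healthy and unclaimed.
        return (r, c) not in faulty and (r, c) not in covered

    for r in range(rows):
        for c in range(cols):
            if not free(r, c):
                continue
            # Grow the rectangle to the right along the current row.
            right = c
            while right + 1 < cols and free(r, right + 1):
                right += 1
            # Grow downward while the whole next row segment is usable.
            bottom = r
            while bottom + 1 < rows and all(
                free(bottom + 1, k) for k in range(c, right + 1)
            ):
                bottom += 1
            for i in range(r, bottom + 1):
                for j in range(c, right + 1):
                    covered.add((i, j))
            clusters.append((r, c, bottom, right))
    return clusters


def cluster_of(node: Node, clusters: List[Rect]) -> Optional[int]:
    """Return the index of the rectangle containing `node`, or None."""
    for idx, (top, left, bottom, right) in enumerate(clusters):
        if top <= node[0] <= bottom and left <= node[1] <= right:
            return idx
    return None


if __name__ == "__main__":
    # 4 x 4 mesh with a single faulty node at (1, 1).
    clusters = build_clusters(4, 4, {(1, 1)})
    print(clusters)
    print(cluster_of((3, 3), clusters))
```

Only the rectangle boundaries need to be stored for the lookup, which matches the abstract's description of keeping cluster boundaries in local memory. The greedy scan makes no attempt to minimize the number of rectangles.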

This work was supported by the NSF under Grant MIP 9630096.

Editor information

José Rolim

Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zakrevski, L., Karpovsky, M. (1998). Fault-tolerant message routing for multiprocessors. In: Rolim, J. (eds) Parallel and Distributed Processing. IPPS 1998. Lecture Notes in Computer Science, vol 1388. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-64359-1_737

  • DOI: https://doi.org/10.1007/3-540-64359-1_737

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-64359-3

  • Online ISBN: 978-3-540-69756-5

  • eBook Packages: Springer Book Archive
