Design of a router for fault-tolerant networks

  • Conference paper
Parallel Computer Routing and Communication (PCRCW 1994)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 853)

Abstract

As interconnection networks grow larger, the need for reliable message delivery in the presence of faults grows as well. Unfortunately, most network routing schemes currently in use do not gracefully tolerate even the most common faults. Because routing messages around failed components requires non-minimal routing, it makes sense to examine routers which, by design, allow packets to take non-minimal routes. Such routers provide a basic level of fault tolerance by allowing messages to be routed around faults without requiring a priori knowledge of their locations. However, relying on this mechanism alone can be slow and clumsy. We augment Chaotic routing, a non-minimal adaptive routing scheme, with a limited amount of hardware to support fault detection, identification, and reconfiguration, so that the network can automatically reconfigure itself when faults occur. We present a high-level design of these mechanisms, driven by the goal of achieving reasonable reliability without exorbitant cost.

This work is supported in part by Office of Naval Research grant N00014-91-J-1007 and National Science Foundation grant MIP9213469.
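
The abstract's central observation, that a router which permits non-minimal hops can get around a fault without knowing in advance where it is, can be illustrated with a small sketch. The code below is not the Chaos router's algorithm and does not model the detection or reconfiguration hardware the paper proposes. It is a generic toy model under stated assumptions: a 2D mesh without wraparound, one packet, faults represented as a set of broken links, and a hypothetical `route` function that prefers hops toward the destination and otherwise takes a random working link.

```python
import random

def neighbors(node, width, height):
    """Yield the neighbors of node (x, y) in a width x height mesh (no wraparound)."""
    x, y = node
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            yield (nx, ny)

def distance(a, b):
    """Manhattan distance between two mesh nodes."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def route(src, dst, width, height, faulty_links, rng, max_hops=1000):
    """Route one packet hop by hop; return the node path, or None on failure.

    Assumption for illustration only: a node sees just the state of its own
    links. It takes a random productive hop (one that reduces the distance to
    dst) when a working one exists, and otherwise a random non-minimal hop.
    """
    path = [src]
    node = src
    while node != dst:
        if len(path) - 1 >= max_hops:
            return None  # give up: hop budget exhausted
        usable = [n for n in neighbors(node, width, height)
                  if frozenset((node, n)) not in faulty_links]
        if not usable:
            return None  # node is completely cut off
        productive = [n for n in usable
                      if distance(n, dst) < distance(node, dst)]
        node = rng.choice(productive or usable)
        path.append(node)
    return path

if __name__ == "__main__":
    # The only minimal path from (0, 0) to (3, 0) runs along row 0,
    # so breaking the link (1, 0)-(2, 0) forces a non-minimal route.
    faults = {frozenset({(1, 0), (2, 0)})}
    path = route((0, 0), (3, 0), 4, 4, faults, random.Random(0))
    print(len(path) - 1, "hops:", path)
```

In this toy model the packet may bounce back and forth several times before it finds a way around the broken link, which is one way to read the abstract's remark that purely routing-based fault tolerance can be slow, and why the authors add explicit support for detecting faults and reconfiguring the network.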

Editor information

Kevin Bolding, Lawrence Snyder

Copyright information

© 1994 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bolding, K., Yost, W. (1994). Design of a router for fault-tolerant networks. In: Bolding, K., Snyder, L. (eds) Parallel Computer Routing and Communication. PCRCW 1994. Lecture Notes in Computer Science, vol 853. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-58429-3_40

  • DOI: https://doi.org/10.1007/3-540-58429-3_40

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-58429-2

  • Online ISBN: 978-3-540-48787-6

  • eBook Packages: Springer Book Archive
