Abstract
As interconnection networks grow larger, so does the need for reliable message delivery in the presence of faults. Unfortunately, most network routing schemes currently in use do not tolerate even the most common faults gracefully. Because routing messages around failed components requires non-minimal routing, it makes sense to examine routers that, by design, allow packets to take non-minimal routes. Such routers provide a basic level of fault tolerance by allowing messages to be routed around faults without requiring a priori knowledge of their locations. However, this basic mechanism can be slow and clumsy at times. We augment Chaotic routing, a non-minimal adaptive routing scheme, with a limited amount of hardware to support fault detection, identification, and reconfiguration, so that the network can automatically reconfigure itself when faults occur. We present a high-level design of these mechanisms, driven by the goal of achieving reasonable reliability without exorbitant cost.
This work is supported in part by Office of Naval Research grant N00014-91-J-1007 and National Science Foundation grant MIP9213469.
© 1994 Springer-Verlag Berlin Heidelberg
Bolding, K., Yost, W. (1994). Design of a router for fault-tolerant networks. In: Bolding, K., Snyder, L. (eds) Parallel Computer Routing and Communication. PCRCW 1994. Lecture Notes in Computer Science, vol 853. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-58429-3_40
DOI: https://doi.org/10.1007/3-540-58429-3_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-58429-2
Online ISBN: 978-3-540-48787-6
eBook Packages: Springer Book Archive