Abstract
The loop is a commonly used interconnection network for computer systems. In this paper we consider the problem of making a loop network fault-tolerant. Previous solutions employ the absolute minimum number of redundant components for a specified level of fault tolerance. In our approach, "extra" redundancy is used to reduce the size and complexity of the interconnection network. We present designs based on chordal rings that can tolerate one and two processor failures. The examples given indicate that, for large-scale systems, the approach can produce improved designs that are more in accord with the limitations of current technology.
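To make the fault-tolerance requirement concrete, here is a minimal Python sketch. It is not taken from the paper. It checks whether a processor graph still contains a loop of a given length after any k processor failures. It assumes the chordal ring definition of Arden and Lee: N is even, the nodes are joined in a ring, and each odd node i has a chord to node (i + w) mod N, with w odd. The helper names `ring`, `chordal_ring`, `has_cycle_of_length` and `tolerates` are hypothetical. The cycle search is brute force, so it is only practical for small graphs.

```python
from itertools import combinations


def ring(n):
    """Adjacency sets for a plain loop of n processors (0-1-...-(n-1)-0)."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def chordal_ring(n, w):
    """Chordal ring, assuming Arden and Lee's definition: n even, ring links
    i -- i+1 (mod n), plus a chord from every odd node i to (i + w) mod n,
    w odd (degree 3 when w is not +/-1 mod n)."""
    if n % 2 or w % 2 == 0:
        raise ValueError("this sketch assumes even n and odd chord length w")
    adj = ring(n)
    for i in range(1, n, 2):          # odd nodes only
        j = (i + w) % n               # odd + odd lands on an even node
        adj[i].add(j)
        adj[j].add(i)
    return adj


def has_cycle_of_length(adj, alive, length):
    """Brute-force search for a simple cycle of exactly `length` nodes
    that uses only nodes in `alive`."""
    if length < 3:
        return False
    alive = set(alive)

    def extend(path, on_path):
        tail = path[-1]
        if len(path) == length:
            return path[0] in adj[tail]   # can the path close into a loop?
        for nxt in adj[tail]:
            # Only visit nodes larger than the start, so each cycle is
            # explored from its smallest node.
            if nxt in alive and nxt not in on_path and nxt > path[0]:
                on_path.add(nxt)
                path.append(nxt)
                if extend(path, on_path):
                    return True
                path.pop()
                on_path.discard(nxt)
        return False

    return any(extend([s], {s}) for s in sorted(alive))


def tolerates(adj, faults, loop_len):
    """True if, for every choice of `faults` failed nodes, the surviving
    nodes still contain a loop of `loop_len` processors."""
    nodes = set(adj)
    return all(
        has_cycle_of_length(adj, nodes - set(failed), loop_len)
        for failed in combinations(sorted(nodes), faults)
    )
```

For the 12-node chordal ring with w = 3, `tolerates(chordal_ring(12, 3), 1, 11)` is false, because this graph is bipartite and has no odd cycles. `tolerates(chordal_ring(12, 3), 1, 10)` is true: with two spare nodes instead of one, this degree-3 graph can still host a 10-node loop after any single failure. This only illustrates the kind of trade-off the abstract describes. It is not the paper's construction.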
Copyright information
© 1991 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zimmerman, G.W. (1991). A new approach to system-level fault-tolerance in message-passing multicomputers. In: Sherwani, N.A., de Doncker, E., Kapenga, J.A. (eds) Computing in the 90's. Great Lakes CS 1989. Lecture Notes in Computer Science, vol 507. Springer, New York, NY. https://doi.org/10.1007/BFb0038515
DOI: https://doi.org/10.1007/BFb0038515
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-97628-0
Online ISBN: 978-0-387-34815-5
eBook Packages: Springer Book Archive