Distributed Self-Diagnosis and Fault-Tolerant Communication in Parallel Multiprocessor Networks

Dilger, Elmar

doi:10.1007/978-3-642-69698-5_24

Part of the book series: Informatik-Fachberichte (INFORMATIK, volume 84)

Abstract

This paper presents concepts that provide fault-tolerant communication and distributed system-level self-diagnosis in networks. Information about the state of the system is not obtained by running a conventional diagnostic algorithm /6,9,10/; instead, it is provided by the interconnection structure of the network itself. Maximum diagnosability and the maximum degree of fault-tolerant communication can both be achieved in only three rounds. This result covers faults of the interconnections as well as faults of the processing elements.

Zusammenfassung (English translation)

This paper presents concepts that enable simple distributed diagnosis and fault-tolerant message exchange in certain multiprocessor networks. Information about the state of the system is obtained not, as is usual, by means of a diagnosis algorithm /6,9,10/, but implicitly through message exchange over the interconnection structure of the network. A very important result is that full diagnosability can be attained within three "rounds" and that, when faults are present, three rounds of message exchange suffice to route messages correctly from sender to receiver. The faults considered are faults of the processor nodes as well as faults of the interconnections.

References

/1/ D. P. Agrawal: Testing and Fault-Tolerance of Multistage Interconnection Networks, Computer, April 1982, pp. 41–53.
/2/ E. Ammann, M. Dal Cin: Efficient Algorithms for Comparison-Based Self-Diagnosis, in: M. Dal Cin, E. Dilger (eds.): Self-Diagnosis and Fault-Tolerance, Proceedings, Attempto-Verlag Tübingen, 1981, pp. 118
/3/ E. Dilger, E. Ammann: System-Level Self-Diagnosis in n-Cube-Connected Multiprocessor Networks, Digest of papers FTCS-14, 1984.
/4/ T. Feng: A Survey of Interconnection Networks, Computer, December 1981, pp. 12–27.
/5/ W. K. Giloi: Die Entwicklung der Rechnerarchitektur von der von-Neumann-Maschine bis zu Rechnern der “fünften” Generation, Elektronische Rechenanlagen, April 1984, pp. 55–70.
/6/ S. H. Hosseini, J. G. Kuhl, S. M. Reddy: A Diagnosis Algorithm for Distributed Computing Systems with Dynamic Failure and Repair, IEEE Trans. Comp., C-33 No. 3, 1984, pp. 223–233.
/7/ D. E. Knuth: The Art of Computer Programming, Vol. 3: Sorting and Searching, Addison-Wesley, 1973.
/8/ Y. Koga, E. Fukushima, K. Yoshihara: Error Recoverable and Securable Data Communication for Computer Network, Digest of papers FTCS-12, 1982, pp. 183–186.
/9/ J. G. Kuhl, S. M. Reddy: Distributed Fault-Tolerance for Large Multiprocessor Systems, Proc. 7th Conference on Computer Architecture, 1980, pp. 23–30.
/10/ J. G. Kuhl, S. M. Reddy: Fault-Diagnosis in Fully Distributed Systems, Digest of papers FTCS-11, 1981, pp. 100–105.
/11/ L. Lamport: Using Time Instead of Timeout for Fault-Tolerant Distributed Systems, ACM Trans. Program. Lang. Syst. 6, 2 (April 1984), pp. 254–280.
/12/ D. S. Parker: Notes on Shuffle/Exchange-Type Switching Networks, IEEE Trans. Comp., C-29 No. 3, 1980, pp. 213–222.

Author information

Authors and Affiliations

Institute for Information Sciences, University of Tuebingen, Koestlinstr. 6, D-7400, Tuebingen, Germany
Elmar Dilger

Editor information

Editors and Affiliations

Institut für Systemtechnik (F2), Gesellschaft für Mathematik und Datenverarbeitung mbH Bonn, Schloß Birlinghoven, Postfach 1240, 5205, St. Augustin, Germany
K.-E. Großpietsch
Institut für Informationsverarbeitung, Universität Tübingen, Köstlinstraße 6, 7400, Tübingen, Germany
M. Dal Cin

Cite this paper

Dilger, E. (1984). Distributed Self-Diagnosis and Fault-Tolerant Communication in Parallel Multiprocessor Networks. In: Großpietsch, KE., Dal Cin, M. (eds) Fehlertolerierende Rechensysteme. Informatik-Fachberichte, vol 84. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-69698-5_24

DOI: https://doi.org/10.1007/978-3-642-69698-5_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-13348-3
Online ISBN: 978-3-642-69698-5
eBook Packages: Springer Book Archive
