Distributed Self-Diagnosis and Fault-Tolerant Communication in Parallel Multiprocessor Networks

Verteilte Selbstdiagnose und fehlertoleranter Nachrichtenaustausch in parallelverarbeitenden Multiprozessor-Netzwerken

  • Conference paper
Fehlertolerierende Rechensysteme

Part of the book series: Informatik-Fachberichte (INFORMATIK, volume 84)

Abstract

This paper presents concepts that provide fault-tolerant communication and distributed system-level self-diagnosis in networks. Information about the state of the system is not obtained by running a conventional diagnostic algorithm /6,9,10/, but is provided by the interconnection structure of the network itself. Maximum diagnosability and the maximum degree of fault-tolerant communication can be achieved in just three rounds. This result covers faults of the interconnections as well as faults of the processing elements.

Summary

This paper presents concepts that enable simple distributed diagnosis and fault-tolerant message exchange in certain multiprocessor networks. Information about the state of the system is obtained not, as is usual, by means of a diagnosis algorithm /6,9,10/, but implicitly through message exchange over the interconnection structure of the network. A very important result is that full diagnosability can be attained within three "rounds" and that, when faults are present, three rounds of message exchange suffice to route messages correctly from sender to receiver. The faults considered are faults of the processor nodes as well as faults of the interconnections.
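The idea of obtaining state information implicitly from message exchange, rather than from a separate diagnosis algorithm, can be illustrated with a minimal sketch. It is not the paper's three-round procedure, which the abstract does not spell out. The sketch assumes a binary n-cube topology (suggested only by the title of reference 3), fail-silent faults (a faulty node or link delivers nothing), and the hypothetical helper names `neighbors` and `exchange_round`.

```python
def neighbors(node, dim):
    """Neighbors of a node in a binary n-cube: flip each address bit once."""
    return [node ^ (1 << bit) for bit in range(dim)]


def exchange_round(dim, faulty_nodes, faulty_links):
    """Simulate one round in which every fault-free node sends to all neighbors.

    Assumption: faults are fail-silent, so a message is lost whenever the
    sender or the connecting link is faulty.

    Returns a dict mapping each fault-free node to the set of neighbors it
    did NOT hear from in this round. That set is the node's local, implicit
    diagnostic information (it cannot yet tell node faults from link faults).
    """
    silent = {}
    for node in range(2 ** dim):
        if node in faulty_nodes:
            continue  # a faulty node records nothing in this model
        silent[node] = {
            nb
            for nb in neighbors(node, dim)
            if nb in faulty_nodes or frozenset((node, nb)) in faulty_links
        }
    return silent


# Example: 3-cube with node 5 faulty and the link between 0 and 1 broken.
result = exchange_round(3, {5}, {frozenset((0, 1))})
```

In this model a single round only tells a node that something on a given link is silent. Separating a faulty neighbor from a faulty link needs further information relayed over other paths, which is presumably where the additional rounds mentioned in the summary come in.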

References

  1. D. P. Agrawal: Testing and Fault-Tolerance of Multistage Interconnection Networks, Computer, April 1982, pp. 41–53.

  2. E. Ammann, M. Dal Cin: Efficient Algorithms for Comparison-Based Self-Diagnosis, in: M. Dal Cin, E. Dilger (eds.): Self-Diagnosis and Fault-Tolerance, Proceedings, Attempto-Verlag Tübingen, 1981, pp. 118

  3. E. Dilger, E. Ammann: System-Level Self-Diagnosis in n-Cube-Connected Multiprocessor Networks, Digest of papers FTCS-14, 1984.

  4. T. Feng: A Survey of Interconnection Networks, Computer, December 1981, pp. 12–27.

  5. W. K. Giloi: Die Entwicklung der Rechnerarchitektur von der von-Neumann-Maschine bis zu Rechnern der “fünften” Generation, Elektronische Rechenanlagen, April 1984, pp. 55–70.

  6. S. H. Hosseini, J. G. Kuhl, S. M. Reddy: A Diagnosis Algorithm for Distributed Computing Systems with Dynamic Failure and Repair, IEEE Trans. Comp., C-33 No. 3, 1984, pp. 223–233.

  7. D. E. Knuth: The Art of Computer Programming, Vol. 3: Sorting and Searching, Addison-Wesley, 1973.

  8. Y. Koga, E. Fukushima, K. Yoshihara: Error Recoverable and Securable Data Communication for Computer Network, Digest of papers FTCS-12, 1982, pp. 183–186.

  9. J. G. Kuhl, S. M. Reddy: Distributed Fault-Tolerance for Large Multiprocessor Systems, Proc. 7th Conference on Computer Architecture, 1980, pp. 23–30.

  10. J. G. Kuhl, S. M. Reddy: Fault-Diagnosis in Fully Distributed Systems, Digest of papers FTCS-11, 1981, pp. 100–105.

  11. L. Lamport: Using Time Instead of Timeout for Fault-Tolerant Distributed Systems, ACM Trans. Program. Lang. Syst. 6, 2 (April 1984), pp. 254–280.

  12. D. S. Parker: Notes on Shuffle/Exchange-Type Switching Networks, IEEE Trans. Comp., C-29 No. 3, 1980, pp. 213–222.

Copyright information

© 1984 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dilger, E. (1984). Distributed Self-Diagnosis and Fault-Tolerant Communication in Parallel Multiprocessor Networks. In: Großpietsch, KE., Dal Cin, M. (eds) Fehlertolerierende Rechensysteme. Informatik-Fachberichte, vol 84. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-69698-5_24

  • DOI: https://doi.org/10.1007/978-3-642-69698-5_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-13348-3

  • Online ISBN: 978-3-642-69698-5

  • eBook Packages: Springer Book Archive
