Fault-tolerant hardware configuration management on the multiprocessor system DIRMU 25

  • Architectural Aspects (Session 3.1)
  • Conference paper
CONPAR 86 (CONPAR 1986)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 237)

Abstract

This paper describes fault-tolerance techniques that have been developed and implemented for the multiprocessor system DIRMU 25, a 25-processor system in operation at the University of Erlangen-Nuremberg. First, a short overview of the DIRMU hardware architecture, programming environment, and parallel application programs is given. Fault diagnosis and reconfiguration are implemented in one layer of the DIRMOS operating system, the hardware configuration management. The concept behind this configuration management is described in general terms, based on a graph model, and its application to the fault-tolerant execution of parallel programs is discussed.
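
To make the graph-model idea concrete, here is a minimal sketch in Python. It is not the DIRMOS implementation, whose details are not given in this abstract. It assumes processors are graph nodes, links are edges, a diagnosed fault removes a node, and the application needs a chain of connected processors. The names `healthy_subgraph`, `find_chain`, `reconfigure`, and the 2x3 grid `GRID` are all hypothetical.

```python
from typing import Dict, List, Optional, Set

Graph = Dict[int, Set[int]]  # processor id -> ids of directly linked processors


def healthy_subgraph(hardware: Graph, faulty: Set[int]) -> Graph:
    """Drop faulty processors and every link that touches them."""
    return {
        node: {n for n in neighbours if n not in faulty}
        for node, neighbours in hardware.items()
        if node not in faulty
    }


def find_chain(graph: Graph, length: int) -> Optional[List[int]]:
    """Depth-first search for a simple path of `length` linked processors."""

    def extend(path: List[int]) -> Optional[List[int]]:
        if len(path) == length:
            return path
        for nxt in sorted(graph[path[-1]]):
            if nxt not in path:
                found = extend(path + [nxt])
                if found is not None:
                    return found
        return None

    for start in sorted(graph):
        found = extend([start])
        if found is not None:
            return found
    return None


def reconfigure(hardware: Graph, faulty: Set[int], length: int) -> Optional[List[int]]:
    """Map a chain-shaped application onto the fault-free part of the hardware."""
    return find_chain(healthy_subgraph(hardware, faulty), length)


# Hypothetical 2x3 grid of processors (assumption, not the DIRMU 25 topology):
#   0 - 1 - 2
#   |   |   |
#   3 - 4 - 5
GRID: Graph = {0: {1, 3}, 1: {0, 2, 4}, 2: {1, 5}, 3: {0, 4}, 4: {1, 3, 5}, 5: {2, 4}}
```

In this toy setting, `reconfigure(GRID, {4}, 5)` still finds a five-processor chain after processor 4 fails, while `reconfigure(GRID, {1, 4}, 4)` returns `None` because the remaining processors split into two disconnected pairs. The exhaustive search is only practical for small graphs.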

Editor information

Wolfgang Händler, Dieter Haupt, Rolf Jeltsch, Wilfried Juling, Otto Lange

Copyright information

© 1986 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Maehle, E., Moritzen, K., Wirl, K. (1986). Fault-tolerant hardware configuration management on the multiprocessor system DIRMU 25. In: Händler, W., Haupt, D., Jeltsch, R., Juling, W., Lange, O. (eds) CONPAR 86. CONPAR 1986. Lecture Notes in Computer Science, vol 237. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-16811-7_170

  • DOI: https://doi.org/10.1007/3-540-16811-7_170

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-16811-9

  • Online ISBN: 978-3-540-44856-3

  • eBook Packages: Springer Book Archive
