Abstract
This paper describes fault tolerance techniques that have been developed and implemented for DIRMU 25, a 25-processor multiprocessor system in operation at the University of Erlangen-Nuremberg. It first gives a short overview of the DIRMU hardware architecture, programming environment, and parallel application programs. Fault diagnosis and reconfiguration are implemented in one layer of the DIRMOS operating system, the hardware configuration management. The concept of this configuration management is described in general terms, based on a graph model, and its application to the fault-tolerant execution of parallel programs is discussed.
Copyright information
© 1986 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Maehle, E., Moritzen, K., Wirl, K. (1986). Fault-tolerant hardware configuration management on the multiprocessor system DIRMU 25. In: Händler, W., Haupt, D., Jeltsch, R., Juling, W., Lange, O. (eds) CONPAR 86. CONPAR 1986. Lecture Notes in Computer Science, vol 237. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-16811-7_170
DOI: https://doi.org/10.1007/3-540-16811-7_170
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-16811-9
Online ISBN: 978-3-540-44856-3
eBook Packages: Springer Book Archive