Abstract
In this paper we introduce the fault-tolerance concept for DAMP, a dynamically reconfigurable multiprocessor system currently under development at the University of Paderborn. Its architecture is based on a single type of building block (the DAMP module), which consists of a transputer, memory and a local switching network. These building blocks are interconnected according to a fixed physical topology with restricted neighborhood (an octagonal torus). Communication paths between nodes can be built up and released dynamically at runtime in a fully distributed way (circuit switching). An 8-processor prototype is currently operational, and a redesign for a 64-processor system is under way. Fault tolerance will be realized by dynamic redundancy in the form of standby sparing. The distributed self-diagnosis, reconfiguration and recovery techniques are described in some detail.
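The following minimal Python sketch illustrates standby sparing, the general technique the abstract names. It is not the DAMP algorithm, which the paper itself describes. The sketch keeps a mapping from logical task slots to physical nodes. When a node is diagnosed as faulty, its slot is moved onto an idle spare node.

The class `SpareManager`, its fields and the first-in spare selection are hypothetical. In DAMP, a replacement would also require rebuilding the circuit-switched paths through the octagonal torus, and the sketch does not model that step.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SpareManager:
    """Hypothetical standby-sparing bookkeeping (illustration only, not DAMP's code)."""
    active: dict[int, int]                      # logical slot -> physical node id
    spares: list[int]                           # idle standby nodes, used in order
    failed: set[int] = field(default_factory=set)

    def report_fault(self, node: int) -> int | None:
        """Mark `node` as faulty and remap its logical slot to a spare, if it had one.

        Returns the replacement node, or None if the faulty node hosted no slot.
        Raises RuntimeError when no spare is left (the sketch does not model
        graceful degradation).
        """
        self.failed.add(node)
        if node in self.spares:
            # A faulty spare is simply withdrawn from the standby pool.
            self.spares.remove(node)
            return None
        for slot, phys in self.active.items():
            if phys == node:
                if not self.spares:
                    raise RuntimeError("no spare node left")
                replacement = self.spares.pop(0)
                # Overwriting an existing key does not change the dict's size,
                # so this is safe inside the loop; we return immediately anyway.
                self.active[slot] = replacement
                return replacement
        return None
```

For example, suppose two slots run on nodes 0 and 1, and nodes 2 and 3 are spares. A fault on node 1 moves slot 1 to node 2. A later fault on spare node 3 only shrinks the pool. Selecting the first spare in the list is the simplest possible policy. A real system would more likely choose a spare close to the failed node in the topology, to keep the rebuilt communication paths short.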
© 1991 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bauch, A., Maehle, E. (1991). Self-Diagnosis, Reconfiguration and Recovery in the Dynamical Reconfigurable Multiprocessor System DAMP. In: Dal Cin, M., Hohl, W. (eds) Fault-Tolerant Computing Systems. Informatik-Fachberichte, vol 283. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-76930-6_2
DOI: https://doi.org/10.1007/978-3-642-76930-6_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-54545-3
Online ISBN: 978-3-642-76930-6
eBook Packages: Springer Book Archive