Abstract
The reconfiguration approach presented in this paper addresses the need for fault tolerance in large systems. All of the developed techniques have a data complexity and an execution time complexity that grow less than proportionally to the number of nodes in the system, which makes the approach well suited to massively parallel systems. The reconfiguration strategy consists of four subtasks: repartitioning (each application must have sufficient working processors), loading of injured networks, remapping (replacing faulty processors with working ones), and deadlock-free, fault-tolerant compact routing.
Supported by the EC as ESPRIT-project 6731 and by the IWT-Flanders
Copyright information
© 1995 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Vounckx, J., Deconinck, G., Lauwereins, R. (1995). Reconfiguration of massively parallel systems. In: Hertzberger, B., Serazzi, G. (eds) High-Performance Computing and Networking. HPCN-Europe 1995. Lecture Notes in Computer Science, vol 919. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0046655
DOI: https://doi.org/10.1007/BFb0046655
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-59393-5
Online ISBN: 978-3-540-49242-9
eBook Packages: Springer Book Archive