Reconfiguration of massively parallel systems

Vounckx, Johan; Deconinck, G.; Lauwereins, R.

doi:10.1007/BFb0046655

Johan Vounckx¹,
G. Deconinck¹ &
R. Lauwereins¹,²

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 919)

Included in the following conference series:

International Conference on High-Performance Computing and Networking

Abstract

The reconfiguration approach presented in this paper addresses the need for fault tolerance in large systems. All of the developed techniques have a data complexity and an execution-time complexity that are sublinear in the number of nodes in the system, which makes the approach well suited to massively parallel systems. The reconfiguration strategy consists of four subtasks: repartitioning (ensuring that each application has sufficient working processors), loading of injured networks, remapping (replacing faulty processors by working ones), and deadlock-free, fault-tolerant compact routing.
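To illustrate what "compact routing" with sublinear data complexity can look like, here is a minimal sketch of interval routing on a fault-free 2D mesh. It assumes row-major node labels and gives every node exactly four intervals, one per link, so per-node routing state stays constant no matter how large the mesh is. The resulting routes are dimension-ordered (vertical first, then horizontal), which is deadlock-free in a mesh under wormhole switching. This is not the authors' fault-tolerant scheme: it does not handle faulty nodes or links. All names (`Interval`, `interval_table`, `next_link`, `route`) are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical illustration: interval routing on a fault-free width x height mesh.
# Node (x, y) has row-major label y * width + x; "north" decreases y.


@dataclass(frozen=True)
class Interval:
    lo: int  # inclusive lower bound on destination labels
    hi: int  # exclusive upper bound on destination labels

    def contains(self, label: int) -> bool:
        return self.lo <= label < self.hi


def node_label(x: int, y: int, width: int) -> int:
    return y * width + x


def interval_table(x: int, y: int, width: int, height: int) -> dict:
    """Per-node routing state: one interval per outgoing link (always 4 entries)."""
    me = node_label(x, y, width)
    row_start = y * width
    row_end = row_start + width
    return {
        "north": Interval(0, row_start),                # destinations in rows above
        "west": Interval(row_start, me),                # same row, to the left
        "east": Interval(me + 1, row_end),              # same row, to the right
        "south": Interval(row_end, width * height),     # destinations in rows below
    }


STEP = {"north": (0, -1), "south": (0, 1), "west": (-1, 0), "east": (1, 0)}


def next_link(x: int, y: int, dest: int, width: int, height: int):
    """Return the outgoing link for dest, or None if this node is the destination."""
    if dest == node_label(x, y, width):
        return None
    for link, interval in interval_table(x, y, width, height).items():
        if interval.contains(dest):
            return link
    raise ValueError("destination label outside the mesh")


def route(src: tuple, dst: tuple, width: int, height: int) -> list:
    """Follow the interval tables hop by hop from src to dst."""
    x, y = src
    dest = node_label(dst[0], dst[1], width)
    path = [src]
    while (link := next_link(x, y, dest, width, height)) is not None:
        dx, dy = STEP[link]
        x, y = x + dx, y + dy
        path.append((x, y))
    return path
```

For example, `route((0, 0), (2, 1), 3, 2)` returns `[(0, 0), (0, 1), (1, 1), (2, 1)]`: the message first moves south to the destination row, then east along it. Each node consults only its own four intervals, so no node needs a table proportional to the number of nodes.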

Supported by the EC as ESPRIT project 6731 and by IWT-Flanders.

Author information

Authors and Affiliations

K.U.Leuven-ESAT, K. Mercierlaan 94, B-3001, Heverlee, Belgium
Johan Vounckx, G. Deconinck & R. Lauwereins (Senior Research Associate)
Belgian National Science Foundation, Belgium
R. Lauwereins (Senior Research Associate)

Editor information

Bob Hertzberger, Giuseppe Serazzi

About this paper

Cite this paper

Vounckx, J., Deconinck, G., Lauwereins, R. (1995). Reconfiguration of massively parallel systems. In: Hertzberger, B., Serazzi, G. (eds) High-Performance Computing and Networking. HPCN-Europe 1995. Lecture Notes in Computer Science, vol 919. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0046655

DOI: https://doi.org/10.1007/BFb0046655
Published: 02 February 2006
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-59393-5
Online ISBN: 978-3-540-49242-9
eBook Packages: Springer Book Archive
