Reconfiguration of massively parallel systems

  • Conference paper
High-Performance Computing and Networking (HPCN-Europe 1995)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 919))

Abstract

The reconfiguration approach presented in this paper addresses the need for fault tolerance in large systems. All of the techniques developed have a data complexity and an execution-time complexity that grow less than proportionally with the number of nodes in the system, which makes the approach particularly well suited to massively parallel systems. The reconfiguration strategy consists of four subtasks: repartitioning (each application must have sufficient working processors), loading of injured networks, remapping (replacing faulty processors with working ones), and deadlock-free, fault-tolerant compact routing.
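
As a rough illustration of the remapping subtask only, the idea of replacing faulty processors with working ones can be sketched as below. This is a minimal sketch and not the algorithm from the paper: the function name `remap`, its parameters, and the processor labels are all hypothetical, and it assumes a simple pool of spare processors is available. When the pool runs out it raises an error, which loosely corresponds to the case where repartitioning would be needed.

```python
def remap(mapping, faulty, spares):
    """Return a new logical-to-physical mapping with faulty processors replaced.

    Hypothetical helper for illustration; not taken from the paper.
    mapping: dict of logical processor id -> physical processor id
    faulty:  set of physical ids known to have failed
    spares:  list of unused physical ids that may serve as replacements
    """
    # Only spares that are themselves working can be used.
    free = [p for p in spares if p not in faulty]
    new_mapping = dict(mapping)
    for logical, physical in mapping.items():
        if physical in faulty:
            if not free:
                # Assumption: the caller would fall back to repartitioning here.
                raise RuntimeError("not enough working processors")
            new_mapping[logical] = free.pop(0)
    return new_mapping


if __name__ == "__main__":
    before = {0: "p0", 1: "p1", 2: "p2"}
    after = remap(before, faulty={"p1"}, spares=["p3", "p4"])
    print(after)
```

Only the logical processors placed on failed hardware are reassigned; every other entry of the mapping is left untouched, so an application keeps its logical view of the machine.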

Supported by the EC as ESPRIT-project 6731 and by the IWT-Flanders

Editor information

Bob Hertzberger, Giuseppe Serazzi

Copyright information

© 1995 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Vounckx, J., Deconinck, G., Lauwereins, R. (1995). Reconfiguration of massively parallel systems. In: Hertzberger, B., Serazzi, G. (eds) High-Performance Computing and Networking. HPCN-Europe 1995. Lecture Notes in Computer Science, vol 919. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0046655

  • DOI: https://doi.org/10.1007/BFb0046655

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-59393-5

  • Online ISBN: 978-3-540-49242-9

  • eBook Packages: Springer Book Archive
