Distributed resource management for parallel applications in networks of workstations

  • Conference paper
High-Performance Computing and Networking (HPCN-Europe 1997)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1225)

Abstract

Running parallel applications in a network of workstations (NOW) requires a resource management system that provides batch queueing and load balancing, both to make use of idle workstations in the NOW and to avoid load imbalance in the network.

A resource management system for parallel jobs needs additional functionality to schedule jobs onto hosts and to support checkpointing and migration of parallel applications. This paper describes the essential components of a distributed resource management system that supports parallel computations in a NOW, and shows how existing resource management components can be reused in this approach.

The implementation of a distributed resource manager demonstrates the practical relevance of the design concept.

Editor information

Bob Hertzberger, Peter Sloot

Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Maier, U., Stellner, G. (1997). Distributed resource management for parallel applications in networks of workstations. In: Hertzberger, B., Sloot, P. (eds) High-Performance Computing and Networking. HPCN-Europe 1997. Lecture Notes in Computer Science, vol 1225. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0031618

  • DOI: https://doi.org/10.1007/BFb0031618

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-62898-9

  • Online ISBN: 978-3-540-69041-2

  • eBook Packages: Springer Book Archive
