Migration and rollback transparency for arbitrary distributed applications in workstation clusters

  • Workshop on Run-Time Systems for Parallel Programming: Matthew Haines, University of Wyoming, USA; Koen Langendoen, Vrije Universiteit, The Netherlands; Greg Benson, University of California at Davis, USA
  • Conference paper
Parallel and Distributed Processing (IPPS 1998)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1388)

Abstract

Programmers and users of compute-intensive scientific applications often do not want to, or are not able to, code load balancing and fault tolerance into their programs.

The Beam system [18] uses a global virtual name space to provide migration and rollback transparency in user space for distributed groups of processes on workstations. The system calls are interposed and their parameters translated between the name spaces. Unlike other migration mechanisms, Beam does not require the applications to be written for a specific programming model or communication library.
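
The translation between name spaces can be illustrated with a small sketch. This is not Beam's implementation: the class and function names, the choice of process IDs as the translated names, and the `kill`/`getpid` cases are all assumptions made for illustration. The point it shows is that the application only ever sees stable virtual names, while the interposition layer maps them to whatever real names are valid on the current host.

```python
from dataclasses import dataclass, field


@dataclass
class NameSpace:
    """Hypothetical table mapping virtual process IDs to real (host, pid) pairs."""
    real_by_virtual: dict = field(default_factory=dict)

    def bind(self, vpid, host, pid):
        # Called at start-up and again after each migration or rollback.
        self.real_by_virtual[vpid] = (host, pid)

    def to_real(self, vpid):
        return self.real_by_virtual[vpid]

    def to_virtual(self, host, pid):
        for vpid, real in self.real_by_virtual.items():
            if real == (host, pid):
                return vpid
        raise KeyError((host, pid))


def on_syscall_entry(ns, name, args):
    """Rewrite virtual names in the arguments to real ones before the call runs."""
    if name == "kill":
        vpid, sig = args
        host, pid = ns.to_real(vpid)
        return host, (pid, sig)
    return None, args  # calls without translatable names pass through unchanged


def on_syscall_exit(ns, host, name, result):
    """Rewrite real names in the result back to virtual ones."""
    if name == "getpid":
        return ns.to_virtual(host, result)
    return result


ns = NameSpace()
ns.bind(vpid=7, host="nodeA", pid=4321)
before = on_syscall_entry(ns, "kill", (7, 15))

ns.bind(vpid=7, host="nodeB", pid=987)  # virtual process 7 has migrated
after = on_syscall_entry(ns, "kill", (7, 15))
seen_by_app = on_syscall_exit(ns, "nodeB", "getpid", 987)
```

In this sketch the application issues the same `kill(7, 15)` before and after the migration; only the table entry changes, which is what makes the move transparent to the caller.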

In this paper we describe the design and implementation of a separate system call interposition process [3] that accesses the application via the debugging interface. The main advantage of this approach is that it can handle even unmodified (e.g., commercially bought) application programs. We compare measured performance figures with previous similar approaches [15, 20].

At the time of writing, funded by DFG contract SFB 342 at the Institute for Computer Science, Technical University Munich.

References

  1. A.D. Alexandrov, M. Ibel, K.E. Schauser, and C.J. Scheiman. Extending the Operating System at the User Level: the Ufo Global File System. In USENIX Technical Conference Proceedings, pages 77–90, Anaheim, CA, January 1997.

  2. D. Andres, C. Elford, B. Fin, and L. Smith. Dynamic load balancing in PVM. Technical report, University of Illinois at Urbana-Champaign, April 1993.

  3. M. Bolz. Transparent Redirection of System Calls for Unmodified Programs in Beam. Master's thesis, Institut für Betriebssysteme und Rechnerverbund, TU Braunschweig, November 1997. (In German).

  4. J. Cargille and B.P. Miller. Binary Wrapping: A Technique for Instrumenting Object Code. ACM Sigplan Notices, 27(6):17–18, June 1992.

  5. J. Casas, D.L. Clark, R. Konuru, S.W. Otto, R.M. Prouty, and J. Walpole. MPVM: A migration transparent version of PVM. Computing Systems, 8(2):171–216, 1995.

  6. CCS Annual Report. WWW page, Center for Computational Sciences, Oak Ridge National Laboratory, 1995. http://www.ccs.ornl.org/AnRep95/CCS95.html.

  7. R. Faulkner and R. Gomes. The Process File System and Process Model in UNIX System V. In USENIX Technical Conference Proceedings, pages 243–252, Dallas, TX, January 1991.

  8. Al Geist, A. Beguelin, J. Dongarra, W. Jiang, R. Manchek, and V. Sunderam. PVM: Parallel Virtual Machine — A Users' Guide and Tutorial for Networked Parallel Computing. The MIT Press, Cambridge, Massachusetts, 1994.

  9. M.B. Jones. Transparently Interposing User Code at the System Interface. PhD thesis, CMU, September 1992.

  10. A.H. Karp, M. Heath, and Al Geist. 1995 Gordon Bell Prize Winners. IEEE Computer, 29(1):79–85, January 1996.

  11. J. León, A.L. Fisher, and P. Steenkiste. Fail-safe PVM: A Portable Package for Distributed Programming with Transparent Recovery. Report CMU-CS-93-124, Carnegie Mellon University, February 1993.

  12. M. Litzkow, T. Tannenbaum, J. Basney, and M. Livny. Checkpointing and Migration of UNIX Processes in the Condor Distributed Processing System. Report 1346, University of Wisconsin-Madison Computer Sciences, April 1997.

  13. M.J. Litzkow and M. Solomon. Supporting Checkpointing and Process Migration Outside the UNIX Kernel. In USENIX Technical Conference Proceedings, pages 283–290, San Francisco, CA, January 1992.

  14. D. Long, J. Carroll, and C. Park. A Study of the Reliability of Internet Sites. In Proceedings of the 10th Symposium on Reliable Distributed Systems, pages 177–186, 1991.

  15. K.I. Mandelberg and V.S. Sunderam. Process Migration in UNIX Networks. In USENIX Technical Conference Proceedings, pages 357–363, Dallas, TX, February 1988.

  16. Message Passing Interface Forum MPIF. MPI-2: Extensions to the Message-Passing Interface. Technical report, University of Tennessee, Knoxville, July 1997. http://www.mpi-forum.org.

  17. S. Petri, M. Bolz, and H. Langendörfer. Transparent Migration and Rollback for Unmodified Applications in Workstation Clusters. Informatik-Bericht 98-02, TU Braunschweig, April 1998. To appear.

  18. S. Petri and H. Langendörfer. Load Balancing and Fault Tolerance in Workstation Clusters – Migrating Groups of Communicating Processes. Operating Systems Review, 29(4):25–36, October 1995.

  19. S. Petri, B. Schnor, M. Becker, B. Hinrichs, T. Tschamtke, and H. Langendörfer. Evaluation of Multicast Methods to Maintain a Global Name Space for Transparent Process Migration in Workstation Clusters. In Kommunikation in Verteilten Systemen, pages 224–234. GI/ITG Fachtagung KIVS'97, Springer, February 1997.

  20. S. Petri, B. Schnor, H. Langendörfer, and J. Steinborn. Consistent Global Checkpoints for Distributed Applications on Clusters of Unix Workstations. In Paralleles und Verteiltes Rechnen – Beiträge zum 4. Workshop über Wissenschaftliches Rechnen, pages 77–86, Aachen, October 1996. TU Braunschweig, Shaker.

  21. T. Shirakihara, H. Hirayama, K. Sato, and T. Kanai. ARTEMIS: Advanced Reliable disTributed Environment Middleware System. In Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, PDPTA'97, pages 97–106, Las Vegas, NV, July 1997.

  22. G. Stellner. CoCheck: Checkpointing and Process Migration for MPI. In Proceedings of the 10th International Parallel Processing Symposium (IPPS '96), Honolulu, Hawaii, April 1996.

  23. Sun Microsystems. SunOS Reference Manual, 1990. Revision A.

  24. J. Trinitis. An External Checkpointing Technique for Integration into a Parallel Tool Environment. In preparation. trinitis@informatik.tu-muenchen.de, 1998.

  25. J.J.J. Vesseur, R.N. Heederik, B.J. Overeinder, and P.M.A. Sloot. Experiments in Dynamic Load Balancing for Parallel Cluster Computing. In Proceedings of the Workshop on Parallel Programming and Computation (ZEUS'95) and the 4th Nordic Transputer Conference (NTUG'95), pages 189–194, Amsterdam, June 1995. IOS Press.

Editor information

José Rolim

Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Petri, S., Bolze, M., Langendörfer, H. (1998). Migration and rollback transparency for arbitrary distributed applications in workstation clusters. In: Rolim, J. (eds) Parallel and Distributed Processing. IPPS 1998. Lecture Notes in Computer Science, vol 1388. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-64359-1_686

  • DOI: https://doi.org/10.1007/3-540-64359-1_686

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-64359-3

  • Online ISBN: 978-3-540-69756-5

  • eBook Packages: Springer Book Archive
