Abstract
Several computing environments, including wide area networks and nondedicated networks of workstations, are characterized by frequent unavailability of the participating machines. Parallel computations, whose component processes depend on one another, cannot make progress if some of the participating machines become unavailable during the computation. As a result, to deliver acceptable performance, the set of participating processors must be adjusted dynamically in response to changes in the computing environment. In this paper, we discuss the design of a runtime system that supports a Virtual BSP Computer, which allows BSP programmers to treat a network of transient processors as a dedicated network. The Virtual BSP Computer enables parallel applications to remove computations from processors that become unavailable and thereby adapt to the changing computing environment. The runtime system, which we refer to as the adaptive replication system (ARS), uses replication of data and computations to keep current a mapping of a set of virtual processors to a subset of the available machines. ARS has been implemented and integrated with a message passing library for the Bulk-Synchronous Parallel (BSP) model. The extended library has been applied to two parallel applications with the aim of using idle machines in a network of workstations (NOW) for parallel computations. We present the performance results of ARS for these applications.
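The abstract does not specify how ARS maintains its virtual-to-physical mapping, so the following is only a minimal sketch of the general idea under stated assumptions. Every name here (`initial_placement`, `handle_unavailable`, the `copies` parameter) is hypothetical, and the round-robin initial placement and least-loaded refill policy are illustrative choices, not the paper's method. Each virtual processor is kept on a primary host plus replica hosts; when a host becomes unavailable, a surviving replica is promoted to primary and the replica count is restored from the remaining machines.

```python
from typing import Dict, List


def initial_placement(num_virtual: int, hosts: List[str], copies: int) -> Dict[int, List[str]]:
    """Hypothetical: place each virtual processor on `copies` distinct hosts, round-robin.

    placement[v][0] is the primary host for virtual processor v; the remaining
    entries hold replicas of v's data and computation.
    """
    if copies > len(hosts):
        raise ValueError("need at least as many hosts as copies")
    return {v: [hosts[(v + k) % len(hosts)] for k in range(copies)]
            for v in range(num_virtual)}


def handle_unavailable(placement: Dict[int, List[str]], failed: str,
                       available: List[str], copies: int) -> Dict[int, List[str]]:
    """Hypothetical: remap virtual processors after host `failed` becomes unavailable.

    The failed host is dropped from every list, so the next surviving replica
    becomes the primary. Missing replicas are then re-created on the
    least-loaded available hosts (an assumed policy). The input is not mutated.
    """
    available = [h for h in available if h != failed]
    new: Dict[int, List[str]] = {}
    for v, hs in placement.items():
        survivors = [h for h in hs if h != failed]
        if not survivors:
            # No copy of this virtual processor's state remains anywhere.
            raise RuntimeError(f"virtual processor {v} lost all copies")
        new[v] = survivors

    # Count how many copies each available host currently holds.
    load = {h: 0 for h in available}
    for hs in new.values():
        for h in hs:
            if h in load:
                load[h] += 1

    # Top up each virtual processor's replica set, favoring lightly loaded hosts.
    for v in sorted(new):
        hs = new[v]
        while len(hs) < copies:
            candidates = [h for h in available if h not in hs]
            if not candidates:
                break  # fewer hosts than requested copies; run with what remains
            target = min(candidates, key=lambda h: (load[h], h))
            hs.append(target)
            load[target] += 1
    return new
```

For example, with three virtual processors on hosts `a`, `b`, `c` and two copies each, losing host `b` promotes the replica of virtual processor 1 on `c` to primary, and the lost replicas are re-created on `a` and `c`. A real system would also have to copy the replicated state to the newly chosen hosts, which this sketch omits.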
This work was partially supported by NSF Grant CCR-9527151. The content does not necessarily reflect the position or policy of the U.S. Government.
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nibhanupudi, M.V., Szymanski, B.K. (1998). Runtime support for virtual BSP computer. In: Rolim, J. (eds) Parallel and Distributed Processing. IPPS 1998. Lecture Notes in Computer Science, vol 1388. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-64359-1_685
DOI: https://doi.org/10.1007/3-540-64359-1_685
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64359-3
Online ISBN: 978-3-540-69756-5
eBook Packages: Springer Book Archive