Abstract
The paper defines replay of a distributed application as a function of three parameters: depth, width, and length. It addresses the problem of nondeterminism in distributed systems and proposes an efficient approach to tracing the behaviour of a PVM application in order to eliminate races in repeated executions. Detecting races in distributed computations requires a strongly consistent system of vector clocks; such a system was therefore adapted to a dynamic application model. Finally, the paper presents the architecture of a tool supporting replay of PVM applications.
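To make the role of vector clocks concrete, the following is a minimal sketch of how concurrent sends (candidate races) can be detected. It is not the clock system described in the paper: the function names, the task identifiers, and the choice of a dictionary keyed by task id (so that tasks created at run time need no fixed-size vector) are all assumptions made for illustration.

```python
from typing import Dict

# Hypothetical representation: task id -> event counter.
# A task id missing from the dictionary counts as 0, so tasks
# spawned later can join without resizing existing clocks.
Clock = Dict[str, int]


def tick(clock: Clock, task: str) -> Clock:
    """Return a copy of `clock` advanced by one local event of `task`."""
    new = dict(clock)
    new[task] = new.get(task, 0) + 1
    return new


def merge(local: Clock, received: Clock, task: str) -> Clock:
    """Receive rule: component-wise maximum, then one local tick."""
    merged = {t: max(local.get(t, 0), received.get(t, 0))
              for t in set(local) | set(received)}
    return tick(merged, task)


def happened_before(a: Clock, b: Clock) -> bool:
    """True if a <= b in every component and a < b in at least one."""
    keys = set(a) | set(b)
    return (all(a.get(k, 0) <= b.get(k, 0) for k in keys)
            and any(a.get(k, 0) < b.get(k, 0) for k in keys))


def concurrent(a: Clock, b: Clock) -> bool:
    """True if neither event causally precedes the other.

    Assumes `a` and `b` are timestamps of two distinct events.
    """
    return not happened_before(a, b) and not happened_before(b, a)


# Two tasks send to a third task independently of each other.
send_t1 = tick({}, "t1")           # {"t1": 1}
send_t2 = tick({}, "t2")           # {"t2": 1}

# The receiver takes the message from t1, then sends a message itself.
recv_t3 = merge({}, send_t1, "t3")  # {"t1": 1, "t3": 1}
send_t3 = tick(recv_t3, "t3")       # {"t1": 1, "t3": 2}

# Concurrent sends to the same receiver may arrive in either order.
race_candidate = concurrent(send_t1, send_t2)
# A send that causally follows another cannot race with it.
ordered = happened_before(send_t1, send_t3)
```

In this sketch `race_candidate` is true because the two sends are causally unrelated, so a wildcard receive in the third task could match either message first. Under these assumptions, only such unordered pairs would need to be recorded for replay, whereas causally ordered messages arrive in the same order on every run.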
This work was sponsored by INCO-Esprit KIT Project no. 997100 (Parallel Processing Tools. Integration and Results Dissemination).
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Neyman, M., Bukowski, M., Kuzora, P. (1999). Efficient Replay of PVM Programs. In: Dongarra, J., Luque, E., Margalef, T. (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 1999. Lecture Notes in Computer Science, vol 1697. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48158-3_11
DOI: https://doi.org/10.1007/3-540-48158-3_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66549-6
Online ISBN: 978-3-540-48158-4
eBook Packages: Springer Book Archive