Abstract
The Parallel Debugging Tool (PDT) of the Annai programming environment is being developed within the Joint CSCS-ETH/NEC Collaboration in Parallel Processing [1]. Like the other components of this integrated environment, PDT is intended to help application developers debug portable, large-scale data-parallel programs written in HPF and message-passing programs based on the MPI standard. PDT supports MPI event tracing for race detection and deterministic replay, both for manually parallelized MPI programs and for code generated by a data-parallel compiler using advanced compilation techniques. This paper describes the tracing and replay mechanisms included in PDT and evaluates their efficiency by presenting execution-time overheads for several benchmark programs running on the NEC Cenju-2 and Cenju-3 distributed-memory parallel computers.
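To make the two mechanisms concrete, the following is a minimal Python sketch that simulates a single receiving process. It is not PDT's implementation, and every name in it (`Message`, `record`, `replay`, `find_races`) is hypothetical. Assumptions: each message carries the sender's vector timestamp taken at the send event; a receive posted with `MPI_ANY_SOURCE` is modeled by matching whichever message "arrives" first; in record mode the matched source of each such receive is appended to a trace, and in replay mode each wildcard receive is forced to match the next logged source, which is one way to make nondeterministic matching repeatable. The race test is deliberately simplified: two messages from different senders are flagged when their send events are concurrent under vector-clock ordering, i.e. either one could have been matched first.

```python
from dataclasses import dataclass


def happened_before(a: list[int], b: list[int]) -> bool:
    """True if the event stamped `a` causally precedes the event stamped `b`
    under vector-clock (partial) ordering."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def concurrent(a: list[int], b: list[int]) -> bool:
    """Neither event causally precedes the other."""
    return not happened_before(a, b) and not happened_before(b, a)


@dataclass
class Message:
    source: int             # rank of the sender
    payload: str
    send_clock: list[int]   # hypothetical: sender's vector timestamp at the send


def record(arrivals: list[Message], trace: list[int]) -> list[Message]:
    """Record mode: each wildcard receive matches whatever arrives first,
    and the actual source is appended to the trace."""
    matched = []
    for msg in arrivals:
        trace.append(msg.source)
        matched.append(msg)
    return matched


def replay(arrivals: list[Message], trace: list[int]) -> list[Message]:
    """Replay mode: each wildcard receive is turned into a receive from the
    logged source, regardless of the current arrival order."""
    pending = list(arrivals)
    matched = []
    for src in trace:
        # Assuming one communicator and tag, MPI's non-overtaking rule means
        # the earliest pending message from `src` is the one to match.
        idx = next((k for k, m in enumerate(pending) if m.source == src), None)
        if idx is None:
            raise RuntimeError(f"replay diverged: no pending message from rank {src}")
        matched.append(pending.pop(idx))
    return matched


def find_races(matched: list[Message]) -> list[tuple[int, int]]:
    """Simplified race test: index pairs of messages from different senders
    whose send events are concurrent, so the matching order was not forced."""
    races = []
    for i in range(len(matched)):
        for j in range(i + 1, len(matched)):
            a, b = matched[i], matched[j]
            if a.source != b.source and concurrent(a.send_clock, b.send_clock):
                races.append((i, j))
    return races


if __name__ == "__main__":
    # Ranks 1 and 2 each send one message to rank 0; the sends are causally unrelated.
    m1 = Message(1, "a", [0, 1, 0])
    m2 = Message(2, "b", [0, 0, 1])
    trace: list[int] = []
    first_run = record([m2, m1], trace)     # rank 2's message happened to win
    second_run = replay([m1, m2], trace)    # opposite arrival order during replay
    print([m.payload for m in second_run])  # ['b', 'a']
    print(find_races(first_run))            # [(0, 1)]
```

In this run the replayed execution sees the messages in the opposite order, yet the logged trace forces the same matching as the original run, and the two causally unrelated sends are reported as a race. Had rank 2 sent only after receiving rank 1's message, its vector timestamp (for example `[0, 1, 1]`) would dominate `[0, 1, 0]` and no race would be flagged.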
References
[1] C. Clémençon, K. M. Decker, A. Endo, J. Fritscher, G. Jost, N. Masuda, A. Müller, R. Rühl, W. Sawyer, E. de Sturler, B. J. N. Wylie, and F. Zimmermann. Application-Driven Development of an Integrated Tool Environment for Distributed Memory Parallel Processors. In R. Rao and C. P. Ravikumar, editors, Proceedings of the First International Workshop on Parallel Processing (Bangalore, India, December 27–30), 1994.
[2] C. Clémençon, J. Fritscher, and R. Rühl. Execution control, visualization and replay of massively parallel programs within Annai's debugging tool. In Proc. High Performance Computing Symposium, HPCS'95, Montréal, CA, July 1995.
[3] C. Clémençon, A. Endo, J. Fritscher, A. Müller, R. Rühl, and B. J. N. Wylie. The “Annai” Environment for Portable Distributed Parallel Programming. In Hesham El-Rewini and Bruce D. Shriver, editors, Proc. of the 28th Hawaii International Conference on System Sciences, Volume II (Maui, Hawaii, USA, 3–6 January, 1995), pages 242–251. IEEE Computer Society Press, January 1995.
[4] A. Müller and R. Rühl. Extending HPF for the Support of Unstructured Computations. In Proc. ACM International Conference on Supercomputing, ICS'95, Barcelona, Spain, July 1995.
[5] B. J. N. Wylie and A. Endo. Design and realization of the Annai integrated parallel programming environment performance monitor and analyser. Technical Report CSCS-TR-94-07, CSCS, CH-6928 Manno, Switzerland, November 1994.
[6] L. Lamport. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 21(7):558–565, July 1978.
[7] C. J. Fidge. Partial orders for parallel debugging. Proceedings of the ACM SIGPLAN/SIGOPS Workshop on Parallel and Distributed Debugging, 24(1):183–194, January 1989. Published in ACM SIGPLAN Notices.
[8] M. Singhal and A. Kshemkalyani. An efficient implementation of vector clocks. Information Processing Letters, 43(10):47–52, August 1992.
[9] R. H. B. Netzer and B. P. Miller. Optimal tracing and replay for debugging message-passing parallel programs. In Proceedings of Supercomputing '92, pages 502–511, Minneapolis, MN, November 1992.
[10] S. K. Damodaran-Kamal and J. M. Francioni. mdb: A semantic race detection tool for PVM. In Proceedings of the Scalable High-Performance Computing Conference, pages 702–709, May 1994.
[11] E. Leu and A. Schiper. Execution replay: A mechanism for integrating a visualization tool with a symbolic debugger. In Proceedings of CONPAR '92, pages 55–66, September 1992.
[12] J. May and F. Berman. Panorama: A portable, extensible parallel debugger. In Proceedings of ACM/ONR Workshop on Parallel and Distributed Debugging, pages 96–106, San Diego, California, May 1993.
[13] Y. Saad. SPARSKIT: A basic tool kit for sparse matrix computation. CSRD Technical Report 1029, University of Illinois, IL, August 1990.
Copyright information
© 1995 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Clémençon, C., Fritscher, J., Meehan, M.J., Rühl, R. (1995). An implementation of race detection and deterministic replay with MPI. In: Haridi, S., Ali, K., Magnusson, P. (eds) EURO-PAR '95 Parallel Processing. Euro-Par 1995. Lecture Notes in Computer Science, vol 966. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0020462
DOI: https://doi.org/10.1007/BFb0020462
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60247-7
Online ISBN: 978-3-540-44769-6
eBook Packages: Springer Book Archive