Abstract
Execution replay is a debugging strategy where a program is run repeatedly on an input that manifests bugs. Replaying nondeterministic parallel programs requires special tools; otherwise, successive runs (on the same input) can differ, making bugs impossible to track. These tools must trace an execution so it can be replayed. We present improvements over our past work on an adaptive tracing strategy for shared-memory programs. Our past approach makes run-time tracing decisions by detecting and tracing exactly the non-transitive dynamic data dependences among the execution's shared data. Tracing the non-transitive dependences provides sufficient information for a replay. In this paper we show that tracing exactly these dependences is not necessary. Instead, we present two algorithms that introduce and trace artificial dependences among some events that are actually independent. If no data dependence exists between two memory references during execution, we are free to artificially force them to execute in a specific order during replay. Artificial dependences reduce trace size, but they introduce additional event orderings that can reduce the replay's parallelism. The first algorithm adds only dependences guaranteed not to be on the critical path, so they do not slow replay. The second algorithm adds as many dependences as possible, slowing replay but reducing trace size further. Experiments show that we can improve the already high trace reduction of our past technique by up to two more orders of magnitude, without slowing replay. Our new techniques usually trace only 0.00025–0.2% of the shared-memory references, a reduction of 3–6 orders of magnitude over past approaches that trace every access.
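To make the notion of a non-transitive dependence concrete, the following is a minimal sketch of the general idea, not the algorithm from this paper or from the earlier work. It assumes a hypothetical input format, a list of `(thread, op, var)` accesses in observed order, and uses vector clocks to decide whether a dependence is already implied by program order plus previously traced dependences. The function name `trace_nontransitive` and all data structures are illustrative assumptions. The sketch does not model the artificial dependences introduced by the two new algorithms.

```python
from collections import defaultdict

def trace_nontransitive(accesses, num_threads):
    """Return the cross-thread data dependences that are not already implied.

    accesses: list of (thread, op, var) in observed order, op is 'r' or 'w'
              (a hypothetical input format chosen for this sketch).
    Each event is named (thread, idx), where idx is its 1-based position
    within its own thread.
    """
    # clocks[t][u] = highest event index of thread u known to precede
    # the next event of thread t.
    clocks = [[0] * num_threads for _ in range(num_threads)]
    last_write = {}                   # var -> (thread, idx, clock snapshot)
    reads_since = defaultdict(list)   # var -> reads since the last write
    traced = []

    for thread, op, var in accesses:
        idx = clocks[thread][thread] + 1

        # Conflicting earlier accesses, most recent first, so that a later
        # predecessor can make an earlier one redundant.
        preds = []
        if op == 'w':
            preds.extend(reversed(reads_since[var]))
        if var in last_write:
            preds.append(last_write[var])

        for p_thread, p_idx, p_clock in preds:
            if p_thread == thread:
                continue              # ordered by program order already
            if clocks[thread][p_thread] >= p_idx:
                continue              # implied transitively: no need to trace
            traced.append(((p_thread, p_idx), (thread, idx)))
            clocks[thread] = [max(a, b) for a, b in zip(clocks[thread], p_clock)]

        clocks[thread][thread] = idx
        snapshot = (thread, idx, list(clocks[thread]))
        if op == 'w':
            last_write[var] = snapshot
            reads_since[var] = []
        else:
            reads_since[var].append(snapshot)

    return traced

# Thread 0 writes x then y; thread 1 reads y then x.
example = [(0, 'w', 'x'), (0, 'w', 'y'), (1, 'r', 'y'), (1, 'r', 'x')]
example_trace = trace_nontransitive(example, num_threads=2)
```

In this example there are two data dependences (on `y` and on `x`), but only the one on `y` is traced: once thread 1's read of `y` is ordered after thread 0's second event, its later read of `x` is already ordered after the write of `x`.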
This research was partly supported by ONR Contract N00014-91-J-4052 (ARPA Order 8225) and NSF grant CCR-9309311.
Copyright information
© 1994 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Netzer, R.H.B. (1994). Trace size vs parallelism in trace-and-replay debugging of shared-memory programs. In: Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1993. Lecture Notes in Computer Science, vol 768. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-57659-2_35
DOI: https://doi.org/10.1007/3-540-57659-2_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-57659-4
Online ISBN: 978-3-540-48308-3
eBook Packages: Springer Book Archive