Trace size vs parallelism in trace-and-replay debugging of shared-memory programs

Netzer, Robert H. B.

doi:10.1007/3-540-57659-2_35

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 768)

Included in the following conference series:

International Workshop on Languages and Compilers for Parallel Computing

Abstract

Execution replay is a debugging strategy where a program is run repeatedly on an input that manifests bugs. Replaying nondeterministic parallel programs requires special tools; otherwise, successive runs (on the same input) can differ, making bugs impossible to track. These tools must trace an execution so it can be replayed. We present improvements over our past work on an adaptive tracing strategy for shared-memory programs. Our past approach makes run-time tracing decisions by detecting and tracing exactly the non-transitive dynamic data dependences among the execution's shared data. Tracing the non-transitive dependences provides sufficient information for a replay. In this paper we show that tracing exactly these dependences is not necessary. Instead, we present two algorithms that introduce and trace artificial dependences among some events that are actually independent. If no data dependence exists between two memory references during execution, we are free to artificially force them to execute in a specific order during replay. Artificial dependences reduce trace size, but introduce additional event orderings that can reduce the replay's parallelism. We present one algorithm that always adds dependences guaranteed not to be on the critical path (and which therefore do not slow replay). Another algorithm adds as many dependences as possible, slowing replay but reducing trace size further. Experiments show that we can improve the already high trace reduction of our past technique by up to two more orders of magnitude, without slowing replay. Our new techniques usually trace only 0.00025–0.2% of the shared-memory references, a 3–6 order of magnitude reduction over past approaches that trace every access.
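To illustrate what "non-transitive" means here, the following is a minimal offline sketch in Python, not the paper's on-line adaptive algorithm. It assumes each process's events are listed in program order and that cross-process data dependences are given as ordered pairs; the function names `reachable` and `edges_to_trace` and the event labels are hypothetical. A dependence needs tracing only if no other chain of program-order and dependence edges already forces the same ordering.

```python
from collections import defaultdict


def reachable(adj, src, dst, skip_edge):
    """Return True if dst can be reached from src without using skip_edge."""
    stack, seen = [src], {src}
    while stack:
        node = stack.pop()
        for nxt in adj[node]:
            if (node, nxt) == skip_edge or nxt in seen:
                continue
            if nxt == dst:
                return True
            seen.add(nxt)
            stack.append(nxt)
    return False


def edges_to_trace(program_order, dependences):
    """Keep only dependences not implied by other orderings.

    program_order: dict mapping a process name to its events, in order.
    dependences:   list of (earlier_event, later_event) pairs across processes.
    Assumes the combined ordering is acyclic, as it is for a real execution.
    """
    adj = defaultdict(list)
    # Program order within each process is enforced for free during replay.
    for events in program_order.values():
        for a, b in zip(events, events[1:]):
            adj[a].append(b)
    for a, b in dependences:
        adj[a].append(b)
    # A dependence is redundant if another path already orders its endpoints.
    return [e for e in dependences if not reachable(adj, e[0], e[1], e)]


# Hypothetical two-process execution.
order = {"P1": ["a1", "a2"], "P2": ["b1", "b2"]}
deps = [("a1", "b1"), ("a1", "b2"), ("a2", "b2")]
print(edges_to_trace(order, deps))
```

In this example the dependence from `a1` to `b2` is implied by `a1` to `b1` followed by program order in `P2`, so only two of the three dependences would need to be recorded. The artificial-dependence algorithms described in the abstract go further than this sketch and are not reproduced here.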

This research was partly supported by ONR Contract N00014-91-J-4052 (ARPA Order 8225) and NSF grant CCR-9309311.

Author information

Authors and Affiliations

Dept. of Computer Science, Brown University, Box 1910, 02912, Providence, RI
Robert H. B. Netzer

Editor information

Utpal Banerjee, David Gelernter, Alex Nicolau, David Padua

About this paper

Cite this paper

Netzer, R.H.B. (1994). Trace size vs parallelism in trace-and-replay debugging of shared-memory programs. In: Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1993. Lecture Notes in Computer Science, vol 768. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-57659-2_35

DOI: https://doi.org/10.1007/3-540-57659-2_35
Published: 31 May 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-57659-4
Online ISBN: 978-3-540-48308-3
eBook Packages: Springer Book Archive
