Non-Intrusive Detection of Synchronization Errors Using Execution Replay

Ronsse, Michiel; De Bosschere, Koen

doi:10.1023/A:1013236320820

Published: January 2002

Automated Software Engineering, Volume 9, pages 95–121 (2002)

Abstract

This paper presents a practical solution for detecting synchronization errors in parallel programs. Three kinds of errors are considered: a lack of synchronization, resulting in data races; conflicting synchronization, resulting in deadlock; and redundant synchronization, resulting in a performance penalty.
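
To make the first of these error classes concrete: two memory accesses form a data race when they come from different threads, touch the same location, at least one of them is a write, and no synchronization orders one before the other. The sketch below checks that condition using vector clocks, a common way to represent this ordering. The `Access` record and the function names are illustrative assumptions, not RecPlay's interface.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Access:
    """One memory access (hypothetical record, for illustration only)."""
    thread: int
    address: int
    is_write: bool
    clock: Tuple[int, ...]  # vector clock of the thread at the time of access


def happens_before(a: Tuple[int, ...], b: Tuple[int, ...]) -> bool:
    """True if clock a is component-wise <= clock b and the two differ."""
    return a != b and all(x <= y for x, y in zip(a, b))


def is_data_race(x: Access, y: Access) -> bool:
    """Conflicting accesses from different threads that are unordered."""
    if x.thread == y.thread or x.address != y.address:
        return False
    if not (x.is_write or y.is_write):
        return False  # two reads never conflict
    ordered = happens_before(x.clock, y.clock) or happens_before(y.clock, x.clock)
    return not ordered


write = Access(thread=0, address=0x10, is_write=True, clock=(1, 0))
unsynced_read = Access(thread=1, address=0x10, is_write=False, clock=(0, 1))
synced_read = Access(thread=1, address=0x10, is_write=False, clock=(1, 1))

print(is_data_race(write, unsynced_read))  # concurrent accesses
print(is_data_race(write, synced_read))    # read is ordered after the write
```

The second read carries a clock that already includes the writer's component, which is what a synchronization operation between the two threads would produce, so the pair is ordered and not reported.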

The solution combines RecPlay, an efficient execution replay mechanism, with automatic on-the-fly detection of data races, deadlocks and redundant synchronization during a replayed execution. Detecting these errors normally introduces a significant overhead during an execution, possibly altering the execution. However, because these costly operations are performed during a replayed, and therefore unaltered, execution, there is almost no probe effect. Furthermore, the memory consumption during data race detection is limited through the use of multilevel bitmaps and snooped matrix clocks. As the record phase of RecPlay is highly efficient, there is no need to switch it off: tracing can be left on all the time, thereby eliminating the possibility of Heisenbugs.
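
The abstract does not describe the layout of the multilevel bitmaps. As a rough illustration of why such a structure limits memory use, the sketch below assumes a two-level design: a root table indexed by the high address bits, with leaf bitmaps allocated only for regions that are actually accessed. All names and parameter values (`TwoLevelBitmap`, `address_bits`, `leaf_bits`) are hypothetical and chosen for the example.

```python
class TwoLevelBitmap:
    """Sparse set of addresses: a root table of lazily allocated leaf bitmaps.

    Illustrative sketch only; not the data structure used by RecPlay.
    """

    def __init__(self, address_bits: int = 24, leaf_bits: int = 12):
        self.leaf_bits = leaf_bits
        self.leaf_size = 1 << leaf_bits  # addresses covered by one leaf
        self.root = [None] * (1 << (address_bits - leaf_bits))

    def _split(self, address: int):
        """Split an address into (root index, bit offset inside the leaf)."""
        return address >> self.leaf_bits, address & (self.leaf_size - 1)

    def set(self, address: int) -> None:
        hi, lo = self._split(address)
        leaf = self.root[hi]
        if leaf is None:
            # Allocate the leaf on first touch: one bit per address.
            leaf = self.root[hi] = bytearray(self.leaf_size // 8)
        leaf[lo >> 3] |= 1 << (lo & 7)

    def test(self, address: int) -> bool:
        hi, lo = self._split(address)
        leaf = self.root[hi]
        return leaf is not None and bool(leaf[lo >> 3] & (1 << (lo & 7)))

    def intersects(self, other: "TwoLevelBitmap") -> bool:
        """True if some address is present in both bitmaps."""
        for a, b in zip(self.root, other.root):
            if a is not None and b is not None:
                if any(x & y for x, y in zip(a, b)):
                    return True
        return False

    def allocated_leaves(self) -> int:
        return sum(leaf is not None for leaf in self.root)


writes = TwoLevelBitmap()
reads = TwoLevelBitmap()
writes.set(0x1234)
reads.set(0x1235)
print(writes.intersects(reads))  # different addresses, no overlap
reads.set(0x1234)
print(writes.intersects(reads))  # both sets now contain 0x1234
```

In a race detector of this general kind, one such bitmap could record the addresses written in a code segment and another the addresses read in a concurrent segment; a non-empty intersection would then flag a candidate race. Regions never touched cost only an empty root entry.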

Author information

Authors and Affiliations

Department of Electronics and Information Systems, Ghent University, Belgium
Michiel Ronsse & Koen De Bosschere

About this article

Cite this article

Ronsse, M., De Bosschere, K. Non-Intrusive Detection of Synchronization Errors Using Execution Replay. Automated Software Engineering 9, 95–121 (2002). https://doi.org/10.1023/A:1013236320820

Issue Date: January 2002
DOI: https://doi.org/10.1023/A:1013236320820
