DOI: 10.1145/2451116.2451138 (ASPLOS conference proceedings)
research-article

Cyrus: unintrusive application-level record-replay for replay parallelism

Published: 16 March 2013

ABSTRACT

Architectures for deterministic record-replay (R&R) of multithreaded code are attractive for program debugging, intrusion analysis, and fault-tolerance uses. However, very few of the proposed designs have focused on maximizing replay speed -- a key enabling property of these systems. The few efforts that focus on replay speed require intrusive hardware or software modifications, or target whole-system R&R rather than the more useful application-level R&R.

This paper presents the first hardware-based scheme for unintrusive, application-level R&R that explicitly targets high replay speed. Our scheme, called Cyrus, requires no modification to commodity snoopy cache coherence. It introduces the concept of an on-the-fly software Backend Pass during recording which, as the log is being generated, transforms it for high replay parallelism. This pass also fixes up the log, and can flexibly trade off replay parallelism for log size. We analyze the performance of Cyrus using full-system (OS plus hardware) simulation. Our results show that Cyrus has negligible recording overhead. In addition, for 8-processor runs of SPLASH-2, Cyrus attains an average replay parallelism of 5, and a replay speed that is, on average, only about 50% lower than the recording speed.
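
To make the notion of replay parallelism concrete, here is a toy sketch in Python. It is not the Cyrus Backend Pass. The log format (numbered chunks plus dependence pairs), the function names `replay_levels` and `average_parallelism`, and the definition of parallelism as the number of chunks divided by the number of dependence-ordered replay steps are all assumptions made for illustration.

```python
from collections import defaultdict


def replay_levels(num_chunks, deps):
    """Assign each chunk the earliest replay step that respects all dependences.

    deps is a list of (src, dst) pairs meaning chunk dst must replay after
    chunk src. Chunk ids are assumed to follow recording order (src < dst),
    so a single forward pass sees every predecessor before its successor.
    """
    preds = defaultdict(list)
    for src, dst in deps:
        preds[dst].append(src)

    level = [0] * num_chunks
    for chunk in range(num_chunks):
        for p in preds[chunk]:
            level[chunk] = max(level[chunk], level[p] + 1)
    return level


def average_parallelism(level):
    """Chunks replayed per step, if every step runs its chunks concurrently."""
    steps = max(level) + 1
    return len(level) / steps


# Hypothetical log: 6 chunks, where chunk 2 waits on 0 and 1, and 5 waits on 2.
sparse = replay_levels(6, [(0, 2), (1, 2), (2, 5)])
# The same 6 chunks with a fully serialized (total-order) log.
serial = replay_levels(6, [(i, i + 1) for i in range(5)])

print(average_parallelism(sparse))  # 2.0
print(average_parallelism(serial))  # 1.0
```

In this simplified model, a log that records fewer ordering constraints between chunks leaves more chunks free to replay in the same step, which is the property a log transformation for replay parallelism would aim to improve.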

