DOI: 10.1145/2442516.2442537
research-article

Scalable deterministic replay in a parallel full-system emulator

Published: 23 February 2013

ABSTRACT

Full-system emulation has been an extremely useful tool for developing and debugging systems software such as operating systems and hypervisors. However, current full-system emulators lack support for deterministic replay. This limits the reproducibility of concurrency bugs, which is indispensable for analyzing and debugging inherently multi-threaded systems software.

This paper analyzes the challenges of supporting deterministic replay in parallel full-system emulators and presents a comprehensive study of the sources of non-determinism. Unlike application-level replay systems, our system, called ReEmu, needs to log sources of non-determinism in both the guest software stack and the dynamic binary translator for faithful replay. To provide scalable and efficient record and replay on multicore machines, ReEmu makes several notable refinements to the CREW (concurrent-read, exclusive-write) protocol used to replay shared-memory systems. First, because frequent lock operations are a performance bottleneck in the CREW protocol, ReEmu refines the protocol with a seqlock-like design. This avoids serious contention and possible starvation in the instrumentation code that tracks dependences among racy accesses to a shared memory object. Second, to minimize log size, ReEmu logs only minimal local information about accesses to a shared memory location and relies on an offline log processing tool to derive the precise shared-memory dependences needed for faithful replay. Third, ReEmu adopts an automatic lock clustering mechanism that clusters a set of uncontended memory objects into a bulk to reduce the frequency of lock operations, which noticeably boosts performance.

Our prototype ReEmu is based on our open-source COREMU system and supports scalable and efficient record and replay of full-system environments (both x64 and ARM). Performance evaluation shows that ReEmu has very good performance scalability on an Intel multicore machine. It incurs only 68.9% performance overhead on average (ranging from 51.8% to 94.7%) over vanilla COREMU to record five PARSEC benchmarks running on a 16-core emulated system.
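
The first two refinements named in the abstract (a seqlock-like version counter and purely local logging with offline dependence derivation) can be illustrated with a minimal sketch. This is not ReEmu's implementation: the names `VersionedObject` and `derive_dependencies`, the log tuple layout, and the choice to log every access are assumptions made for illustration only. The sketch derives only read-after-write dependences and does not cover lock clustering.

```python
import threading


class VersionedObject:
    """Shared memory object with a seqlock-style version counter (hypothetical)."""

    def __init__(self, name, value=0):
        self.name = name
        self.value = value
        self.version = 0                      # even: stable, odd: write in progress
        self._writer_lock = threading.Lock()  # serializes writers only

    def write(self, cpu_log, value):
        with self._writer_lock:
            self.version += 1                 # odd: concurrent readers will retry
            self.value = value
            self.version += 1                 # even: new version is visible
            # Log only local information: which version this CPU produced.
            cpu_log.append(("W", self.name, self.version))

    def read(self, cpu_log):
        # Readers take no lock; they retry if a writer interfered.
        while True:
            seen = self.version
            if seen % 2 == 1:                 # a write is in flight, try again
                continue
            value = self.value
            if self.version == seen:          # version unchanged: read is consistent
                cpu_log.append(("R", self.name, seen))
                return value


def derive_dependencies(logs):
    """Offline pass: match each read to the CPU whose write produced that version."""
    producer = {}
    for cpu, events in logs.items():
        for kind, name, version in events:
            if kind == "W":
                producer[(name, version)] = cpu
    deps = []
    for cpu, events in logs.items():
        for kind, name, version in events:
            writer = producer.get((name, version))
            if kind == "R" and writer is not None and writer != cpu:
                # (writer CPU, reader CPU, object, version the reader must wait for)
                deps.append((writer, cpu, name, version))
    return deps


if __name__ == "__main__":
    logs = {0: [], 1: []}      # one private log per emulated CPU
    x = VersionedObject("x")
    x.write(logs[0], 7)        # CPU 0 writes
    x.read(logs[1])            # CPU 1 reads the value CPU 0 wrote
    print(derive_dependencies(logs))
```

In this sketch each emulated CPU appends only to its own log during recording, so no cross-CPU coordination is needed at record time; the cross-CPU ordering is reconstructed afterwards from the version numbers.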

Published in

PPoPP '13: Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
February 2013
332 pages
ISBN: 9781450319225
DOI: 10.1145/2442516

ACM SIGPLAN Notices, Volume 48, Issue 8
PPoPP '13
August 2013
309 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2517327

Copyright © 2013 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 230 of 1,014 submissions, 23%
