ABSTRACT
Full-system emulation has been an extremely useful tool for developing and debugging systems software such as operating systems and hypervisors. However, current full-system emulators lack support for deterministic replay. This limits the reproducibility of concurrency bugs, and such reproducibility is indispensable for analyzing and debugging systems software, which is inherently multi-threaded.
This paper analyzes the challenges in supporting deterministic replay in parallel full-system emulators and presents a comprehensive study of the sources of non-determinism. Unlike application-level replay systems, our system, called ReEmu, must log sources of non-determinism in both the guest software stack and the dynamic binary translator to replay faithfully. To provide scalable and efficient record and replay on multicore machines, ReEmu makes several notable refinements to the CREW (concurrent-read, exclusive-write) protocol for replaying shared-memory systems. First, frequent lock operations in the CREW protocol are a performance bottleneck. ReEmu therefore refines the protocol with a seqlock-like design, which avoids serious contention and possible starvation in the instrumentation code that tracks dependences among racy accesses to a shared memory object. Second, to minimize log size, ReEmu logs only minimal local information about accesses to a shared memory location. It relies instead on an offline log-processing tool to derive the precise shared-memory dependences needed for faithful replay. Third, ReEmu adopts an automatic lock-clustering mechanism that groups a set of uncontended memory objects into a bulk to reduce the frequency of lock operations, which noticeably boosts performance.
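The abstract does not spell out ReEmu's log format, so the following Python sketch is only an illustration under stated assumptions. It assumes that each shared object carries a seqlock-style version counter. Each vCPU logs just (access kind, object, observed version, local access counter). An offline pass then reconstructs read-after-write, write-after-read, and write-after-write edges between vCPUs. The names `SharedObject`, `VCPU`, and `derive_dependences` are hypothetical. The model also runs sequentially rather than with the atomic operations that real instrumentation code in a dynamic binary translator would need.

```python
# Hypothetical model of versioned CREW tracking; NOT ReEmu's actual code or log format.
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class SharedObject:
    """A tracked shared-memory object with a seqlock-style version counter.

    Even version: no write in progress. A writer bumps it to odd, updates
    the data, then bumps it back to even, so each completed write advances
    the version by 2.
    """
    version: int = 0
    value: int = 0


@dataclass
class VCPU:
    """An emulated vCPU that records only a local, per-vCPU access log."""
    vcpu_id: int
    counter: int = 0                      # local count of logged accesses
    log: list = field(default_factory=list)

    def read(self, obj_id, obj):
        # Seqlock-style read: sample the version, read the data, re-check.
        # In this sequential model the check always passes; in a parallel
        # emulator an odd or changed version would make the reader retry.
        start = obj.version
        value = obj.value
        if start % 2 or obj.version != start:
            raise RuntimeError("racing write; a real reader would retry")
        self.counter += 1
        self.log.append(("R", obj_id, start, self.counter))
        return value

    def write(self, obj_id, obj, value):
        # Real writers would also serialize among themselves (e.g. a lock or
        # compare-and-swap on the version); this model is single-threaded.
        obj.version += 1                  # odd: write in progress
        obj.value = value
        obj.version += 1                  # even: write complete
        self.counter += 1
        self.log.append(("W", obj_id, obj.version, self.counter))


def derive_dependences(vcpus):
    """Offline pass: rebuild cross-vCPU ordering edges from local logs.

    An access is identified as (vcpu_id, counter); an edge (a, b) means a
    must be replayed before b.
    """
    writer = {}                           # (obj, version) -> producing write
    readers = defaultdict(list)           # (obj, version) -> reads of it
    for cpu in vcpus:
        for kind, obj_id, ver, cnt in cpu.log:
            if kind == "W":
                writer[(obj_id, ver)] = (cpu.vcpu_id, cnt)
            else:
                readers[(obj_id, ver)].append((cpu.vcpu_id, cnt))

    edges = set()
    for (obj_id, ver), w in writer.items():
        prev = (obj_id, ver - 2)          # version this write overwrote
        if prev in writer:
            edges.add((writer[prev], w))  # write-after-write
        for r in readers.get(prev, []):
            edges.add((r, w))             # write-after-read
        for r in readers.get((obj_id, ver), []):
            edges.add((w, r))             # read-after-write
    # Same-vCPU edges are already implied by program order.
    return {(a, b) for a, b in edges if a[0] != b[0]}


# Usage: vCPU 0 writes x, vCPU 1 reads it and then overwrites it.
x = SharedObject()
cpu0, cpu1 = VCPU(0), VCPU(1)
cpu0.write("x", x, 42)
cpu1.read("x", x)
cpu1.write("x", x, 7)
print(sorted(derive_dependences([cpu0, cpu1])))
# [((0, 1), (1, 1)), ((0, 1), (1, 2))]
```

The point of this arrangement is that recording touches only vCPU-local state plus the object's version counter. The cross-vCPU ordering is recovered after the fact rather than logged eagerly during execution.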
Our prototype, ReEmu, is based on our open-source COREMU system and supports scalable and efficient record and replay of full-system environments (both x64 and ARM). Performance evaluation shows that ReEmu scales very well on an Intel multicore machine. Recording five PARSEC benchmarks running on a 16-core emulated system incurs an average performance overhead of only 68.9% (ranging from 51.8% to 94.7%) over vanilla COREMU.
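The abstract does not define how overhead is measured. Assuming it is extra recording time relative to vanilla COREMU, the reported figures convert to slowdown factors as follows:

```latex
\[
\text{overhead} = \frac{T_{\text{ReEmu}} - T_{\text{COREMU}}}{T_{\text{COREMU}}}
\quad\Longrightarrow\quad
T_{\text{ReEmu}} = (1 + \text{overhead})\, T_{\text{COREMU}}
\]
\[
51.8\% \mapsto 1.518\times, \qquad 68.9\% \mapsto 1.689\times, \qquad 94.7\% \mapsto 1.947\times
\]
```

The 1.689x figure is the mean slowdown only if the 68.9% average is an arithmetic mean of the per-benchmark overheads.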
Scalable deterministic replay in a parallel full-system emulator (PPoPP '13).