ABSTRACT
Debugging failures of deployed concurrent software is important for quality assurance. However, such failures are difficult to debug because their behavior is non-deterministic and limited information can be obtained with conventional means. Reverse debuggers such as REPT [10] assists with debugging by recovering data values before the failure. This is achieved by using a hardware-tracer to log control-flow information, then using the information and a conventional coredump to recover data values via reverse-execution at machine-level. REPT's algorithm for data value recovery is reliable and fast. But the implementation cost is high because of its dependence on architecture. Applying REPT to more abstract IR (Intermediate Representation) level instructions in an attempt to counter this yielded limited results with low accuracy compared to the original x86_64 implementation.
In this paper, we present STRAB (State Recovery at Abstract-level), a collection of our proposed methods to solve these problems. STRAB works in two phases. First, the data values in the coredump are lifted from machine-level to IR-level using rich debug information and mid-recovery lifting. Second, REPT modified with our hybrid memory location resolving algorithm to solve problems that occur only at IR-level is used to recover data values with higher accuracy than REPT.
Experimental results on a variety of real-world concurrent programs show that STRAB has significantly higher accuracy compared to REPT at IR-level (+40% on average) with only minor slowdowns (x2.7 on average), while also achieving architecture-independence.
- 2020. LLVM Language Reference. Retrieved September 4, 2020 from https://llvm.org/docs/LangRef.html#alloca-instructionGoogle Scholar
- Rui Abreu, Peter Zoeteweij, and Arjan J.c. Van Gemund. 2006. An Evaluation of Similarity Coefficients for Software Fault Localization. In 2006 12th Pacific Rim International Symposium on Dependable Computing (PRDC'06).Google Scholar
- Gautam Altekar and Ion Stoica. 2009. ODR: Output-Deterministic Replay for Multicore Debugging. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles (SOSP '09). Association for Computing Machinery.Google ScholarDigital Library
- Thomas Ball and Sriram K. Rajamani. 2001. The SLAM Toolkit. In Proceedings of the 13th International Conference on Computer Aided Verification (CAV '01). Springer-Verlag.Google Scholar
- Tom Bergan, Owen Anderson, Joseph Devietti, Luis Ceze, and Dan Grossman. 2010. CoreDet: A Compiler and Runtime System for Deterministic Multithreaded Execution. In Proceedings of the Fifteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XV). Association for Computing Machinery.Google ScholarDigital Library
- Yunji Chen, Weiwu Hu, Tianshi Chen, and Ruiyang Wu. 2010. LReplay: A Pending Period Based Deterministic Replay Scheme. In Proceedings of the 37th Annual International Symposium on Computer Architecture (ISCA '10). Association for Computing Machinery.Google ScholarDigital Library
- Yunji Chen, Shijin Zhang, Qi Guo, Ling Li, Ruiyang Wu, and Tianshi Chen. 2015. Deterministic Replay: A Survey. ACM Comput. Surv. 48, 2, Article 17 (Sept. 2015).Google ScholarDigital Library
- James C. Corbett, Matthew B. Dwyer, John Hatcliff, Shawn Laubach, Corina S. Păsărenau, Robby, and Hongjun Zheng. 2000. Bandera: extracting finite-state models from Java source code. In Proceedings of the 2000 International Conference on Software Engineering. ICSE 2000 the New Millennium.Google ScholarDigital Library
- Intel Corporation. 2017. Intel 64 and IA-32 architectures software developer's manual. Retrieved September 28, 2020 from https://software.intel.com/content/www/us/en/develop/articles/intel-sdm.htmlGoogle Scholar
- Weidong Cui, Xinyang Ge, Baris Kasikci, Ben Niu, Upamanyu Sharma, Ruoyu Wang, and Insu Yun. 2018. REPT: Reverse Debugging of Failures in Deployed Software. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). USENIX Association.Google Scholar
- Joseph Devietti, Brandon Lucia, Luis Ceze, and Mark Oskin. 2009. DMP: Deterministic Shared Memory Multiprocessing. In Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XIV). Association for Computing Machinery.Google ScholarDigital Library
- Joseph Devietti, Jacob Nelson, Tom Bergan, Luis Ceze, and Dan Grossman. 2011. RCDC: A Relaxed Consistency Deterministic Computer. In Proceedings of the Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XVI). Association for Computing Machinery.Google ScholarDigital Library
- George W. Dunlap, Samuel T. King, Sukru Cinar, Murtaza A. Basrai, and Peter M. Chen. 2002. ReVirt: Enabling Intrusion Analysis through Virtual-Machine Logging and Replay. In Proceedings of the 5th Symposium on Operating Systems Design and Implementation (Copyright Restrictions Prevent ACM from Being Able to Make the PDFs for This Conference Available for Downloading) (OSDI '02). USENIX Association.Google ScholarDigital Library
- Matthew B. Dwyer, John Hatcliff, Robby, and Venkatesh Prasad Ranganath. 2004. Exploiting Object Escape and Locking Information in Partial-Order Reductions for Concurrent Object-Oriented Programs. Form. Methods Syst. Des. 25, 2--3 (Sept. 2004).Google ScholarDigital Library
- Cormac Flanagan and Stephen N. Freund. 2009. FastTrack: Efficient and Precise Dynamic Race Detection. In Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '09). Association for Computing Machinery.Google Scholar
- Cormac Flanagan and Patrice Godefroid. 2005. Dynamic Partial-Order Reduction for Model Checking Software. In Proceedings of the 32nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'05). Association for Computing Machinery.Google ScholarDigital Library
- Patrice Godefroid. 1997. Model Checking for Programming Languages Using VeriSoft. In Proceedings of the 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'97). Association for Computing Machinery.Google ScholarDigital Library
- Patrice Godefroid. 2003. Software Model Checking: The VeriSoft Approach. Formal Methods in System Design 26 (09 2003).Google Scholar
- Thomas A. Henzinger, Ranjit Jhala, Rupak Majumdar, and Grégoire Sutre. 2002. Lazy Abstraction. In Proceedings of the 29th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'02). Association for Computing Machinery.Google ScholarDigital Library
- Derek R. Hower, Polina Dudnik, Mark D. Hill, and David A. Wood. 2011. Calvin: Deterministic or Not? Free Will to Choose. In Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture (HPCA '11). IEEE Computer Society.Google Scholar
- Derek R. Hower and Mark D. Hill. 2008. Rerun: Exploiting Episodes for Lightweight Memory Race Recording. In Proceedings of the 35th Annual International Symposium on Computer Architecture (ISCA '08). IEEE Computer Society.Google Scholar
- Jeff Huang and Arun K. Rajagopalan. 2016. Precise and Maximal Race Detection from Incomplete Traces. In Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2016). Association for Computing Machinery.Google ScholarDigital Library
- Ali Jannesari, Kaibin Bao, Victor Pankratius, and Walter F. Tichy. 2009. Helgrind+: An efficient dynamic race detector. In 2009 IEEE International Symposium on Parallel Distributed Processing. 1--13.Google Scholar
- Huang Jeff, Zhang Charles, and Dolby Julian. 2013. CLAP: Recording Local Executions to Reproduce Concurrency Failures. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'13). Association for Computing Machinery.Google Scholar
- Guoliang Jin, Aditya Thakur, Ben Liblit, and Shan Lu. 2010. Instrumentation and Sampling Strategies for Cooperative Concurrency Bug Isolation. In Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA '10). Association for Computing Machinery.Google ScholarDigital Library
- James A. Jones and Mary Jean Harrold. 2005. Empirical Evaluation of the Tarantula Automatic Fault-Localization Technique. In Proceedings of the 20th IEEE/ACM International Conference on Automated Software Engineering (ASE '05). Association for Computing Machinery.Google Scholar
- Baris Kasikci, Weidong Cui, Xinyang Ge, and Ben Niu. 2017. Lazy Diagnosis of In-Production Concurrency Bugs. In Proceedings of the 26th Symposium on Operating Systems Principles (SOSP '17).Google ScholarDigital Library
- Baris Kasikci, Benjamin Schubert, Cristiano Pereira, Gilles Pokam, and George Candea. 2015. Failure Sketching: A Technique for Automated Root Cause Diagnosis of in-Production Failures. In Proceedings of the 25th Symposium on Operating Systems Principles (SOSP'15). Association for Computing Machinery.Google ScholarDigital Library
- Dongyoon Lee, Peter M. Chen, Jason Flinn, and Satish Narayanasamy. 2012. Chimera: Hybrid Program Analysis for Determinism. In Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '12). Association for Computing Machinery.Google ScholarDigital Library
- Dongyoon Lee, Benjamin Wester, Kaushik Veeraraghavan, Satish Narayanasamy, Peter M. Chen, and Jason Flinn. 2010. Respec: Efficient Online Multiprocessor Replayvia Speculation and External Determinism. In Proceedings of the Fifteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XV). Association for Computing Machinery.Google ScholarDigital Library
- Pablo Montesinos, Luis Ceze, and Josep Torrellas. 2008. DeLorean: Recording and Deterministically Replaying Shared-Memory Multiprocessor Execution Ef?Ciently. In Proceedings of the 35th Annual International Symposium on Computer Architecture (ISCA '08). IEEE Computer Society.Google ScholarDigital Library
- Marek Olszewski, Jason Ansel, and Saman Amarasinghe. 2009. Kendo: Efficient Deterministic Multithreading in Software. In Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XIV). Association for Computing Machinery.Google ScholarDigital Library
- Soyeon Park, Yuanyuan Zhou, Weiwei Xiong, Zuoning Yin, Rini Kaushik, Kyu H. Lee, and Shan Lu. 2009. PRES: Probabilistic Replay with Execution Sketching on Multiprocessors. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles (SOSP '09). Association for Computing Machinery.Google ScholarDigital Library
- João Carlos Pereira, Nuno Machado, and Jorge Sousa Pinto. 2020. Testing for Race Conditions in Distributed Systems via SMT Solving. In Tests and Proofs, Wolfgang Ahrendt and Heike Wehrheim (Eds.). Springer International Publishing.Google Scholar
- Polyvios Pratikakis, Jeffrey S. Foster, and Michael Hicks. 2006. LOCKSMITH: Context-Sensitive Correlation Analysis for Race Detection. In Proceedings of the 27th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '06). Association for Computing Machinery.Google ScholarDigital Library
- Michiel Ronsse and Koen De Bosschere. 1999. RecPlay: A Fully Integrated Practical Record/Replay System. ACM Trans. Comput. Syst. 17, 2 (May 1999).Google ScholarDigital Library
- Kaushik Veeraraghavan, Dongyoon Lee, Benjamin Wester, Jessica Ouyang, Peter M. Chen, Jason Flinn, and Satish Narayanasamy. 2011. DoublePlay: Parallelizing Sequential Logging and Replay. In Proceedings of the Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XVI). Association for Computing Machinery.Google ScholarDigital Library
- Willem Visser, Guillaume Brat, Klaus Havelund, and SeungJoon Park. 2000. Model checking programs. In Proceedings ASE 2000. Fifteenth IEEE International Conference on Automated Software Engineering.Google ScholarCross Ref
- Gwendolyn Voskuilen, Faraz Ahmad, and T. N. Vijaykumar. 2010. Timetraveler: Exploiting Acyclic Races for Optimizing Memory Race Recording. In Proceedings of the 37th Annual International Symposium on Computer Architecture (ISCA '10). Association for Computing Machinery.Google Scholar
- Min Xu, Rastislav Bodik, and Mark D. Hill. 2003. A "Flight Data Recorder" for Enabling Full-System Multiprocessor Deterministic Replay. SIGARCH Comput. Archit. News 31, 2 (May 2003).Google ScholarDigital Library
- Min Xu, Mark D. Hill, and Rastislav Bodik. 2006. A Regulated Transitive Reduction (RTR) for Longer Memory Race Recording. In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XII). Association for Computing Machinery.Google ScholarDigital Library
Index Terms
- STRAB: state recovery using reverse execution at IR level for concurrent programs
Recommendations
Postmortem accurate IR-level state recovery for deployed concurrent programs
Debugging failures of deployed concurrent software is important for quality assurance. However, such failures are difficult to debug because their behavior is non-deterministic and limited information can be obtained with conventional means. Reverse ...
Finding Atomicity-Violation Bugs through Unserializable Interleaving Testing
Multicore hardware is making concurrent programs pervasive. Unfortunately, concurrent programs are prone to bugs. Among different types of concurrency bugs, atomicity violations are common and important. How to test the interleaving space and expose ...
Comments