DOI: 10.1145/3412841.3442028

STRAB: state recovery using reverse execution at IR level for concurrent programs

Published: 22 April 2021

ABSTRACT

Debugging failures of deployed concurrent software is important for quality assurance. However, such failures are difficult to debug because their behavior is non-deterministic and conventional means yield only limited information. Reverse debuggers such as REPT [10] assist with debugging by recovering data values from before the failure. REPT uses a hardware tracer to log control-flow information, then combines that log with a conventional coredump to recover data values via reverse execution at machine level. REPT's algorithm for data value recovery is reliable and fast, but its implementation cost is high because it depends on the architecture. Applying REPT to more abstract IR (Intermediate Representation) level instructions to counter this yielded limited results, with low accuracy compared to the original x86_64 implementation.

In this paper, we present STRAB (State Recovery at Abstract-level), a collection of methods we propose to solve these problems. STRAB works in two phases. First, the data values in the coredump are lifted from machine level to IR level using rich debug information and mid-recovery lifting. Second, a version of REPT modified with our hybrid memory location resolving algorithm, which addresses problems that occur only at IR level, recovers data values with higher accuracy than unmodified REPT.

Experimental results on a variety of real-world concurrent programs show that STRAB has significantly higher accuracy than REPT at IR level (+40% on average) with only a minor slowdown (2.7× on average), while also achieving architecture independence.
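
The reverse-execution idea summarized in the abstract can be illustrated with a small sketch. This is not REPT's or STRAB's actual algorithm: the three-instruction toy set (`add_imm`, `mov_imm`, `mov_reg`), the `Instr` type, and the `reverse_execute` function are hypothetical names introduced only for illustration, and registers are modeled as unbounded Python integers rather than fixed-width machine words. The sketch assumes a recorded control-flow trace (the executed instructions in order) and a final register state standing in for the coredump, then walks the trace backward. Reversible instructions are inverted, and values destroyed by an overwrite are marked unknown (`None`).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical toy instruction; not a real ISA or LLVM IR.
@dataclass(frozen=True)
class Instr:
    op: str                    # "add_imm", "mov_imm", or "mov_reg"
    dst: str                   # destination register
    src: Optional[str] = None  # source register (mov_reg only)
    imm: int = 0               # immediate operand (add_imm / mov_imm)

State = Dict[str, Optional[int]]  # None means "value unknown"

def reverse_execute(trace: List[Instr], final_state: State) -> List[State]:
    """Return a list where entry i is the recovered state before trace[i]."""
    states_before: List[State] = [{} for _ in trace]
    after = dict(final_state)
    for i in range(len(trace) - 1, -1, -1):
        ins = trace[i]
        before = dict(after)
        if ins.op == "add_imm":
            # Reversible: dst_before = dst_after - imm.
            value = after.get(ins.dst)
            before[ins.dst] = None if value is None else value - ins.imm
        elif ins.op == "mov_imm":
            # The old value of dst was overwritten and cannot be recovered here.
            before[ins.dst] = None
        elif ins.op == "mov_reg":
            if ins.src != ins.dst:
                # dst is overwritten, so its previous value is lost.
                before[ins.dst] = None
                # src is unchanged by the move and equals dst afterwards,
                # so an unknown src can be filled in from a known dst.
                if after.get(ins.src) is None and after.get(ins.dst) is not None:
                    before[ins.src] = after[ins.dst]
        else:
            raise ValueError(f"unsupported op: {ins.op}")
        states_before[i] = before
        after = before
    return states_before

if __name__ == "__main__":
    trace = [
        Instr("mov_imm", "rax", imm=5),
        Instr("add_imm", "rax", imm=3),
        Instr("mov_reg", "rbx", src="rax"),
    ]
    for index, state in enumerate(reverse_execute(trace, {"rax": 8, "rbx": 8})):
        print(index, state)
```

In this toy run, the value of `rax` before the `add_imm` is recovered as 5, while the contents of `rbx` before the final move stay unknown because the trace carries no information about them. A real recovery pass would also have to handle memory accesses and concurrent writes, which this sketch leaves out.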

References

  1. 2020. LLVM Language Reference. Retrieved September 4, 2020 from https://llvm.org/docs/LangRef.html#alloca-instruction
  2. Rui Abreu, Peter Zoeteweij, and Arjan J.C. van Gemund. 2006. An Evaluation of Similarity Coefficients for Software Fault Localization. In 2006 12th Pacific Rim International Symposium on Dependable Computing (PRDC'06).
  3. Gautam Altekar and Ion Stoica. 2009. ODR: Output-Deterministic Replay for Multicore Debugging. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles (SOSP '09). Association for Computing Machinery.
  4. Thomas Ball and Sriram K. Rajamani. 2001. The SLAM Toolkit. In Proceedings of the 13th International Conference on Computer Aided Verification (CAV '01). Springer-Verlag.
  5. Tom Bergan, Owen Anderson, Joseph Devietti, Luis Ceze, and Dan Grossman. 2010. CoreDet: A Compiler and Runtime System for Deterministic Multithreaded Execution. In Proceedings of the Fifteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XV). Association for Computing Machinery.
  6. Yunji Chen, Weiwu Hu, Tianshi Chen, and Ruiyang Wu. 2010. LReplay: A Pending Period Based Deterministic Replay Scheme. In Proceedings of the 37th Annual International Symposium on Computer Architecture (ISCA '10). Association for Computing Machinery.
  7. Yunji Chen, Shijin Zhang, Qi Guo, Ling Li, Ruiyang Wu, and Tianshi Chen. 2015. Deterministic Replay: A Survey. ACM Comput. Surv. 48, 2, Article 17 (Sept. 2015).
  8. James C. Corbett, Matthew B. Dwyer, John Hatcliff, Shawn Laubach, Corina S. Păsăreanu, Robby, and Hongjun Zheng. 2000. Bandera: extracting finite-state models from Java source code. In Proceedings of the 2000 International Conference on Software Engineering. ICSE 2000 the New Millennium.
  9. Intel Corporation. 2017. Intel 64 and IA-32 architectures software developer's manual. Retrieved September 28, 2020 from https://software.intel.com/content/www/us/en/develop/articles/intel-sdm.html
  10. Weidong Cui, Xinyang Ge, Baris Kasikci, Ben Niu, Upamanyu Sharma, Ruoyu Wang, and Insu Yun. 2018. REPT: Reverse Debugging of Failures in Deployed Software. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). USENIX Association.
  11. Joseph Devietti, Brandon Lucia, Luis Ceze, and Mark Oskin. 2009. DMP: Deterministic Shared Memory Multiprocessing. In Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XIV). Association for Computing Machinery.
  12. Joseph Devietti, Jacob Nelson, Tom Bergan, Luis Ceze, and Dan Grossman. 2011. RCDC: A Relaxed Consistency Deterministic Computer. In Proceedings of the Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XVI). Association for Computing Machinery.
  13. George W. Dunlap, Samuel T. King, Sukru Cinar, Murtaza A. Basrai, and Peter M. Chen. 2002. ReVirt: Enabling Intrusion Analysis through Virtual-Machine Logging and Replay. In Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI '02). USENIX Association.
  14. Matthew B. Dwyer, John Hatcliff, Robby, and Venkatesh Prasad Ranganath. 2004. Exploiting Object Escape and Locking Information in Partial-Order Reductions for Concurrent Object-Oriented Programs. Form. Methods Syst. Des. 25, 2-3 (Sept. 2004).
  15. Cormac Flanagan and Stephen N. Freund. 2009. FastTrack: Efficient and Precise Dynamic Race Detection. In Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '09). Association for Computing Machinery.
  16. Cormac Flanagan and Patrice Godefroid. 2005. Dynamic Partial-Order Reduction for Model Checking Software. In Proceedings of the 32nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'05). Association for Computing Machinery.
  17. Patrice Godefroid. 1997. Model Checking for Programming Languages Using VeriSoft. In Proceedings of the 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'97). Association for Computing Machinery.
  18. Patrice Godefroid. 2003. Software Model Checking: The VeriSoft Approach. Formal Methods in System Design 26 (Sept. 2003).
  19. Thomas A. Henzinger, Ranjit Jhala, Rupak Majumdar, and Grégoire Sutre. 2002. Lazy Abstraction. In Proceedings of the 29th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'02). Association for Computing Machinery.
  20. Derek R. Hower, Polina Dudnik, Mark D. Hill, and David A. Wood. 2011. Calvin: Deterministic or Not? Free Will to Choose. In Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture (HPCA '11). IEEE Computer Society.
  21. Derek R. Hower and Mark D. Hill. 2008. Rerun: Exploiting Episodes for Lightweight Memory Race Recording. In Proceedings of the 35th Annual International Symposium on Computer Architecture (ISCA '08). IEEE Computer Society.
  22. Jeff Huang and Arun K. Rajagopalan. 2016. Precise and Maximal Race Detection from Incomplete Traces. In Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2016). Association for Computing Machinery.
  23. Ali Jannesari, Kaibin Bao, Victor Pankratius, and Walter F. Tichy. 2009. Helgrind+: An efficient dynamic race detector. In 2009 IEEE International Symposium on Parallel Distributed Processing. 1-13.
  24. Jeff Huang, Charles Zhang, and Julian Dolby. 2013. CLAP: Recording Local Executions to Reproduce Concurrency Failures. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'13). Association for Computing Machinery.
  25. Guoliang Jin, Aditya Thakur, Ben Liblit, and Shan Lu. 2010. Instrumentation and Sampling Strategies for Cooperative Concurrency Bug Isolation. In Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA '10). Association for Computing Machinery.
  26. James A. Jones and Mary Jean Harrold. 2005. Empirical Evaluation of the Tarantula Automatic Fault-Localization Technique. In Proceedings of the 20th IEEE/ACM International Conference on Automated Software Engineering (ASE '05). Association for Computing Machinery.
  27. Baris Kasikci, Weidong Cui, Xinyang Ge, and Ben Niu. 2017. Lazy Diagnosis of In-Production Concurrency Bugs. In Proceedings of the 26th Symposium on Operating Systems Principles (SOSP '17).
  28. Baris Kasikci, Benjamin Schubert, Cristiano Pereira, Gilles Pokam, and George Candea. 2015. Failure Sketching: A Technique for Automated Root Cause Diagnosis of in-Production Failures. In Proceedings of the 25th Symposium on Operating Systems Principles (SOSP'15). Association for Computing Machinery.
  29. Dongyoon Lee, Peter M. Chen, Jason Flinn, and Satish Narayanasamy. 2012. Chimera: Hybrid Program Analysis for Determinism. In Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '12). Association for Computing Machinery.
  30. Dongyoon Lee, Benjamin Wester, Kaushik Veeraraghavan, Satish Narayanasamy, Peter M. Chen, and Jason Flinn. 2010. Respec: Efficient Online Multiprocessor Replay via Speculation and External Determinism. In Proceedings of the Fifteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XV). Association for Computing Machinery.
  31. Pablo Montesinos, Luis Ceze, and Josep Torrellas. 2008. DeLorean: Recording and Deterministically Replaying Shared-Memory Multiprocessor Execution Efficiently. In Proceedings of the 35th Annual International Symposium on Computer Architecture (ISCA '08). IEEE Computer Society.
  32. Marek Olszewski, Jason Ansel, and Saman Amarasinghe. 2009. Kendo: Efficient Deterministic Multithreading in Software. In Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XIV). Association for Computing Machinery.
  33. Soyeon Park, Yuanyuan Zhou, Weiwei Xiong, Zuoning Yin, Rini Kaushik, Kyu H. Lee, and Shan Lu. 2009. PRES: Probabilistic Replay with Execution Sketching on Multiprocessors. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles (SOSP '09). Association for Computing Machinery.
  34. João Carlos Pereira, Nuno Machado, and Jorge Sousa Pinto. 2020. Testing for Race Conditions in Distributed Systems via SMT Solving. In Tests and Proofs, Wolfgang Ahrendt and Heike Wehrheim (Eds.). Springer International Publishing.
  35. Polyvios Pratikakis, Jeffrey S. Foster, and Michael Hicks. 2006. LOCKSMITH: Context-Sensitive Correlation Analysis for Race Detection. In Proceedings of the 27th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '06). Association for Computing Machinery.
  36. Michiel Ronsse and Koen De Bosschere. 1999. RecPlay: A Fully Integrated Practical Record/Replay System. ACM Trans. Comput. Syst. 17, 2 (May 1999).
  37. Kaushik Veeraraghavan, Dongyoon Lee, Benjamin Wester, Jessica Ouyang, Peter M. Chen, Jason Flinn, and Satish Narayanasamy. 2011. DoublePlay: Parallelizing Sequential Logging and Replay. In Proceedings of the Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XVI). Association for Computing Machinery.
  38. Willem Visser, Guillaume Brat, Klaus Havelund, and SeungJoon Park. 2000. Model checking programs. In Proceedings ASE 2000. Fifteenth IEEE International Conference on Automated Software Engineering.
  39. Gwendolyn Voskuilen, Faraz Ahmad, and T. N. Vijaykumar. 2010. Timetraveler: Exploiting Acyclic Races for Optimizing Memory Race Recording. In Proceedings of the 37th Annual International Symposium on Computer Architecture (ISCA '10). Association for Computing Machinery.
  40. Min Xu, Rastislav Bodik, and Mark D. Hill. 2003. A "Flight Data Recorder" for Enabling Full-System Multiprocessor Deterministic Replay. SIGARCH Comput. Archit. News 31, 2 (May 2003).
  41. Min Xu, Mark D. Hill, and Rastislav Bodik. 2006. A Regulated Transitive Reduction (RTR) for Longer Memory Race Recording. In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XII). Association for Computing Machinery.

    Published in

      SAC '21: Proceedings of the 36th Annual ACM Symposium on Applied Computing
      March 2021
      2075 pages
      ISBN:9781450381048
      DOI:10.1145/3412841

      Copyright © 2021 ACM

      Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

      Publisher

      Association for Computing Machinery

      New York, NY, United States
