research-article

STRAB: state recovery using reverse execution at IR level for concurrent programs

Authors:
Shinji Hoshino, Tokyo Institute of Technology, Tokyo, Japan
Yoshitaka Arahori, Tokyo Institute of Technology, Tokyo, Japan
Katsuhiko Gondow, Tokyo Institute of Technology, Tokyo, Japan
SAC '21: Proceedings of the 36th Annual ACM Symposium on Applied Computing, March 2021, Pages 1532–1541
https://doi.org/10.1145/3412841.3442028

Published: 22 April 2021
ABSTRACT

Debugging failures of deployed concurrent software is important for quality assurance. However, such failures are difficult to debug because their behavior is non-deterministic and conventional means yield only limited information. Reverse debuggers such as REPT [10] assist with debugging by recovering data values from before the failure. REPT achieves this by using a hardware tracer to log control-flow information, then combining that information with a conventional coredump to recover data values via reverse execution at machine level. REPT's algorithm for data value recovery is reliable and fast, but its implementation cost is high because it depends on the architecture. Applying REPT to more abstract IR (Intermediate Representation) level instructions in an attempt to counter this yielded limited results, with low accuracy compared to the original x86_64 implementation.

In this paper, we present STRAB (State Recovery at Abstract-level), a collection of our proposed methods to solve these problems. STRAB works in two phases. First, the data values in the coredump are lifted from machine level to IR level using rich debug information and mid-recovery lifting. Second, a version of REPT modified with our hybrid memory location resolving algorithm, which addresses problems that occur only at IR level, recovers data values with higher accuracy than REPT.

Experimental results on a variety of real-world concurrent programs show that STRAB achieves significantly higher accuracy than REPT at IR level (+40% on average) with only a minor slowdown (2.7x on average), while also achieving architecture independence.
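To make the idea of reverse execution concrete, the following is a minimal sketch, not the algorithm of REPT or STRAB. It assumes a hypothetical three-instruction toy IR (`const`, `copy`, `add_const`), a recorded instruction trace standing in for the hardware control-flow log, and a final state standing in for the coredump. Walking the trace backward, it undoes reversible instructions, infers a source value from a copy when the dump lacks it, and marks overwritten values as unknown (`None`). Memory aliasing and concurrency, which the abstract identifies as the hard part, are deliberately left out.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy IR, for illustration only:
#   ("const", dst, k)      dst = k        (old value of dst is destroyed)
#   ("copy", dst, src)     dst = src      (old value of dst is destroyed)
#   ("add_const", dst, k)  dst = dst + k  (reversible)
Instr = Tuple[str, str, object]
State = Dict[str, Optional[int]]


def reverse_execute(trace: List[Instr], final_state: State) -> List[State]:
    """Return the recovered state before each instruction in the trace.

    A value of None means the value could not be recovered.
    """
    state: State = dict(final_state)
    before: List[State] = []
    for op, dst, arg in reversed(trace):
        if op == "add_const":
            # Reversible: subtract the constant if the result is known.
            if state.get(dst) is not None:
                state[dst] = state[dst] - arg
        elif op == "copy":
            if dst != arg:
                # After the copy both names hold the same value, and src is
                # unchanged by it, so an unknown src can be inferred from dst.
                if state.get(arg) is None and state.get(dst) is not None:
                    state[arg] = state[dst]
                state[dst] = None  # the pre-copy value of dst was overwritten
        elif op == "const":
            state[dst] = None  # the pre-assignment value was overwritten
        else:
            raise ValueError(f"unknown opcode: {op}")
        before.append(dict(state))
    before.reverse()
    return before


# Trace in execution order; the "dump" is missing the value of a.
trace = [("const", "a", 5), ("copy", "b", "a"), ("add_const", "b", 3)]
recovered = reverse_execute(trace, {"a": None, "b": 8})
for step, snapshot in enumerate(recovered):
    print(step, snapshot)
```

In this run the value of `a` is absent from the final state, yet it is recovered as 5 for the point just before the `add_const`, because the copy ties it to `b`. Values that were overwritten stay unknown, which is where accuracy is lost in a backward-only pass like this one.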

References

[1] 2020. LLVM Language Reference. Retrieved September 4, 2020 from https://llvm.org/docs/LangRef.html#alloca-instruction
[2] Rui Abreu, Peter Zoeteweij, and Arjan J.C. van Gemund. 2006. An Evaluation of Similarity Coefficients for Software Fault Localization. In 2006 12th Pacific Rim International Symposium on Dependable Computing (PRDC'06).
[3] Gautam Altekar and Ion Stoica. 2009. ODR: Output-Deterministic Replay for Multicore Debugging. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles (SOSP '09). Association for Computing Machinery.
[4] Thomas Ball and Sriram K. Rajamani. 2001. The SLAM Toolkit. In Proceedings of the 13th International Conference on Computer Aided Verification (CAV '01). Springer-Verlag.
[5] Tom Bergan, Owen Anderson, Joseph Devietti, Luis Ceze, and Dan Grossman. 2010. CoreDet: A Compiler and Runtime System for Deterministic Multithreaded Execution. In Proceedings of the Fifteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XV). Association for Computing Machinery.
[6] Yunji Chen, Weiwu Hu, Tianshi Chen, and Ruiyang Wu. 2010. LReplay: A Pending Period Based Deterministic Replay Scheme. In Proceedings of the 37th Annual International Symposium on Computer Architecture (ISCA '10). Association for Computing Machinery.
[7] Yunji Chen, Shijin Zhang, Qi Guo, Ling Li, Ruiyang Wu, and Tianshi Chen. 2015. Deterministic Replay: A Survey. ACM Comput. Surv. 48, 2, Article 17 (Sept. 2015).
[8] James C. Corbett, Matthew B. Dwyer, John Hatcliff, Shawn Laubach, Corina S. Păsăreanu, Robby, and Hongjun Zheng. 2000. Bandera: extracting finite-state models from Java source code. In Proceedings of the 2000 International Conference on Software Engineering. ICSE 2000 the New Millennium.
[9] Intel Corporation. 2017. Intel 64 and IA-32 architectures software developer's manual. Retrieved September 28, 2020 from https://software.intel.com/content/www/us/en/develop/articles/intel-sdm.html
[10] Weidong Cui, Xinyang Ge, Baris Kasikci, Ben Niu, Upamanyu Sharma, Ruoyu Wang, and Insu Yun. 2018. REPT: Reverse Debugging of Failures in Deployed Software. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). USENIX Association.
[11] Joseph Devietti, Brandon Lucia, Luis Ceze, and Mark Oskin. 2009. DMP: Deterministic Shared Memory Multiprocessing. In Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XIV). Association for Computing Machinery.
[12] Joseph Devietti, Jacob Nelson, Tom Bergan, Luis Ceze, and Dan Grossman. 2011. RCDC: A Relaxed Consistency Deterministic Computer. In Proceedings of the Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XVI). Association for Computing Machinery.
[13] George W. Dunlap, Samuel T. King, Sukru Cinar, Murtaza A. Basrai, and Peter M. Chen. 2002. ReVirt: Enabling Intrusion Analysis through Virtual-Machine Logging and Replay. In Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI '02). USENIX Association.
[14] Matthew B. Dwyer, John Hatcliff, Robby, and Venkatesh Prasad Ranganath. 2004. Exploiting Object Escape and Locking Information in Partial-Order Reductions for Concurrent Object-Oriented Programs. Form. Methods Syst. Des. 25, 2–3 (Sept. 2004).
[15] Cormac Flanagan and Stephen N. Freund. 2009. FastTrack: Efficient and Precise Dynamic Race Detection. In Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '09). Association for Computing Machinery.
[16] Cormac Flanagan and Patrice Godefroid. 2005. Dynamic Partial-Order Reduction for Model Checking Software. In Proceedings of the 32nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'05). Association for Computing Machinery.
[17] Patrice Godefroid. 1997. Model Checking for Programming Languages Using VeriSoft. In Proceedings of the 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'97). Association for Computing Machinery.
[18] Patrice Godefroid. 2003. Software Model Checking: The VeriSoft Approach. Formal Methods in System Design 26 (09 2003).
[19] Thomas A. Henzinger, Ranjit Jhala, Rupak Majumdar, and Grégoire Sutre. 2002. Lazy Abstraction. In Proceedings of the 29th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'02). Association for Computing Machinery.
[20] Derek R. Hower, Polina Dudnik, Mark D. Hill, and David A. Wood. 2011. Calvin: Deterministic or Not? Free Will to Choose. In Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture (HPCA '11). IEEE Computer Society.
[21] Derek R. Hower and Mark D. Hill. 2008. Rerun: Exploiting Episodes for Lightweight Memory Race Recording. In Proceedings of the 35th Annual International Symposium on Computer Architecture (ISCA '08). IEEE Computer Society.
[22] Jeff Huang and Arun K. Rajagopalan. 2016. Precise and Maximal Race Detection from Incomplete Traces. In Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2016). Association for Computing Machinery.
[23] Ali Jannesari, Kaibin Bao, Victor Pankratius, and Walter F. Tichy. 2009. Helgrind+: An efficient dynamic race detector. In 2009 IEEE International Symposium on Parallel Distributed Processing. 1–13.
[24] Jeff Huang, Charles Zhang, and Julian Dolby. 2013. CLAP: Recording Local Executions to Reproduce Concurrency Failures. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'13). Association for Computing Machinery.
[25] Guoliang Jin, Aditya Thakur, Ben Liblit, and Shan Lu. 2010. Instrumentation and Sampling Strategies for Cooperative Concurrency Bug Isolation. In Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA '10). Association for Computing Machinery.
[26] James A. Jones and Mary Jean Harrold. 2005. Empirical Evaluation of the Tarantula Automatic Fault-Localization Technique. In Proceedings of the 20th IEEE/ACM International Conference on Automated Software Engineering (ASE '05). Association for Computing Machinery.
[27] Baris Kasikci, Weidong Cui, Xinyang Ge, and Ben Niu. 2017. Lazy Diagnosis of In-Production Concurrency Bugs. In Proceedings of the 26th Symposium on Operating Systems Principles (SOSP '17).
[28] Baris Kasikci, Benjamin Schubert, Cristiano Pereira, Gilles Pokam, and George Candea. 2015. Failure Sketching: A Technique for Automated Root Cause Diagnosis of in-Production Failures. In Proceedings of the 25th Symposium on Operating Systems Principles (SOSP'15). Association for Computing Machinery.
[29] Dongyoon Lee, Peter M. Chen, Jason Flinn, and Satish Narayanasamy. 2012. Chimera: Hybrid Program Analysis for Determinism. In Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '12). Association for Computing Machinery.
[30] Dongyoon Lee, Benjamin Wester, Kaushik Veeraraghavan, Satish Narayanasamy, Peter M. Chen, and Jason Flinn. 2010. Respec: Efficient Online Multiprocessor Replay via Speculation and External Determinism. In Proceedings of the Fifteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XV). Association for Computing Machinery.
[31] Pablo Montesinos, Luis Ceze, and Josep Torrellas. 2008. DeLorean: Recording and Deterministically Replaying Shared-Memory Multiprocessor Execution Efficiently. In Proceedings of the 35th Annual International Symposium on Computer Architecture (ISCA '08). IEEE Computer Society.
[32] Marek Olszewski, Jason Ansel, and Saman Amarasinghe. 2009. Kendo: Efficient Deterministic Multithreading in Software. In Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XIV). Association for Computing Machinery.
[33] Soyeon Park, Yuanyuan Zhou, Weiwei Xiong, Zuoning Yin, Rini Kaushik, Kyu H. Lee, and Shan Lu. 2009. PRES: Probabilistic Replay with Execution Sketching on Multiprocessors. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles (SOSP '09). Association for Computing Machinery.
[34] João Carlos Pereira, Nuno Machado, and Jorge Sousa Pinto. 2020. Testing for Race Conditions in Distributed Systems via SMT Solving. In Tests and Proofs, Wolfgang Ahrendt and Heike Wehrheim (Eds.). Springer International Publishing.
[35] Polyvios Pratikakis, Jeffrey S. Foster, and Michael Hicks. 2006. LOCKSMITH: Context-Sensitive Correlation Analysis for Race Detection. In Proceedings of the 27th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '06). Association for Computing Machinery.
[36] Michiel Ronsse and Koen De Bosschere. 1999. RecPlay: A Fully Integrated Practical Record/Replay System. ACM Trans. Comput. Syst. 17, 2 (May 1999).
[37] Kaushik Veeraraghavan, Dongyoon Lee, Benjamin Wester, Jessica Ouyang, Peter M. Chen, Jason Flinn, and Satish Narayanasamy. 2011. DoublePlay: Parallelizing Sequential Logging and Replay. In Proceedings of the Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XVI). Association for Computing Machinery.
[38] Willem Visser, Guillaume Brat, Klaus Havelund, and SeungJoon Park. 2000. Model checking programs. In Proceedings ASE 2000. Fifteenth IEEE International Conference on Automated Software Engineering.
[39] Gwendolyn Voskuilen, Faraz Ahmad, and T. N. Vijaykumar. 2010. Timetraveler: Exploiting Acyclic Races for Optimizing Memory Race Recording. In Proceedings of the 37th Annual International Symposium on Computer Architecture (ISCA '10). Association for Computing Machinery.
[40] Min Xu, Rastislav Bodik, and Mark D. Hill. 2003. A "Flight Data Recorder" for Enabling Full-System Multiprocessor Deterministic Replay. SIGARCH Comput. Archit. News 31, 2 (May 2003).
[41] Min Xu, Mark D. Hill, and Rastislav Bodik. 2006. A Regulated Transitive Reduction (RTR) for Longer Memory Race Recording. In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XII). Association for Computing Machinery.

Index Terms

Software and its engineering
  Software creation and management
    Software verification and validation
      Software defect analysis
        Software testing and debugging

Published in
SAC '21: Proceedings of the 36th Annual ACM Symposium on Applied Computing
March 2021
2075 pages
ISBN: 9781450381048
DOI: 10.1145/3412841
Conference Chairs: Chih-Cheng Hung (Kennesaw State University), Jiman Hong (Soongsil University, South Korea)
Program Chairs: Alessio Bechini (University of Pisa, Italy), Eunjee Song (Baylor University)
Copyright © 2021 ACM
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
concurrent programming
data value inference
intermediate representation
reverse debugging
state recovery
Acceptance Rates
Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%