Abstract
Program reduction is a highly practical and widely demanded technique for debugging language tools such as compilers, interpreters, and debuggers. Conceptually, given a program P that exhibits a property ψ, program reduction iteratively applies various program transformations that delete certain tokens from P, generating a vast number of variants, and returns the minimal variant that still preserves ψ as the result.
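As a minimal sketch of this loop, assuming a program is a plain list of tokens and ψ is an arbitrary Boolean test, the Python function below greedily deletes one token at a time and keeps any variant that still satisfies ψ, repeating full passes until no single deletion succeeds. It is not HDD or Perses, which remove syntax-tree nodes in larger groups; the names `reduce_program` and `psi` are hypothetical, and a real ψ would typically compile the variant and check for, e.g., a crash.

```python
from typing import Callable, List

Tokens = List[str]


def reduce_program(tokens: Tokens, psi: Callable[[Tokens], bool]) -> Tokens:
    """Greedy single-token-deletion reducer (illustrative only).

    Repeats passes until no single-token deletion preserves psi, so the
    result is 1-minimal with respect to deleting one token.
    """
    assert psi(tokens), "the input program must exhibit the property"
    changed = True
    while changed:
        changed = False
        i = 0
        while i < len(tokens):
            variant = tokens[:i] + tokens[i + 1:]  # delete the i-th token
            if psi(variant):                       # property test on the variant
                tokens = variant                   # keep it; retry the same index
                changed = True
            else:
                i += 1
    return tokens
```

For a toy property requiring both `a` and `b` to be present, `reduce_program(list("xaybz"), lambda t: "a" in t and "b" in t)` returns `["a", "b"]`.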
A program reduction process inevitably generates duplicate variants, and their number can be significant. Our study reveals that, on average, 61.8% and 24.3% of the variants generated by two representative program reducers, HDD and Perses, respectively, are duplicates. Checking them against ψ again is therefore redundant and wastes time and computational resources. Although simply caching the generated variants would seem to avoid redundant property tests, this trivial method is impractical in the real world because of its significant memory footprint. Therefore, a memory-efficient caching scheme for program reduction is in great demand.
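The sketch below illustrates both where duplicates come from and what the trivial caching method looks like. Deleting either of two identical adjacent tokens yields the same variant, so a reducer would test that variant twice unless outcomes are memoized. The wrapper `CachedProperty` is hypothetical: it keeps every distinct variant verbatim as a string key (tokens joined by a NUL separator, assuming no token contains NUL), which avoids repeated property tests but lets memory grow with the total size of all distinct variants seen.

```python
from typing import Callable, Dict, List


class CachedProperty:
    """Memoize a property test psi, keyed on the full variant text.

    Hypothetical wrapper: every distinct variant is stored verbatim, so
    memory grows with the total size of all distinct variants seen.
    """

    def __init__(self, psi: Callable[[List[str]], bool]) -> None:
        self.psi = psi
        self.cache: Dict[str, bool] = {}
        self.queries = 0  # variants submitted by the reducer
        self.hits = 0     # duplicates answered from the cache

    def __call__(self, tokens: List[str]) -> bool:
        self.queries += 1
        key = "\x00".join(tokens)  # assumes no token contains NUL
        if key in self.cache:
            self.hits += 1         # duplicate variant: skip the property test
            return self.cache[key]
        result = self.psi(tokens)  # expensive in practice (e.g. run a compiler)
        self.cache[key] = result
        return result


if __name__ == "__main__":
    checked = CachedProperty(lambda t: "y" in t)
    tokens = ["x", "x", "y"]
    # Deleting token 0 or token 1 produces the same variant ["x", "y"].
    for i in range(len(tokens)):
        checked(tokens[:i] + tokens[i + 1:])
    print(f"{checked.queries} queries, {checked.hits} answered from cache")
```

Running the demo issues three queries, one of which is answered from the cache, so the underlying property test runs only twice.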
This study is the first effort to conduct a systematic, extensive analysis of memory-efficient caching schemes for program reduction. We first propose to use two well-known compression methods,
Our extensive evaluation on 31 real-world C compiler bugs demonstrates that caching schemes eliminate redundant queries, reducing the number of queries issued by HDD and Perses by 61.8% and 24.3%, respectively; correspondingly, runtime performance improves notably, by 22.8% and 18.2%. With regard to memory efficiency, all three methods use less memory than the state-of-the-art string-based scheme
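Because the abstract is cut off before naming the methods being compared, the following Python sketch is only a generic illustration of memory-efficient cache keys, not the paper's schemes. It contrasts a string-based key (the whole variant), a zlib-compressed key (exact but variable-sized), and a SHA-256 digest (a fixed 32 bytes, at the cost of a negligible but nonzero collision risk). `VariantCache` and the key functions are hypothetical names, and `key_bytes` counts only key payloads, ignoring Python's per-object overhead.

```python
import hashlib
import zlib
from typing import Callable, Dict, Optional

KeyFn = Callable[[str], bytes]


def raw_key(program: str) -> bytes:
    # String-based scheme: keep the whole variant.
    return program.encode("utf-8")


def zlib_key(program: str) -> bytes:
    # Lossless compression: exact, but size still grows with the variant.
    return zlib.compress(program.encode("utf-8"), 9)


def sha256_key(program: str) -> bytes:
    # Cryptographic digest: always 32 bytes; equality is probabilistic.
    return hashlib.sha256(program.encode("utf-8")).digest()


class VariantCache:
    """Map an encoded variant to the outcome of its property test."""

    def __init__(self, key_fn: KeyFn) -> None:
        self.key_fn = key_fn
        self.entries: Dict[bytes, bool] = {}

    def lookup(self, program: str) -> Optional[bool]:
        return self.entries.get(self.key_fn(program))  # None if never tested

    def store(self, program: str, outcome: bool) -> None:
        self.entries[self.key_fn(program)] = outcome

    def key_bytes(self) -> int:
        # Payload size of stored keys only (ignores dict/object overhead).
        return sum(len(k) for k in self.entries)


if __name__ == "__main__":
    lines = [f"int v{i} = {i} * 2;" for i in range(200)]
    # One variant per deleted line of a toy 200-line program.
    variants = ["\n".join(lines[:i] + lines[i + 1:]) for i in range(len(lines))]
    for name, fn in [("raw", raw_key), ("zlib", zlib_key), ("sha256", sha256_key)]:
        cache = VariantCache(fn)
        for v in variants:
            cache.store(v, False)
        print(f"{name:>6}: {len(cache.entries)} entries, {cache.key_bytes()} key bytes")
```

The digest keys for the 200 variants total exactly 200 × 32 = 6,400 bytes regardless of program size, whereas the raw and compressed keys grow with the size of each variant.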
Index Terms
- On the Caching Schemes to Speed Up Program Reduction