research-article

On the Caching Schemes to Speed Up Program Reduction

Published: 24 November 2023

Abstract

Program reduction is a highly practical, widely demanded technique to help debug language tools, such as compilers, interpreters and debuggers. Given a program P that exhibits a property ψ, conceptually, program reduction iteratively applies various program transformations to generate a vast number of variants from P by deleting certain tokens and returns the minimal variant preserving ψ as the result.

A program reduction process inevitably generates duplicate variants, and the number of them can be significant. Our study reveals that on average 61.8% and 24.3% of the generated variants in two representative program reducers HDD and Perses, respectively, are duplicates. Checking them against ψ is thus redundant and unnecessary, which wastes time and computation resources. Although it seems that simply caching the generated variants can avoid redundant property tests, such a trivial method is impractical in the real world due to the significant memory footprint. Therefore, a memory-efficient caching scheme for program reduction is in great demand.
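To make the source of duplicates concrete, here is a minimal sketch of a reducer that deletes one token at a time and memoizes property tests in a string-keyed cache. It is not HDD or Perses. The function name `reduce_with_cache`, the greedy deletion strategy, and the space-joined key are all assumptions made for illustration.

```python
from typing import Callable, Dict, Tuple

Tokens = Tuple[str, ...]


def reduce_with_cache(
    program: Tokens, has_property: Callable[[Tokens], bool]
) -> Tuple[Tokens, int, int]:
    """Greedily delete single tokens until no deletion preserves the property.

    Returns (reduced program, real property tests run, cache hits).
    Hypothetical helper: a toy reducer, not the algorithm of any real tool.
    """
    # String-keyed cache: the whole variant text is the key, so memory
    # grows with both the number and the size of the variants.
    cache: Dict[str, bool] = {}
    tests = hits = 0
    best = program
    changed = True
    while changed:  # repeat passes until a fixpoint is reached
        changed = False
        i = 0
        while i < len(best):
            variant = best[:i] + best[i + 1:]  # drop the i-th token
            key = " ".join(variant)  # assumes tokens contain no spaces
            if key in cache:
                hits += 1  # duplicate variant: skip the costly test
                ok = cache[key]
            else:
                tests += 1
                ok = has_property(variant)
                cache[key] = ok
            if ok:
                best = variant
                changed = True
            else:
                i += 1
    return best, tests, hits
```

For the input `("a", "b", "a", "c")` with the property "contains `c`", the final confirming pass regenerates the empty variant that was already rejected, so one of the five queries is answered from the cache.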

This study is the first effort to conduct a systematic, extensive analysis of memory-efficient caching schemes for program reduction. We first propose to use two well-known compression methods, ZIP and SHA, to compress the generated variants before they are stored in the cache. Furthermore, our understanding of the program reduction process motivates us to propose a novel, domain-specific caching scheme that is both memory-efficient and computation-efficient, Refreshable Compact Caching (RCC). Our key insight is two-fold: ① by leveraging the correlation between variants and the original program P, we losslessly encode each variant into an equivalent, compact, canonical representation; ② periodically, stale cache entries, which will never be accessed again, are removed to minimize the memory footprint over time.
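The three kinds of cache key described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the abstract does not say which SHA variant is used (SHA-256 is assumed here), `bitmask_key` is a hypothetical stand-in for a representation derived from P, and the staleness rule in `refresh` (drop entries that are not smaller than the current best program) is an assumption that holds only if every future variant is produced by deleting tokens from the current best.

```python
import hashlib
import zlib
from typing import Dict, Optional, Sequence, Tuple


def zip_key(variant_text: str) -> bytes:
    """Lossless key: the zlib-compressed variant text."""
    return zlib.compress(variant_text.encode("utf-8"))


def sha_key(variant_text: str) -> bytes:
    """Fixed-size key: a 32-byte SHA-256 digest (the SHA variant is assumed)."""
    return hashlib.sha256(variant_text.encode("utf-8")).digest()


def bitmask_key(kept: Sequence[int], n_tokens: int) -> bytes:
    """Hypothetical compact key: one bit per token of the original program P,
    set when that token survives in the variant."""
    mask = 0
    for i in kept:
        mask |= 1 << i
    return mask.to_bytes((n_tokens + 7) // 8, "little")


class RefreshableCache:
    """Maps a key to (variant size in tokens, property-test result)."""

    def __init__(self) -> None:
        self._entries: Dict[bytes, Tuple[int, bool]] = {}

    def put(self, key: bytes, size: int, result: bool) -> None:
        self._entries[key] = (size, result)

    def get(self, key: bytes) -> Optional[bool]:
        entry = self._entries.get(key)
        return None if entry is None else entry[1]

    def refresh(self, best_size: int) -> None:
        # Assumed staleness rule: a variant with at least as many tokens
        # as the current best can never be generated again by deletion.
        self._entries = {
            k: v for k, v in self._entries.items() if v[0] < best_size
        }

    def __len__(self) -> int:
        return len(self._entries)
```

A digest is not lossless, so two different variants could in principle share a key. A position bitmask is also not canonical when P repeats a token, because deleting either copy yields the same text under different masks; a real scheme would need to handle that case.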

Our extensive evaluation on 31 real-world C compiler bugs demonstrates that caching schemes help avoid issuing redundant queries by 61.8% and 24.3% in HDD and Perses, respectively; correspondingly, the runtime performance is notably boosted by 22.8% and 18.2%. With regard to memory efficiency, all three methods use less memory than the state-of-the-art string-based scheme STR. Specifically, ZIP and SHA cut down the memory footprint by more than 80% and 90%, respectively, in both Perses and HDD compared to STR; moreover, the highly scalable, domain-specific RCC dominates peer schemes, and outperforms SHA by 96.4% and 91.74% in HDD and Perses, respectively.


    • Published in

      ACM Transactions on Software Engineering and Methodology, Volume 33, Issue 1
      January 2024
      933 pages
      ISSN: 1049-331X
      EISSN: 1557-7392
      DOI: 10.1145/3613536
      Editor: Mauro Pezzè

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 24 November 2023
      • Online AM: 5 September 2023
      • Accepted: 24 July 2023
      • Revised: 7 June 2023
      • Received: 5 October 2022