Abstract
Compilers are critical software, and, as with other software, testing is one of the most widely used ways of assuring their quality. Compiler bugs tend to occur in compiler optimizations. Detecting an optimization bug depends on two main factors: (1) the optimization flags that control the accessibility of the buggy compiler code must be turned on; and (2) the test program must be able to trigger the buggy code. However, existing compiler testing approaches consider only the latter when generating effective test programs, and simply run them under several pre-defined optimization levels (e.g.,
To better understand the influence of compiler optimizations on compiler testing, we conduct the first empirical study of this question. We call a combination of optimization flags turned on for compilation an optimization setting. The study finds that (1) all the bugs detected under the widely used optimization levels are also detected under the explored optimization settings, while 83.54% of bugs are detected only under the latter; and (2) both inhibition and promotion effects exist among optimization flags in compiler testing. These findings indicate both the necessity and the challenges of considering compiler optimizations in compiler testing.
We then propose the first approach, called COTest, that considers both factors to test compilers. Specifically, COTest first adopts machine learning (the XGBoost algorithm) to model the relationship between test programs and optimization settings, so as to predict the bug-triggering probability of a test program under an optimization setting. Then, it uses a diversity augmentation strategy to select a set of diverse candidate optimization settings for which predictions are made for a given test program. Finally, the Top-K optimization settings are selected for compiler testing according to the predicted bug-triggering probabilities. Experiments on GCC and LLVM demonstrate its effectiveness; in particular, COTest detects 17 previously unknown bugs, 11 of which have been fixed or confirmed by developers.
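The three steps above (predict, diversify, select Top-K) can be sketched as follows. This is only an illustration under stated assumptions, not COTest's implementation: the abstract does not say how diversity is measured, so the sketch assumes a greedy max-min selection over Hamming distance between flag bit vectors, and it replaces the trained XGBoost model with a placeholder scoring function. All names (`diverse_candidates`, `top_k_settings`, `toy_predict`) are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

# An optimization setting as a bit vector: one bit per flag, 1 = turned on.
Setting = Tuple[int, ...]


def hamming(a: Setting, b: Setting) -> int:
    """Number of flags on which two settings differ."""
    return sum(x != y for x, y in zip(a, b))


def diverse_candidates(num_flags: int, pool_size: int, num_candidates: int,
                       rng: random.Random) -> List[Setting]:
    """Assumed diversity strategy: sample a random pool, then greedily keep
    the setting farthest (max-min Hamming distance) from those already kept."""
    pool = sorted({tuple(rng.randint(0, 1) for _ in range(num_flags))
                   for _ in range(pool_size)})
    chosen = [pool.pop(0)]
    while pool and len(chosen) < num_candidates:
        best = max(pool, key=lambda s: min(hamming(s, c) for c in chosen))
        pool.remove(best)
        chosen.append(best)
    return chosen


def top_k_settings(program_features: Sequence[float],
                   candidates: Sequence[Setting],
                   predict: Callable[[Sequence[float], Setting], float],
                   k: int) -> List[Setting]:
    """Rank candidates by predicted bug-triggering probability, keep the top k."""
    ranked = sorted(candidates,
                    key=lambda s: predict(program_features, s),
                    reverse=True)
    return ranked[:k]


def toy_predict(program_features: Sequence[float], setting: Setting) -> float:
    """Placeholder scorer, NOT a trained model: the fraction of enabled flags.
    In the approach described above, a trained XGBoost model plays this role."""
    return sum(setting) / len(setting)


rng = random.Random(0)
candidates = diverse_candidates(num_flags=8, pool_size=50,
                                num_candidates=10, rng=rng)
selected = top_k_settings([0.3, 0.7], candidates, toy_predict, k=3)
for setting in selected:
    print(setting, round(toy_predict([0.3, 0.7], setting), 3))
```

Passing the predictor as a callable keeps the selection logic independent of the model, so a trained classifier's probability output could be substituted for `toy_predict` without changing the rest of the sketch.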
Index Terms
- Boosting Compiler Testing via Compiler Optimization Exploration