Boosting Compiler Testing via Compiler Optimization Exploration

Published: 22 August 2022

Abstract

Compilers are critical software infrastructure and, as with other software, testing is one of the most widely used ways of guaranteeing their quality. Compiler bugs tend to occur in compiler optimizations. Detecting optimization bugs requires two main factors: (1) the optimization flags controlling the reachability of the buggy compiler code must be turned on; and (2) the test program must be able to trigger the buggy code. However, existing compiler testing approaches consider only the latter when generating effective test programs, and simply run them under a few pre-defined optimization levels (e.g., -O0, -O1, -O2, -O3, -Os in GCC).
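To make the distinction concrete, a fixed optimization level is a single canned flag bundle, whereas an optimization setting is an arbitrary combination of individual flags. The sketch below samples a random setting on top of -O2; the flag list is a tiny hand-picked subset of GCC's options and the sampling scheme is purely illustrative, not the paper's method:

```python
import random

# A handful of real GCC optimization flags, for illustration only;
# GCC actually exposes hundreds (see `gcc --help=optimizers`).
FLAGS = ["-ftree-vrp", "-fgcse", "-finline-functions",
         "-fstrict-aliasing", "-fipa-icf", "-floop-unroll-and-jam"]

def random_setting(rng, base="-O2"):
    """Sample one optimization setting: a base level plus each flag
    independently turned on, or explicitly negated via its -fno- form."""
    setting = [base]
    for flag in FLAGS:
        setting.append(flag if rng.random() < 0.5
                       else flag.replace("-f", "-fno-", 1))
    return setting

rng = random.Random(42)
print(" ".join(random_setting(rng)))
```

With six binary flags this already yields 64 distinct settings per base level, which is why exhaustively testing all settings is infeasible and the five canned levels cover only a sliver of the space.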

To better understand the influence of compiler optimizations on compiler testing, we conduct the first empirical study and find that (1) all the bugs detected under the widely used optimization levels are also detected under the explored optimization settings (we call a combination of optimization flags turned on for compilation an optimization setting), whereas 83.54% of the bugs are detected only under the latter; and (2) both inhibition and promotion effects exist among optimization flags for compiler testing, indicating the necessity of, and the challenges in, considering compiler optimizations in compiler testing.

We then propose the first approach, called COTest, that considers both factors to test compilers. Specifically, COTest first adopts machine learning (the XGBoost algorithm) to model the relationship between test programs and optimization settings, predicting the bug-triggering probability of a test program under a given optimization setting. It then designs a diversity augmentation strategy to select a set of diverse candidate optimization settings for prediction for each test program. Finally, the Top-K optimization settings are selected for compiler testing according to the predicted bug-triggering probabilities. Experiments on GCC and LLVM demonstrate its effectiveness; in particular, COTest detects 17 previously unknown bugs, 11 of which have been fixed or confirmed by developers.
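The diversify-then-rank pipeline described above can be sketched as follows. Everything here is an assumption for illustration: settings are modeled as 0/1 flag vectors, diversity augmentation is approximated by a greedy maximum-minimum Hamming-distance heuristic, and a placeholder scorer stands in for the trained XGBoost model; none of this is COTest's actual implementation.

```python
import random

def hamming(a, b):
    """Number of flag positions where two settings differ."""
    return sum(x != y for x, y in zip(a, b))

def diversify(pool, m, rng):
    """Greedy diversity augmentation (illustrative): pick m settings,
    each maximizing its minimum distance to those already chosen."""
    chosen = [pool[rng.randrange(len(pool))]]
    while len(chosen) < m:
        remaining = [s for s in pool if s not in chosen]
        best = max(remaining,
                   key=lambda s: min(hamming(s, c) for c in chosen))
        chosen.append(best)
    return chosen

def select_settings(program_feats, pool, predict, m, k, rng):
    """COTest-style selection sketch: diversify the candidate pool,
    score each (program, setting) pair, keep the Top-K settings."""
    candidates = diversify(pool, m, rng)
    ranked = sorted(candidates,
                    key=lambda s: predict(program_feats, s), reverse=True)
    return ranked[:k]

rng = random.Random(0)
# Each setting: a 0/1 vector over 8 hypothetical optimization flags.
pool = [[rng.randint(0, 1) for _ in range(8)] for _ in range(50)]
# Placeholder scorer standing in for the trained XGBoost model.
predict = lambda feats, setting: sum(setting) / len(setting)
picked = select_settings([0.3, 0.7], pool, predict, m=10, k=3, rng=rng)
print(len(picked))  # 3
```

The greedy step matters because a purely score-driven ranking would let the model keep proposing near-identical settings; spreading candidates out first gives the predictor genuinely different options to rank.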



Published in ACM Transactions on Software Engineering and Methodology, Volume 31, Issue 4 (October 2022), 867 pages.
ISSN: 1049-331X
EISSN: 1557-7392
DOI: 10.1145/3543992
Editor: Mauro Pezzè

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher: Association for Computing Machinery, New York, NY, United States

        Publication History

        • Published: 22 August 2022
        • Online AM: 5 March 2022
        • Accepted: 1 December 2021
        • Revised: 1 October 2021
        • Received: 1 December 2020


        Qualifiers

        • research-article
        • Refereed
