Boosting Compiler Testing via Compiler Optimization Exploration

Authors:
Junjie Chen (ORCID 0000-0003-3056-9962), College of Intelligence and Computing, Tianjin University, Tianjin, China
Chenyao Suo, College of Intelligence and Computing, Tianjin University, Tianjin, China

ACM Transactions on Software Engineering and Methodology, Volume 31, Issue 4, Article No. 72, pp. 1–33. https://doi.org/10.1145/3508362

Published: 22 August 2022

Abstract

Compilers are critical software, and, as with other software, testing is one of the most widely used ways of assuring their quality. Compiler bugs tend to occur in compiler optimizations. Detecting an optimization bug depends on two main factors: (1) the optimization flags that control the accessibility of the buggy compiler code must be turned on; and (2) the test program must be able to trigger the buggy code. However, existing compiler testing approaches consider only the latter when generating effective test programs, and simply run them under a few predefined optimization levels (e.g., -O0, -O1, -O2, -O3, and -Os in GCC).

To better understand the influence of compiler optimizations on compiler testing, we conduct the first empirical study on this question. We call a combination of optimization flags turned on for compilation an optimization setting. The study finds that (1) all the bugs detected under the widely used optimization levels are also detected under the explored optimization settings, while 83.54% of the bugs are detected only under the latter; and (2) optimization flags exhibit both inhibition and promotion effects on one another for compiler testing. These findings indicate both the necessity and the challenge of considering compiler optimizations in compiler testing.
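As an illustration of what an optimization setting looks like in practice, the following Python sketch encodes a setting as a 0/1 vector over a list of flags and renders it as compiler arguments. It rests on several assumptions that are not taken from the article: the four flag names are only a small illustrative subset of GCC's optimization flags, the function name `setting_to_args` is hypothetical, and starting from a base level and passing an explicit `-f<flag>` or `-fno-<flag>` for each entry is just one possible encoding. The sketch builds the argument list only and does not invoke a compiler.

```python
from typing import List, Sequence

# Illustrative subset of GCC optimization flag names, without the "-f" prefix.
# The flags actually explored in the study are not listed in the abstract.
FLAGS: List[str] = ["inline-functions", "unroll-loops", "tree-vectorize", "gcse"]


def setting_to_args(setting: Sequence[int], base_level: str = "-O3") -> List[str]:
    """Render a 0/1 optimization setting as a list of compiler arguments.

    Assumption: each flag is switched on with -f<name> and off with
    -fno-<name>, on top of a base optimization level.
    """
    if len(setting) != len(FLAGS):
        raise ValueError("setting must have one entry per flag")
    args = [base_level]
    for name, enabled in zip(FLAGS, setting):
        args.append(f"-f{name}" if enabled else f"-fno-{name}")
    return args


# Example: inlining and vectorization on, unrolling and GCSE off.
example_args = setting_to_args([1, 0, 1, 0])
print(" ".join(["gcc"] + example_args + ["test.c"]))
```

With n flags there are 2^n such settings, which is why a fixed handful of optimization levels covers only a tiny part of the space.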

We then propose COTest, the first approach that considers both factors when testing compilers. Specifically, COTest first uses machine learning (the XGBoost algorithm) to model the relationship between test programs and optimization settings, so that it can predict the bug-triggering probability of a test program under an optimization setting. Then, it applies a diversity augmentation strategy to select a set of diverse candidate optimization settings for which predictions are made for a given test program. Finally, it selects the Top-K optimization settings for compiler testing according to the predicted bug-triggering probabilities. Experiments on GCC and LLVM demonstrate its effectiveness; in particular, COTest detects 17 previously unknown bugs, 11 of which have been fixed or confirmed by developers.
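The three steps described above (predict, diversify, select Top-K) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not COTest's implementation: `stub_predict` is a hypothetical stand-in for a trained XGBoost model, the diversity step uses a simple greedy farthest-point selection under Hamming distance (the abstract does not say which strategy or distance COTest uses), and the feature vector and all function names are invented for the example.

```python
import itertools
from typing import Callable, List, Sequence, Tuple

Setting = Tuple[int, ...]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of flags on which two settings disagree."""
    return sum(x != y for x, y in zip(a, b))


def diverse_candidates(pool: List[Setting], n: int) -> List[Setting]:
    """Greedy farthest-point selection: repeatedly add the setting whose
    minimum Hamming distance to the already chosen settings is largest.
    Assumption: a stand-in for the diversity augmentation strategy."""
    if not pool or n <= 0:
        return []
    chosen = [pool[0]]
    remaining = list(pool[1:])
    while remaining and len(chosen) < n:
        best = max(remaining, key=lambda s: min(hamming(s, c) for c in chosen))
        chosen.append(best)
        remaining.remove(best)
    return chosen


def stub_predict(features: Sequence[float], setting: Setting) -> float:
    """Hypothetical scorer standing in for a trained model that maps
    (program features, setting) to a bug-triggering probability."""
    total = sum(features) or 1.0
    return sum(f * bit for f, bit in zip(features, setting)) / total


def top_k(features: Sequence[float], candidates: List[Setting],
          predict: Callable[[Sequence[float], Setting], float], k: int) -> List[Setting]:
    """Rank candidate settings by predicted probability and keep the best k."""
    ranked = sorted(candidates, key=lambda s: predict(features, s), reverse=True)
    return ranked[:k]


# All 16 settings over 4 hypothetical flags, in a deterministic order.
pool = list(itertools.product((0, 1), repeat=4))
candidates = diverse_candidates(pool, 4)
program_features = [0.5, 0.1, 0.3, 0.1]  # invented per-program features
selected = top_k(program_features, candidates, stub_predict, k=2)
print(candidates)
print(selected)
```

Restricting prediction to a diverse candidate set keeps the number of model queries per test program small while still spreading them across the setting space; only the K highest-scoring settings are then used to compile and run the program.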

Index Terms

Boosting Compiler Testing via Compiler Optimization Exploration
1. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
      1. Software defect analysis
        Software testing and debugging
  2. Software notations and tools
    1. Compilers

Published in
ACM Transactions on Software Engineering and Methodology Volume 31, Issue 4
October 2022
867 pages
ISSN: 1049-331X
EISSN: 1557-7392
DOI: 10.1145/3543992
Editor: Mauro Pezzè, USI Università della Svizzera italiana and SIT Schaffhausen Institute of Technology, Switzerland
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 22 August 2022
- Online AM: 5 March 2022
- Accepted: 1 December 2021
- Revised: 1 October 2021
- Received: 1 December 2020
Author Tags
Compiler testing
compiler optimization
machine learning
Qualifiers
- research-article
- Refereed