Abstract
Genetic Improvement (GI) uses automated search to improve existing software. It can be used to reduce runtime or energy consumption, to fix bugs, or to improve any other software property, provided that the property can be encoded into a fitness function. GI usually relies on testing to check whether the changes disrupt the intended functionality of the software, which makes test suites important artefacts for the overall success of GI. The objective of this work is to establish which characteristics of test suites correlate with the effectiveness of GI. We hypothesise that different test suite properties may correlate to different degrees with the ratio between overfitting and non-overfitting patches generated by the GI algorithm. To test this hypothesis, we perform a set of experiments with test suites automatically generated by EvoSuite using four popular coverage criteria. We used these test suites as input to a GI process and collected the patches generated throughout that process. We find that while test suite coverage has an impact on the ability of GI to produce correct patches, with branch coverage leading to the least overfitting, the overfitting rate remained significant. We also compared automatically generated tests with manual, developer-written ones and found that, although the manual tests had lower coverage, GI runs using them led to less overfitting than runs using automatically generated tests. Finally, we did not observe enough statistically significant correlations between the coverage metrics and the overfitting ratios of patches; that is, test suite coverage cannot be used as a linear predictor of the level of overfitting of the generated patches.
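To make the quantities in the abstract concrete, the following is a minimal Python sketch. It assumes that a patch "overfits" when it passes the test suite GI used to check functionality but fails an independent, held-out suite (the usual definition in the program-repair literature; the abstract does not spell out the exact procedure). The `Patch` record, the helper names, and the toy numbers are hypothetical illustrations, not the authors' tooling or data. The sketch measures overfitting as the share of accepted patches that overfit and computes Spearman's rank correlation between per-suite coverage and that share.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Patch:
    """Hypothetical record of one GI patch and its test outcomes."""
    passes_training_suite: bool  # suite used by GI to judge functionality
    passes_heldout_suite: bool   # independent suite used only for validation


def overfitting_share(patches: Sequence[Patch]) -> float:
    """Fraction of accepted patches that overfit.

    A patch is accepted if it passes the training suite (GI would have
    discarded it otherwise); it overfits if it then fails the held-out suite.
    """
    accepted = [p for p in patches if p.passes_training_suite]
    if not accepted:
        raise ValueError("no patch passed the training suite")
    overfitting = sum(1 for p in accepted if not p.passes_heldout_suite)
    return overfitting / len(accepted)


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # sorted positions start..end
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one sequence is constant")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up toy numbers for illustration only, not results from the paper.
    runs = [Patch(True, True), Patch(True, False), Patch(False, False), Patch(True, False)]
    print(f"{overfitting_share(runs):.3f}")  # 2 of 3 accepted patches overfit
    coverage = [0.55, 0.70, 0.80, 0.90]  # one coverage value per test suite
    shares = [0.6, 0.5, 0.5, 0.3]        # overfitting share for each suite
    print(f"{spearman_rho(coverage, shares):.3f}")  # → -0.949
```

Because the overfitting-to-non-overfitting ratio p/(1 - p) is a strictly increasing function of the share p, using either quantity yields the same Spearman's ρ. A library routine such as `scipy.stats.spearmanr` would additionally report a p-value, which this sketch omits.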
Notes
1. We use the word ‘potentially’ here because, although a patch might improve runtime on our training and test sets, this does not mean the runtime improvement will generalise to all possible usages of the software. A manual check is thus necessary.
2. See gin.util.TestCaseGenerator at https://github.com/justynapt/ssbse2020RENE.
3. Following the advice given at https://github.com/EvoSuite/evosuite/issues/48.
Acknowledgements
This work was funded by the EPSRC grant EP/P023991/1 and the ERC grant 741278 Evolving Program Improvement Collaborators (EPIC). The authors would also like to thank Prof. Gordon Fraser from the University of Passau for consultation on the output diversity metric.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Lim, M., Guizzo, G., Petke, J. (2020). Impact of Test Suite Coverage on Overfitting in Genetic Improvement of Software. In: Aleti, A., Panichella, A. (eds) Search-Based Software Engineering. SSBSE 2020. Lecture Notes in Computer Science, vol 12420. Springer, Cham. https://doi.org/10.1007/978-3-030-59762-7_14
DOI: https://doi.org/10.1007/978-3-030-59762-7_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59761-0
Online ISBN: 978-3-030-59762-7
eBook Packages: Computer Science, Computer Science (R0)