Impact of Test Suite Coverage on Overfitting in Genetic Improvement of Software

Lim, Mingyi; Guizzo, Giovani; Petke, Justyna

doi:10.1007/978-3-030-59762-7_14

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 12420)

Included in the following conference series:

International Symposium on Search Based Software Engineering

Abstract

Genetic Improvement (GI) uses automated search to improve existing software. It can be used to reduce runtime or energy consumption, fix bugs, and improve any other software property, provided that the property can be encoded in a fitness function. GI usually relies on testing to check whether the changes disrupt the intended functionality of the software, which makes test suites important artefacts for the overall success of GI. The objective of this work is to establish which characteristics of the test suites correlate with the effectiveness of GI. We hypothesise that different test suite properties may have different levels of correlation to the ratio between overfitting and non-overfitting patches generated by the GI algorithm. To test our hypothesis, we performed a set of experiments with test suites generated automatically by EvoSuite under four popular coverage criteria. We used these test suites as input to a GI process and collected the patches generated throughout that process. We find that while test suite coverage has an impact on the ability of GI to produce correct patches, with branch coverage leading to the least overfitting, the overfitting rate was still significant. We also compared automatically generated tests with manual, developer-written ones and found that although the manual tests had lower coverage, the GI runs with manual tests led to less overfitting than those with automatically generated tests. Finally, we did not observe enough statistically significant correlations between the coverage metrics and overfitting ratios of patches, i.e., the coverage of test suites cannot be used as a linear predictor for the level of overfitting of the generated patches.
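
The kind of analysis the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a patch counts as overfitting when it passes the tests used during the GI run but fails held-out tests, it reads the "ratio" as the share of overfitting patches among all patches, and it assumes a rank correlation (Spearman's) as the statistic, none of which the abstract itself specifies. All function names are hypothetical and the numbers in the usage note are made up purely for illustration.

```python
from typing import Sequence


def overfitting_ratio(overfitting: int, non_overfitting: int) -> float:
    """Share of patches that overfit (assumed definition, see lead-in)."""
    total = overfitting + non_overfitting
    if total == 0:
        raise ValueError("no patches to evaluate")
    return overfitting / total


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values receive the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (var_x * var_y) ** 0.5


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


if __name__ == "__main__":
    # Made-up data: coverage of four test suites and (overfitting, non-overfitting) patch counts.
    coverage = [0.40, 0.55, 0.70, 0.85]
    patch_counts = [(8, 2), (6, 4), (7, 3), (3, 7)]
    ratios = [overfitting_ratio(o, n) for o, n in patch_counts]
    print(spearman(coverage, ratios))
```

A coefficient near zero, or one that is not statistically significant, would correspond to the abstract's conclusion that coverage is not a reliable predictor of overfitting; a significance test is deliberately left out of this sketch.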

Notes

1. We use the word ‘potentially’ here because, although the patch might improve upon our training and test sets, this does not mean the runtime improvement will generalise to all possible usages of the software. A manual check is thus necessary.
2. See gin.util.TestCaseGenerator at https://github.com/justynapt/ssbse2020RENE.
3. Following advice given here: https://github.com/EvoSuite/evosuite/issues/48.

Acknowledgements

This work was funded by the EPSRC grant EP/P023991/1 and the ERC grant 741278 Evolving Program Improvement Collaborators (EPIC). The authors would also like to thank Prof. Gordon Fraser from the University of Passau for consultation on the output diversity metric.

Author information

Authors and Affiliations

Department of Computer Science, University College London, London, UK
Mingyi Lim, Giovani Guizzo & Justyna Petke

Corresponding author

Correspondence to Justyna Petke.

Editor information

Editors and Affiliations

Monash University, Melbourne, VIC, Australia
Aldeida Aleti
Delft University of Technology, Delft, The Netherlands
Annibale Panichella

Cite this paper

Lim, M., Guizzo, G., Petke, J. (2020). Impact of Test Suite Coverage on Overfitting in Genetic Improvement of Software. In: Aleti, A., Panichella, A. (eds) Search-Based Software Engineering. SSBSE 2020. Lecture Notes in Computer Science, vol 12420. Springer, Cham. https://doi.org/10.1007/978-3-030-59762-7_14

DOI: https://doi.org/10.1007/978-3-030-59762-7_14
Published: 30 September 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59761-0
Online ISBN: 978-3-030-59762-7
eBook Packages: Computer Science, Computer Science (R0)