
A Learning Classifier System for Automated Test Case Prioritization and Selection

  • Original Research
  • Published:
SN Computer Science

A Correction to this article was published on 01 September 2022

This article has been updated

Abstract

For many everyday devices, each newly released model offers more functionality. This technological advance relies heavily on software solutions of increasing complexity, which results in novel challenges in the domain of software testing. Most prominently, while an ever higher number of test cases is required to meet quality demands, performing a large number of test cases frequently amounts to a significant increase in development time and costs. In order to overcome this issue, agile development methods such as continuous integration usually execute only a subset of important test cases to meet both time and testing demands. One way of selecting such a subset of important test cases is to assign priorities to all the available test cases and then greedily pick the ones with the highest priority until the available time budget is spent. For this, in a previous work, we presented a new machine learning approach based on a learning classifier system (LCS). In the present article, we summarize our earlier findings (which are spread over several publications) and provide insights into the most recent adaptations we made to the method. We also provide an extended experimental analysis that outlines in more detail how it compares to a state-of-the-art artificial neural network. It can be observed that the performance of our LCS-based approach is often much higher than that of the network. Since our work has already been deployed by a major company, we give an overview of the resulting product as well as several of its in-production quality attributes.
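
To make the greedy budget-based selection scheme concrete, the following is a minimal Python sketch; the data structures, names, and example values are our own illustration and are not taken from the article:

    from dataclasses import dataclass

    @dataclass
    class TestCase:
        name: str
        duration: float  # expected runtime, e.g., in minutes
        priority: float  # e.g., as predicted by the LCS

    def greedy_select(test_cases, budget):
        """Greedily pick the highest-priority test cases until the time budget is spent."""
        selected, remaining = [], budget
        for tc in sorted(test_cases, key=lambda t: t.priority, reverse=True):
            if tc.duration <= remaining:  # skip test cases that no longer fit
                selected.append(tc)
                remaining -= tc.duration
        return selected

    suite = [TestCase("t_login", 2.0, 0.9),
             TestCase("t_checkout", 5.0, 0.7),
             TestCase("t_search", 3.0, 0.4)]
    print([tc.name for tc in greedy_select(suite, budget=6.0)])  # ['t_login', 't_search']

Whether a test case that exceeds the remaining budget is skipped (as above) or ends the selection is a detail the abstract leaves open; we skip it here.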

Figs. 1–8 (Fig. 1 adapted from [29])

Availability of Data and Material

The data sets used may be found here: https://bitbucket.org/HelgeS/atcs-data/src/master/.

Change history

  • 29 August 2022

    Figures were not placed near their citations. The placement of the figures has now been corrected.

  • 01 September 2022

    A Correction to this paper has been published: https://doi.org/10.1007/s42979-022-01352-1

Notes

  1. Note that, despite its name, NAPFD actually measures test case failures and not faults. Faults are system errors caused by bugs etc., and a single fault may result in multiple test cases failing. A small sketch of the NAPFD computation is given after these notes.

  2. In those publications we report that the AIS approach performs as well as or better than the NSGA-II of Lachmann et al. [14]. The NSGA-II used by Arrieta et al. [2] differs from the one used by Lachmann et al. only in the employed crossover operator.

  3. Spieker et al. [32] originally formulated the use case as a reinforcement learning problem. We intend to provide a more high-level machine learning view.

  4. Of course, they are adapted to numbers. Mutating a number translates to drawing a new random number. Crossover consists of first performing an arithmetic crossover (for two numbers \(x\) and \(y\), this corresponds to \(\zeta x + (1 - \zeta ) y\) and \(\zeta y + (1 - \zeta ) x\)) and then the same two-point crossover as used for the ternary subconditions; see the sketch after these notes.

  5. Note that the normalization by \(\gamma\) and \(\Gamma\) is necessary to ensure that the result is indeed a probability distribution.

  6. Our code is available here: https://github.com/LagLukas/transfer_learning.

  7. The data sets can be downloaded here: https://bitbucket.org/HelgeS/atcs-data/src/master/.

  8. Spieker et al.’s implementation of their NN-based approach can be found here: https://bitbucket.org/HelgeS/retecs.

  9. We take the average over three successive values, i.e., we consider disjoint CI cycle sets with indexes \(\{3k, 3k+1, 3k+2\}\); a sketch follows these notes.

  10. We used one-sided Wilcoxon tests to compare each combination of one of the three ER methods with the failure count or time-ranked value function against each combination of one of the three ER methods with the test case failure value function (a total of \((3 \times 2) \times (3 \times 1) = 18\) comparisons), and for each we checked the null hypothesis that the first performs worse than the second. Since all p-values are less than \(10^{-21}\), we conclude that the failure count and time-ranked value functions yield significantly better results. A sketch of such a test is given after these notes.

  11. For the failure count value we can observe similar results; the corresponding plots can be found in Appendix A.

  12. We examined null hypotheses of the form: Our transfer learning approach leads to worse results than the raw XCSF-ER on data set x with value function y.
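
The following Python sketch illustrates how NAPFD can be computed under its usual definition (cf. Qu et al. [22]); in line with note 1, each failing test case plays the role of one fault. The function name and data layout are our own:

    def napfd(schedule, failing):
        """Normalized Average Percentage of Faults Detected for a
        prioritized (and possibly truncated) test schedule."""
        n = len(schedule)  # number of executed test cases
        m = len(failing)   # total number of failing test cases
        if m == 0:
            return 1.0     # nothing to detect
        # 1-based ranks of the failing test cases that made it into the schedule
        ranks = [i + 1 for i, tc in enumerate(schedule) if tc in failing]
        p = len(ranks) / m  # fraction of failures actually detected
        return p - sum(ranks) / (n * m) + p / (2 * n)

    print(napfd(["a", "b", "c", "d"], {"b", "d"}))  # 0.375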
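
A minimal sketch of the numeric genetic operators described in note 4; whether \(\zeta\) is drawn once per crossover or once per gene, as well as the mutation bounds, are assumptions of ours:

    import random

    def mutate(genome, p_mut=0.05, low=0.0, high=1.0):
        # Mutating a number translates to drawing a new random number (note 4).
        return [random.uniform(low, high) if random.random() < p_mut else g
                for g in genome]

    def crossover(a, b):
        # First an arithmetic crossover ...
        zeta = random.random()
        child_a = [zeta * x + (1 - zeta) * y for x, y in zip(a, b)]
        child_b = [zeta * y + (1 - zeta) * x for x, y in zip(a, b)]
        # ... then a two-point crossover: swap the segment between two cut points.
        i, j = sorted(random.sample(range(len(child_a) + 1), 2))
        child_a[i:j], child_b[i:j] = child_b[i:j], child_a[i:j]
        return child_a, child_b

    a, b = crossover([0.1, 0.5, 0.9], [0.4, 0.2, 0.7])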
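
The smoothing from note 9 amounts to averaging disjoint chunks of three successive CI cycles; a short sketch (that a partial trailing chunk is dropped is an assumption of ours):

    def chunked_means(values, k=3):
        # Mean over the disjoint index sets {3k, 3k+1, 3k+2} (note 9).
        return [sum(values[i:i + k]) / k
                for i in range(0, len(values) - k + 1, k)]

    print(chunked_means([1, 2, 3, 4, 5, 6, 7]))  # [2.0, 5.0]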
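
One of the 18 comparisons from note 10 could be carried out with SciPy as follows; the paired data below are placeholders, and that the samples are paired per CI cycle is an assumption of this sketch:

    import numpy as np
    from scipy.stats import wilcoxon

    rng = np.random.default_rng(42)
    # Hypothetical paired per-cycle NAPFD values (not the paper's data):
    napfd_first = rng.uniform(0.6, 1.0, size=200)                  # e.g., ER + failure count value
    napfd_second = napfd_first - rng.uniform(0.0, 0.2, size=200)   # e.g., ER + test case failure value

    # One-sided test: a small p-value rejects the null hypothesis that the
    # first combination performs worse than the second.
    stat, p = wilcoxon(napfd_first, napfd_second, alternative="greater")
    print(f"p = {p:.3e}")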

References

  1. Anand S, Burke EK, Chen TY, Clark J, Cohen MB, Grieskamp W, Harman M, Harrold MJ, McMinn P, Bertolino A, Li JJ, Zhu H. An orchestrated survey of methodologies for automated software test case generation. J Syst Softw. 2013;86(8):1978–2001.

  2. Arrieta A, Wang S, Arruabarrena A, Markiegi U, Sagardui G, Etxeberria L. Multi-objective black-box test case selection for cost-effectively testing simulation models. In: Proceedings of the Genetic and Evolutionary Computation Conference, GECCO ’18, 2018. New York: Association for Computing Machinery, p. 1411–8.

  3. Butz MV, Wilson SW. An algorithmic description of XCS. In: Lanzi PL, Stolzmann W, Wilson SW, editors. Advances in learning classifier systems. Berlin: Springer; 2001. p. 253–72.

  4. Dijkstra EW. Chapter I: notes on structured programming. In: Structured programming. London: Academic Press Ltd.; 1972. p. 1–82.

  5. Fedus W, Ramachandran P, Agarwal R, Bengio Y, Larochelle H, Rowland M, Dabney W. Revisiting fundamentals of experience replay. CoRR, abs/2007.06700, 2020.

  6. International Organization for Standardization. ISO/IEC 25010. https://iso25000.com/index.php/en/iso-25000-standards/iso-25010, 2014. Accessed 15 Jun 2021.

  7. Fowler M. Continuous integration. https://www.martinfowler.com/articles/continuousIntegration.html, 2006. Accessed 21 Feb 2021.

  8. Fraser G, Wotawa F. Redundancy based test-suite reduction. In: Dwyer MB, Lopes A, editors. Fundamental approaches to software engineering. Berlin: Springer; 2007. p. 291–305.

  9. Hsu H-Y, Orso A. MINTS: a general framework and tool for supporting test-suite minimization. In: 2009 IEEE 31st International Conference on Software Engineering, 2009. p. 419–29.

  10. Huang R, Sun W, Xu Y, Chen H, Towey D, Xia X. A survey on adaptive random testing. IEEE Trans Softw Eng. 2021;47(10):2052–83.

  11. Hunter JD. Matplotlib: a 2D graphics environment. Comput Sci Eng. 2007;9(3):90–5.

  12. Kirdey S, Cureton K, Rick S, Ramanathan S, Mrinal S. Lerner—using RL agents for test case scheduling. https://netflixtechblog.com/lerner-using-rl-agents-for-test-case-scheduling-3e0686211198, 2019. Accessed 21 Feb 2021.

  13. Kruskal WH, Wallis WA. Use of ranks in one-criterion variance analysis. J Am Stat Assoc. 1952;47(260):583–621.

  14. Lachmann R, Felderer M, Nieke M, Schulze S, Seidl C, Schaefer I. Multi-objective black-box test case selection for system testing. In: Proceedings of the Genetic and Evolutionary Computation Conference, GECCO ’17. 2017. New York: Association for Computing Machinery, p. 1311–8.

  15. Lin L-J. Reinforcement learning for robots using neural networks. PhD thesis, Carnegie Mellon University, Pittsburgh, PA, USA, 1992. UMI Order No. GAX93-22750.

  16. Lukasczyk S, Kroiß F, Fraser G. Automated unit test generation for Python. CoRR, abs/2007.14049, 2020.

  17. Müller-Schloer C, Tomforde S. Organic computing—technical systems for survival in the real world. In: Autonomic Systems, 2017.

  18. Papadakis M, Kintis M, Zhang J, Jia Y, Le Traon Y, Harman M. Chapter six—mutation testing advances: an analysis and survey. In: Advances in Computers, vol. 112. Elsevier; 2019. p. 275–378.

  19. Pätzel D, Heider M, Wagner ARM. An overview of LCS research from 2020 to 2021. In: Proceedings of the Genetic and Evolutionary Computation Conference Companion, GECCO ’21. New York: Association for Computing Machinery, 2021, pp. 1648–56.

  20. Pätzel D, Stein A, Nakata M. An overview of LCS research from IWLCS 2019–2020. In: Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion, GECCO ’20. New York: Association for Computing Machinery, 2020, pp. 1782–8.

  21. Prothmann H, Tomforde S, Branke J, Hähner J, Müller-Schloer C, Schmeck H. Organic traffic control; 2011.

  22. Qu X, Cohen MB, Woolf KM. Combinatorial interaction regression testing: a study of test case generation and prioritization. In: 2007 IEEE International Conference on Software Maintenance. 2007, p. 255–64.

  23. Richards M, Ford N. Fundamentals of software architecture: an engineering approach. London: O’Reilly Media Incorporated; 2019.

  24. Rosenbauer L, Stein A, Hähner J. An artificial immune system for adaptive test selection. In: 2020 IEEE Symposium Series on Computational Intelligence (SSCI), 2020; p. 2940–7.

  25. Rosenbauer L, Pätzel D, Stein A, Hähner J. Transfer learning for automated test case prioritization using XCSF. In: EvoApplications: 24th International Conference on the Applications of Evolutionary Computation, part of EvoStar 2021, April 2021, Seville, Spain, 2021.

  26. Rosenbauer L, Pätzel D, Stein A, Hähner J. An organic computing system for automated testing. In: Bauer L, Pionteck T, editors. Architecture of computing systems—ARCS 2021. Cham: Springer International Publishing; 2021.

  27. Rosenbauer L, Stein A, Hähner J. An artificial immune system for black box test case selection. In: EvoCOP: 21st European Conference on Evolutionary Computation in Combinatorial Optimisation, part of EvoStar 2021, April 2021, Seville, Spain, 2021.

  28. Rosenbauer L, Stein A, Maier R, Pätzel D, Hähner J. XCS as a reinforcement learning approach to automatic test case prioritization. In: Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion, GECCO ’20. New York: Association for Computing Machinery, 2020, p. 1798–806.

  29. Rosenbauer L, Stein A, Pätzel D, Hähner J. XCSF for automatic test case prioritization. In: Merelo JJ, Garibaldi J, Wagner C, Bäck T, Madani K, Warwick K, editors. Proceedings of the 12th International Joint Conference on Computational Intelligence (ECTA), November 2–4, 2020, 2020.

  30. Rosenbauer L, Stein A, Pätzel D, Hähner J. XCSF with experience replay for automatic test case prioritization. In: Abbass H, Coello Coello CA, Singh HK, editors. 2020 IEEE Symposium Series on Computational Intelligence (SSCI), virtual event, Canberra, Australia, 1–4 December 2020, 2020.

  31. Smart JF. Jenkins: the Definitive Guide. Beijing: O’Reilly; 2011.

  32. Spieker H, Gotlieb A, Marijan D, Mossige M. Reinforcement learning for automatic test case prioritization and selection in continuous integration. CoRR, abs/1811.04122, 2018.

  33. Stein A, Maier R, Rosenbauer L, Hähner J. XCS classifier system with experience replay. In: Proceedings of the 2020 Genetic and Evolutionary Computation Conference, GECCO ’20. New York: Association for Computing Machinery, 2020, p. 404–13.

  34. Stein A, Menssen S, Hähner J. What about interpolation? A radial basis function approach to classifier prediction modeling in XCSF. In: Proceedings of the Genetic and Evolutionary Computation Conference, GECCO ’18. New York: Association for Computing Machinery, 2018, p. 537–44.

  35. Stein A, Rudolph S, Tomforde S, Hähner J. Self-learning smart cameras—harnessing the generalisation capability of XCS. In: Proceedings of the 9th International Joint Conference on Computational Intelligence, Funchal, Portugal, 2017.

  36. Ståhl D, Bosch J. Modeling continuous integration practice differences in industry software development. J Syst Softw. 2014;87:48–59.

  37. Urbanowicz RJ, Browne WN. Introduction to learning classifier systems. 1st ed. Berlin: Springer; 2017.

  38. Wilson SW. Classifiers that approximate functions. Nat Comput. 2002;1(2–3):211–34.

  39. Wilson SW. Classifier fitness based on accuracy. Evol Comput. 1995;3(2):149–75.

  40. Yoo S, Harman M. Regression testing minimization, selection and prioritization: a survey. Softw Test Verif Reliab. 2012;22(2):67–120.

  41. Yu Y, Jones JA, Harrold MJ. An empirical study of the effects of test-suite reduction on fault localization. In: Proceedings of the 30th International Conference on Software Engineering, ICSE ’08. New York: Association for Computing Machinery, 2008, p. 201–10.

Funding

Not applicable.

Author information

Contributions

Not applicable.

Corresponding author

Correspondence to Lukas Rosenbauer.

Ethics declarations

Conflict of interest

Not applicable.

Code Availability

The source code for the ML approaches etc. can be retrieved from here: https://github.com/LagLukas/transfer_learning.

Consent to Participate

Not applicable (no medical study).

Consent to Publish

Not applicable (no medical study).

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this article was revised because the given names and family names of the authors were incorrect in all references. They have now been corrected.

This article is part of the topical collection “Computational Intelligence” guest edited by Kurosh Madani, Kevin Warwick, Juan Julian Merelo, Thomas Bäck and Anna Kononova.

Appendices

A Transfer Learning for Failure Count

Figure 9 displays the transfer learning experiments when the failure count value is employed instead of the time-ranked value.

Fig. 9

Overview of the effects of transfer learning on XCSF-ER (both temporal and in terms of the distribution). Note that XCSF-TL denotes XCSF-ER with transfer learning and that the temporal plots are based on [25]. Here we employ the failure count value.

B Random Testing Comparison using Failure Count

See Fig. 10 and Table 6.

Fig. 10

Boxplots displaying the performance of the employed LCSs and of random selection

Table 6 p-values for the comparison of XCSF and XCSF-ER with a pure random selection. For these methods we consider the failure count value

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Rosenbauer, L., Pätzel, D., Stein, A. et al. A Learning Classifier System for Automated Test Case Prioritization and Selection. SN COMPUT. SCI. 3, 373 (2022). https://doi.org/10.1007/s42979-022-01255-1

