Automation of software test data generation using genetic algorithm and reinforcement learning
Introduction
After software is developed, it must be tested. According to a study conducted by NIST, software errors cost the U.S. economy an estimated $59.5 billion annually (Newman et al., 2002). The test stage is therefore one of the most sensitive stages of software development, accounting for about 50% of development costs (Myers et al., 2011). Software testing consumes considerable resources but adds no new functionality to the product, so much effort has gone into reducing its cost through automated software testing tools. In the last decade, various automatic software testing methods have been introduced that aim to maximize error detection while producing as little test input data as possible.
In automatic testing, the specific goal is to automatically generate test input data for the software under test (SUT) such that a specified test coverage criterion is met. A test coverage criterion is a measure of the extent to which an SUT has been tested. Coverage criteria can be defined using different kinds of information (Ammann et al., 2016): some are defined over the input parameter space of the SUT, others over its control flow graph, and still others consider only its conditional statements. In this paper, we focus on graph-based criteria.
Automatic test case generation using graph-based criteria can be carried out in two ways: statically or dynamically (Lonetti et al., 2018). In static methods, the SUT is not executed at all, and the required coverage must be measured using symbolic execution techniques. In dynamic methods, on the other hand, the SUT is executed on every test case in a test set to measure that set's coverage. Although dynamic methods are considerably more time-consuming, they are preferred over static methods because of the latter's inherent problems, such as dependence on the programming language and the lack of a simple way to handle arrays and pointers.
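To make dynamic coverage measurement concrete, here is a minimal Python sketch; it is not the authors' implementation. It assumes a hypothetical SUT `sut` that has been instrumented by hand to record each branch it takes in a `trace` list, and it treats path coverage as the fraction of target entry-to-exit paths exercised by at least one test case. The names `execute`, `path_coverage`, and `TARGET_PATHS` are illustrative only.

```python
from typing import List, Set, Tuple

Path = Tuple[str, ...]
Case = Tuple[int, int]  # one test case: the two inputs (a, b)


def sut(a: int, b: int, trace: List[str]) -> int:
    """Hypothetical SUT, instrumented by hand to record the branches it takes."""
    trace.append("entry")
    if a > b:
        trace.append("a>b")
        result = a - b
    else:
        trace.append("a<=b")
        result = b - a
    if result % 2 == 0:
        trace.append("even")
    else:
        trace.append("odd")
    trace.append("exit")
    return result


def execute(test_case: Case) -> Path:
    """Dynamic step: actually run the SUT on one test case and return its path."""
    trace: List[str] = []
    sut(test_case[0], test_case[1], trace)
    return tuple(trace)


def path_coverage(test_set: List[Case], target_paths: Set[Path]) -> float:
    """Fraction of target paths exercised by at least one test case in the set."""
    covered = {execute(tc) for tc in test_set}
    return len(covered & target_paths) / len(target_paths)


# The four entry-to-exit paths of the hypothetical SUT.
TARGET_PATHS: Set[Path] = {
    ("entry", "a>b", "even", "exit"),
    ("entry", "a>b", "odd", "exit"),
    ("entry", "a<=b", "even", "exit"),
    ("entry", "a<=b", "odd", "exit"),
}

if __name__ == "__main__":
    print(path_coverage([(5, 3), (3, 5)], TARGET_PATHS))
```

Running the file prints `0.5`: both test cases produce an even difference, so only two of the four paths are covered, and the cost of the measurement grows with the number of test cases because each one triggers a real execution.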
Dynamic methods usually use meta-heuristic search algorithms to generate test cases (McMinn et al., 2004). Many existing algorithms, such as simulated annealing, the genetic algorithm, and particle swarm optimization, have been reported in the literature for this purpose (Civicioglu & Besdok, 2013).
The genetic algorithm has been widely used for automating test data generation. Classic genetic algorithms perform well at generating suitable test sets; however, they are highly time-consuming. This can be partially mitigated by exploiting existing knowledge of the problem or by adding a local search phase to the evolutionary cycle. A notable problem with the genetic algorithm is that chromosomes do not try to improve themselves; they only wait for a mutation or a recombination to improve them blindly. In addition, the genetic algorithm does not distinguish between different sub-sections of a chromosome, and its evaluation function treats all parts as a whole. This makes the genetic search very similar to a blind search.
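The sketch below illustrates the kind of classic genetic algorithm criticized above. It rests on assumptions that are not taken from the paper: a chromosome is a fixed-size test set, the hypothetical SUT `path_of` has two independent decisions (so four paths), and fitness is the number of distinct paths a test set covers. Crossover cuts at a random point and mutation re-samples every gene with the same probability, so neither operator considers which sub-section (test case) is actually useful.

```python
import random
from typing import List, Tuple

Case = Tuple[int, int]
Chromosome = List[Case]  # one chromosome = one test set


def path_of(tc: Case) -> Tuple[str, str]:
    """Hypothetical SUT reduced to its path: two independent two-way decisions."""
    a, b = tc
    return ("a>b" if a > b else "a<=b", "even" if (a - b) % 2 == 0 else "odd")


def fitness(chrom: Chromosome) -> int:
    """Whole-chromosome fitness: number of distinct paths the test set covers."""
    return len({path_of(tc) for tc in chrom})


def random_case(rng: random.Random) -> Case:
    return (rng.randint(-50, 50), rng.randint(-50, 50))


def classic_ga(pop_size: int = 20, set_size: int = 4, generations: int = 100,
               p_mut: float = 0.1, seed: int = 0) -> Chromosome:
    rng = random.Random(seed)
    pop = [[random_case(rng) for _ in range(set_size)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        if fitness(ranked[0]) == 4:           # all four paths of this SUT covered
            return ranked[0]
        parents = ranked[: pop_size // 2]     # truncation selection
        children = [ranked[0]]                # elitism: keep the best test set
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randint(1, set_size - 1)  # blind one-point crossover
            child = p1[:cut] + p2[cut:]
            # Blind mutation: every gene has the same chance of being re-sampled.
            child = [random_case(rng) if rng.random() < p_mut else tc for tc in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)


if __name__ == "__main__":
    best = classic_ga()
    print(best, fitness(best))
```

Elitism is included here only so that the best test set is never lost between generations; the fitness function still scores the test set as a whole, which is the weakness the paragraph describes.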
To address these problems of the genetic algorithm in automated test data generation, in this paper we augment the genetic algorithm with reinforcement learning as a memetic (local) search method. This additional search step focuses on the best chromosomes obtained so far and improves them by manipulating their genes. Q-learning is used to guide these manipulations.
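The paper's exact state, action, and reward definitions are not given in this excerpt, so the following Python sketch of a Q-learning local search makes its own assumptions: the state is the index of the gene (test case) being manipulated, the actions are four hypothetical gene edits (`inc_a`, `dec_a`, `swap`, `random`), and the reward is the change in the number of distinct paths covered. It uses an epsilon-greedy policy and the standard Q-learning update `Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))`.

```python
import random
from typing import Dict, List, Tuple

Case = Tuple[int, int]
ACTIONS = ["inc_a", "dec_a", "swap", "random"]  # hypothetical gene edits


def path_of(tc: Case) -> Tuple[str, str]:
    """Hypothetical two-decision SUT reduced to its path signature."""
    a, b = tc
    return ("a>b" if a > b else "a<=b", "even" if (a - b) % 2 == 0 else "odd")


def fitness(test_set: List[Case]) -> int:
    return len({path_of(tc) for tc in test_set})


def apply_action(action: str, tc: Case, rng: random.Random) -> Case:
    a, b = tc
    if action == "inc_a":
        return (a + 1, b)
    if action == "dec_a":
        return (a - 1, b)
    if action == "swap":
        return (b, a)
    return (rng.randint(-50, 50), rng.randint(-50, 50))  # "random"


def q_learning_local_search(best: List[Case], episodes: int = 30, alpha: float = 0.5,
                            gamma: float = 0.9, epsilon: float = 0.2,
                            seed: int = 0) -> List[Case]:
    rng = random.Random(seed)
    n = len(best)
    # State s = index of the gene (test case) currently being manipulated.
    q: Dict[Tuple[int, str], float] = {(s, a): 0.0 for s in range(n) for a in ACTIONS}
    current = list(best)
    for _ in range(episodes):
        for s in range(n):
            if rng.random() < epsilon:                      # explore
                action = rng.choice(ACTIONS)
            else:                                           # exploit
                action = max(ACTIONS, key=lambda act: q[(s, act)])
            candidate = current.copy()
            candidate[s] = apply_action(action, current[s], rng)
            reward = fitness(candidate) - fitness(current)
            if reward >= 0:                                 # keep non-worsening edits
                current = candidate
            # Standard Q-learning update; the last gene is treated as terminal.
            future = 0.0 if s == n - 1 else max(q[(s + 1, act)] for act in ACTIONS)
            q[(s, action)] += alpha * (reward + gamma * future - q[(s, action)])
    return current


if __name__ == "__main__":
    start = [(5, 3), (3, 5), (7, 1), (9, 1)]
    improved = q_learning_local_search(start)
    print(fitness(start), fitness(improved))
```

Because only non-worsening edits are kept, the returned test set never covers fewer paths than the chromosome it received, which matches the role of a local search step that refines the best individual rather than replacing it.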
To further improve performance, we also introduce new recombination and mutation operators that are better suited to the problem of automated test data generation. For recombination, the sub-sections of each parent are examined, and the recombination point is selected according to the information encoded in each sub-section. Mutation is performed only if a chromosome contains repeated or duplicated sub-sections.
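Here is a hypothetical Python illustration of the two operators; these are not the authors' definitions. The sketch assumes that a sub-section is a single test case, and that a sub-section counts as duplicated when it exercises a path already covered by an earlier test case in the same chromosome. `informed_crossover` cuts only at test-case boundaries. As a stand-in for "the information encoded in each sub-section", it picks the cut whose child covers the most distinct paths. `duplicate_mutation` leaves a chromosome without duplicates untouched.

```python
import random
from typing import List, Set, Tuple

Case = Tuple[int, int]
Path = Tuple[str, str]


def path_of(tc: Case) -> Path:
    """Hypothetical two-decision SUT reduced to its path signature."""
    a, b = tc
    return ("a>b" if a > b else "a<=b", "even" if (a - b) % 2 == 0 else "odd")


def informed_crossover(p1: List[Case], p2: List[Case]) -> List[Case]:
    """Cut only between test cases; keep the child that covers the most distinct paths."""
    best_child: List[Case] = list(p1)
    best_cov = -1
    for cut in range(1, len(p1)):
        child = p1[:cut] + p2[cut:]
        cov = len({path_of(tc) for tc in child})
        if cov > best_cov:
            best_child, best_cov = child, cov
    return best_child


def duplicate_mutation(chrom: List[Case], rng: random.Random) -> List[Case]:
    """Re-sample only test cases whose path an earlier test case already covers."""
    seen: Set[Path] = set()
    out: List[Case] = []
    for tc in chrom:
        if path_of(tc) in seen:
            tc = (rng.randint(-50, 50), rng.randint(-50, 50))
        seen.add(path_of(tc))
        out.append(tc)
    return out


if __name__ == "__main__":
    p1 = [(1, 0)] * 4
    p2 = [(0, 1), (0, 2), (2, 0), (0, 3)]
    print(informed_crossover(p1, p2))
    print(duplicate_mutation([(5, 3), (7, 1)], random.Random(0)))
```

In this toy example, the crossover returns `[(1, 0), (0, 2), (2, 0), (0, 3)]`, which covers all four paths. Its mutation touches only the second test case of `[(5, 3), (7, 1)]`, because both follow the same path.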
To evaluate its efficiency, the proposed method has been used to automatically generate test data for a number of different SUTs, and its results have been compared with those of many existing meta-heuristic search algorithms. The results show that although all methods achieve identical coverage, the proposed method needs significantly fewer fitness evaluations.
The rest of this paper is organized as follows. Section 2 reviews related work. Section 3 presents the proposed method in detail. Section 4 reports the experimental results. Finally, Section 5 concludes the paper and outlines directions for future studies.
Related work
In dynamic structural testing, the program under test must be executed, so data must be generated for its input parameters. These input data can be generated using search algorithms. Various algorithms have been used for this purpose so far; this section briefly describes the work done with these algorithms in this field.
In software testing, the genetic algorithm has been applied successfully to both parameter and structural optimization (Kim et al., 2013).
Formulation of the problem
Test case generation using graph-based criteria requires the control flow graph (CFG) of the SUT (Ammann et al., 2016). The CFG provides a means of enumerating the different execution paths from the entry point of the SUT to its exit. A sample SUT with its corresponding CFG is depicted in Fig. 1.
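Since Fig. 1 is not reproduced here, the following Python sketch uses a hypothetical CFG with two consecutive if-else statements, stored as an adjacency dictionary. `enumerate_paths` performs a depth-first search from the entry node to the exit node. Loops would make the number of paths infinite, so the sketch bounds how often a node may appear on one path with `max_visits`. This is one common way of restricting enumeration to finite paths; it is assumed here and not taken from the paper.

```python
from collections import Counter
from typing import Dict, List

Cfg = Dict[str, List[str]]

# Hypothetical CFG with two consecutive if-else statements (not Fig. 1).
SAMPLE_CFG: Cfg = {
    "1": ["2", "3"],   # first decision
    "2": ["4"],
    "3": ["4"],
    "4": ["5", "6"],   # second decision
    "5": ["7"],
    "6": ["7"],
    "7": [],           # exit
}


def enumerate_paths(cfg: Cfg, entry: str, exit_node: str,
                    max_visits: int = 1) -> List[List[str]]:
    """Depth-first enumeration of entry-to-exit paths.

    Each node may appear at most `max_visits` times on a single path.
    """
    paths: List[List[str]] = []
    path: List[str] = []
    visits: Counter = Counter()

    def dfs(node: str) -> None:
        if visits[node] >= max_visits:
            return                      # bound reached: prune to keep paths finite
        visits[node] += 1
        path.append(node)
        if node == exit_node:
            paths.append(list(path))
        else:
            for succ in cfg.get(node, []):
                dfs(succ)
        path.pop()                      # backtrack
        visits[node] -= 1

    dfs(entry)
    return paths


if __name__ == "__main__":
    for p in enumerate_paths(SAMPLE_CFG, "1", "7"):
        print("-".join(p))
```

For `SAMPLE_CFG`, the enumeration yields four paths: 1-2-4-5-7, 1-2-4-6-7, 1-3-4-5-7, and 1-3-4-6-7. Each path is a separate coverage target for which suitable inputs must be found.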
The input parameters of an SUT determine its execution path; to cover different paths in the SUT, different input values must be provided. The problem here is to
Experimental results
Experiments have been performed on several different SUTs, each with different branching, nesting, and complexity structures. The results of the proposed MAAT method have been compared with those of a number of meta-heuristic and evolutionary algorithms: the basic genetic algorithm (GA), simulated annealing (SA), particle swarm optimization (PSO), ant colony optimization (ACO), the bees algorithm, and tabu search (TS). Results are averaged over 50 repeated executions of each algorithm.
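Algorithms can be compared by the number of fitness evaluations, averaged over repeated runs, as sketched below. This is an assumed harness, not the authors' experimental code. `CountingFitness` wraps a fitness function and counts its calls. `random_search` is a hypothetical stand-in for any of the compared algorithms, and each of the 50 runs uses a different random seed.

```python
import random
import statistics
from typing import Callable, List, Optional, Tuple

Case = Tuple[int, int]
N_PATHS = 4  # number of paths of the hypothetical SUT below


def path_of(tc: Case) -> Tuple[str, str]:
    """Hypothetical two-decision SUT reduced to its path signature."""
    a, b = tc
    return ("a>b" if a > b else "a<=b", "even" if (a - b) % 2 == 0 else "odd")


def coverage(test_set: List[Case]) -> int:
    return len({path_of(tc) for tc in test_set})


class CountingFitness:
    """Wraps a fitness function and counts how many times it is evaluated."""

    def __init__(self, fn: Callable[[List[Case]], int]) -> None:
        self.fn = fn
        self.calls = 0

    def __call__(self, test_set: List[Case]) -> int:
        self.calls += 1
        return self.fn(test_set)


def random_search(fitness: Callable[[List[Case]], int], rng: random.Random,
                  set_size: int = 4, budget: int = 10_000) -> Optional[List[Case]]:
    """Hypothetical stand-in algorithm: sample test sets until full coverage or budget."""
    for _ in range(budget):
        ts = [(rng.randint(-50, 50), rng.randint(-50, 50)) for _ in range(set_size)]
        if fitness(ts) == N_PATHS:
            return ts
    return None


def mean_evaluations(algorithm: Callable[[CountingFitness, random.Random], object],
                     runs: int = 50) -> float:
    """Average number of fitness evaluations over independently seeded runs."""
    counts = []
    for seed in range(runs):
        counted = CountingFitness(coverage)
        algorithm(counted, random.Random(seed))
        counts.append(counted.calls)
    return statistics.mean(counts)


if __name__ == "__main__":
    print(mean_evaluations(random_search))
```

Counting calls through a wrapper keeps the metric independent of each algorithm's internals. Any of the compared methods could be passed to `mean_evaluations`, as long as it accepts the same two arguments.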
Conclusion
In this paper, a memetic algorithm for automatically generating test data has been presented. The aim is to cover all finite paths of the control flow graph of the software under test. The proposed method, called MAAT, is a genetic algorithm that uses reinforcement learning as its local search step. Each chromosome in MAAT is a test set consisting of a number of test cases. The reinforcement learning part takes the best test set found so far by the genetic algorithm and tries to
CRediT authorship contribution statement
Mehdi Esnaashari: Conceptualization, Methodology, Validation, Writing - original draft, Writing - review & editing, Supervision. Amir Hossein Damia: Software, Investigation, Writing - original draft.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References
Lonetti, F., & Marchetti, E. (2018). Emerging software testing technologies. Advances in Computers, Vol. 108. Elsevier.
Automatic test data generation based on reduced adaptive particle swarm optimization algorithm. (2015). Neurocomputing.
PSO based test case generation for critical path using improved combined fitness function. (2020). Journal of King Saud University - Computer and Information Sciences.
Sharifipour, H., Shakeri, M., & Haghighi, H. (2018). Structural test data generation using a memetic ant colony optimization based on evolution strategies. Swarm and Evolutionary Computation.
Newman, M. (2002). Software errors cost U.S. economy $59.5 billion annually. NIST Assesses Technical Needs...
Myers, G. J., Sandler, S., & Badgett, T. (2011). The art of software testing. John Wiley &...
Introduction to software testing. (2016).
Automatic detection of infeasible paths in software testing. (2010). IET Software.
Yang, S., Zhang, X., & Gong, Y. Z. (2020). Infeasible path detection based on code pattern and...
Search-based software test data generation: A survey. (2004). Software Testing, Verification and Reliability.
Automated software test data generation. IEEE Transactions on Software Engineering.
A conceptual comparison of the Cuckoo-search, particle swarm optimization, differential evolution and artificial bee colony algorithms. Artificial Intelligence Review.
Optimization of automatic generated test cases for path testing using genetic algorithm. In 2016 Second International Conference on Computational Intelligence & Communication Technology (CICT).
Automated test data generation for branch testing using incremental genetic algorithm. Sādhanā.
A path and branch based approach to fitness computation for program test data generation using genetic algorithm. In 2015 International Conference on Futuristic Trends on Computational Analysis and Knowledge Management (ABLAZE).