Automatic test data generation for path testing using GAs

doi:10.1016/S0020-0255(00)00093-1

Information Sciences

Volume 131, Issues 1–4, January 2001, Pages 47-64

https://doi.org/10.1016/S0020-0255(00)00093-1

Abstract

Genetic algorithms (GAs) are inspired by Darwin's theory of the survival of the fittest. This paper discusses a genetic algorithm that automatically generates test cases to test a selected path. The algorithm takes a selected path as its target and iteratively executes sequences of operators so that the test cases evolve. The evolved test case leads the program execution along the target path. To determine which test cases should survive to produce the next generation of fitter test cases, a metric named normalized extended Hamming distance (NEHD) is developed; it is also used to determine whether the final test case has been found. Based on NEHD, a fitness function named SIMILARITY is defined to determine which test cases should survive if the final test case has not yet been found. Even when the target path contains loops, SIMILARITY helps the algorithm lead the execution along the target path.

Introduction

The basic concepts of genetic algorithms (GAs) were developed by Holland [7], [8]. To date, genetic algorithms are well developed and widely used in the area of artificial intelligence and have been successfully applied in many fields of optimization [17], [18], [19], [20], [21].

It is understood that all occurrences of pre-selected features in a tested program should be executed (e.g. all-branch testing requires that each branch in a program be tested at least once). It is also understood that manually generating the large number of test cases needed to fulfill the testing criteria takes too much effort. As a result, a certain degree of automation is necessary. As Ould [14] suggested, the automation of test case generation is the most important aspect of automatic testing. Automatic test case generation can also keep testers from introducing bias when preparing test cases.

Many automatic test case generators have been developed, and many classical search methods have been used to derive test cases [1], [2], [3], [4], [9]. However, these techniques work only for continuous functions, whereas most program domains are discontinuous spaces. This is where GAs step in. GAs include a class of adaptive search techniques that are able to search a discontinuous space.

Genetic algorithms have been used to automatically find a program's longest or shortest execution times. In their paper about testing real-time systems using genetic algorithms [16], Wegener et al. investigated the effectiveness of GAs in validating the temporal correctness of real-time systems by establishing the longest and the shortest execution times. The authors stated that they had found an appropriate fitness function for the genetic algorithms, one that measures the execution time in processor cycles. In their experiments on a number of programs, GAs identified longer and shorter execution times than those found by random testing or systematic testing. However, they did not show the fitness function in their work.

Genetic algorithms have also been used to search program domains for suitable test cases to satisfy the all-branch testing criterion [6], [10], [11]. In their papers about automatic structural testing using genetic algorithms, Jones et al. showed that appropriate fitness functions can be derived automatically for each branch predicate. All branches were covered with two orders of magnitude fewer test cases than random testing required. The adequacy of the test cases is improved by the genetic algorithm's ability to generate test cases that are at or close to the branch predicates.

Consider a predicate such as

If A=B then ⋯

The fitness function for this predicate is based on the Hamming distance $^{∗}$ between A and B. The authors declare that the reciprocal of Hamming distance leads to a suitable fitness function.

$^{∗}$ The Hamming distance between two bit patterns of equal length is the number of bit positions in which they differ, i.e. the number of 1 bits in A⊕B. For example, the distance between 000 and 111 is three, while the distance between 010 and 011 is one.
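As a minimal sketch of the idea described above, the Hamming distance can be computed by counting the 1 bits in A⊕B, and its reciprocal can serve as a fitness value for the predicate A=B. The function names are illustrative and not taken from the cited papers; the handling of a zero distance (returning infinity when the predicate is already satisfied) and the restriction to non-negative integers are assumptions made here.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ.

    Assumes non-negative integers interpreted as equal-length bit patterns.
    """
    return bin(a ^ b).count("1")


def reciprocal_fitness(a: int, b: int) -> float:
    """Reciprocal of the Hamming distance as a fitness for the predicate A=B.

    Assumption: a distance of zero (predicate satisfied) maps to infinity.
    """
    distance = hamming_distance(a, b)
    return float("inf") if distance == 0 else 1.0 / distance


# The two examples from the footnote.
print(hamming_distance(0b000, 0b111))  # 3
print(hamming_distance(0b010, 0b011))  # 1
```

Under this sketch, inputs whose bit patterns are closer to making A equal B receive a higher fitness, which is what lets the search move toward satisfying the predicate.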

The results of the papers above confirm that genetic algorithms are more efficient and effective than random testing in all-branch testing and in finding the longest execution time. In this paper, we develop a new metric, used as a fitness function, to determine the distance between the exercised path and the target path. The genetic algorithm with this metric is used to generate test cases for testing the target path. This path-testing algorithm is designed to:

1. select a target path from the program under test,
2. sample test cases from the whole domain of the program,
3. execute the program with the test cases to exercise paths, and calculate the fitness between the target path and each exercised path,
4. evolve the test cases into a new generation of fitter test cases, and
5. repeat steps 3 and 4 until the fittest test case is found.
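The five steps above can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: `program_under_test` is a hypothetical instrumented toy program, `path_fitness` is a simple stand-in (the fraction of matching branch labels) rather than the NEHD-based SIMILARITY function, and the evolution step (keep the fitter half, refill with mutated copies) is one simple choice among many.

```python
import random
from typing import List, Optional, Tuple


def program_under_test(x: int, y: int) -> List[str]:
    """Hypothetical instrumented program: returns the branch labels it took."""
    path = []
    path.append("B1-T" if x > y else "B1-F")
    path.append("B2-T" if (x + y) % 7 == 0 else "B2-F")
    return path


def path_fitness(exercised: List[str], target: List[str]) -> float:
    """Stand-in fitness (NOT the paper's SIMILARITY): share of matching labels."""
    longest = max(len(exercised), len(target))
    if longest == 0:
        return 1.0
    matches = sum(e == t for e, t in zip(exercised, target))
    return matches / longest


def generate_test_case(target: List[str], pop_size: int = 20,
                       max_generations: int = 50, low: int = 0,
                       high: int = 100, seed: int = 0) -> Optional[Tuple[int, int]]:
    """Return inputs that exercise `target`, or None if the budget runs out."""
    rng = random.Random(seed)
    # Step 2: sample test cases from the whole input domain.
    population = [(rng.randint(low, high), rng.randint(low, high))
                  for _ in range(pop_size)]
    for _ in range(max_generations):
        # Step 3: execute the program and score each exercised path.
        scored = []
        for case in population:
            exercised = program_under_test(*case)
            if exercised == target:
                return case  # the target path has been exercised
            scored.append((path_fitness(exercised, target), case))
        # Step 4: keep the fitter half and refill with mutated copies.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        survivors = [case for _, case in scored[: pop_size // 2]]
        children = []
        while len(survivors) + len(children) < pop_size:
            x, y = rng.choice(survivors)
            children.append((min(high, max(low, x + rng.randint(-5, 5))),
                             min(high, max(low, y + rng.randint(-5, 5)))))
        population = survivors + children  # Step 5: repeat with the new generation
    return None


# Step 1: select a target path (both branches taken).
TARGET = ["B1-T", "B2-T"]
print(generate_test_case(TARGET))
```

The generation limit reflects the practical need to bound the search; when it is reached without success the sketch returns `None` instead of a test case.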

Genetic algorithms

Genetic algorithms begin with a set of initial individuals as the first generation, which are sampled at random from the problem domain. The algorithms are developed to perform a series of operations that transform the present generation into a new, fitter generation.

Each individual in each generation is evaluated with a fitness function. Based on the evaluation, the evolution of the individuals may approach the optimal solution.
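To make the transformation from one generation to the next concrete, the sketch below applies three textbook operators to bit-string individuals: fitness-proportionate selection, single-point crossover and bit-flip mutation. These operators, the function name `next_generation` and the mutation rate are assumptions chosen for illustration; the source does not state which operators or parameters the paper uses.

```python
import random
from typing import Callable, List

Individual = List[int]  # a bit string, e.g. [0, 1, 1, 0]


def next_generation(population: List[Individual],
                    fitness: Callable[[Individual], float],
                    rng: random.Random,
                    mutation_rate: float = 0.01) -> List[Individual]:
    """Produce a new generation of the same size (generic sketch).

    Assumes non-negative fitness values and individuals of at least two bits.
    """
    # Evaluate every individual with the fitness function.
    weights = [fitness(individual) for individual in population]
    if sum(weights) == 0:
        weights = [1.0] * len(population)  # fall back to uniform selection

    offspring = []
    while len(offspring) < len(population):
        # Selection: fitter individuals are more likely to become parents.
        mother, father = rng.choices(population, weights=weights, k=2)
        # Crossover: join the head of one parent to the tail of the other.
        cut = rng.randint(1, len(mother) - 1)
        child = mother[:cut] + father[cut:]
        # Mutation: flip each bit with a small probability.
        child = [bit ^ 1 if rng.random() < mutation_rate else bit
                 for bit in child]
        offspring.append(child)
    return offspring


# Usage: a toy fitness that counts the 1 bits in an individual.
rng = random.Random(1)
first = [[rng.randint(0, 1) for _ in range(8)] for _ in range(6)]
second = next_generation(first, fitness=sum, rng=rng)
print(second)
```

Repeating this step while tracking the best individual gives the basic evolutionary loop; whether the population actually approaches the optimum depends on how well the fitness function reflects the goal.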

The most common operations of a genetic algorithm are designed to

Apply genetic algorithms to path testing

Path testing is a software testing methodology that searches the program domain for test cases such that, when the program is executed with them, a predefined path is followed. Since a program may contain an infinite number of paths, it is only practical to select a specific subset of paths for path testing. Genetic algorithms can be applied to path testing if the target paths are clearly defined and an appropriate fitness function related to this goal is

Conclusion

In this paper, genetic algorithms are used to automatically generate test cases for path testing. The greatest merit of using a genetic algorithm in program testing is its simplicity. Each iteration of the genetic algorithm produces a new generation of individuals. In practice, the computation time cannot be infinite, so the number of iterations must be limited. Within a limited number of generations, solutions derived by genetic algorithms may be trapped around a local optimum and,

References (21)

[1] J.F. Benders, Partitioning procedures for solving mixed variables programming problems, Numer. Math. (1962)
[2] A. Bertolini, An overview of automated software testing, J. System Software (1991)
[3] C.J. Burgess, The automated generation of test cases for compilers, J. Software Test. Verif. Reliab. (1994)
[4] R. Ferguson, B. Korel, The chaining approach for software test data generation, ACM Trans. Software Engrg. and...
[5] D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning (1989)
[6] B.F. Jones, D.E. Eyres, H.-H. Sthamer, A strategy for using genetic algorithms to automate branch and fault-based testing, The...
[7] J.H. Holland, Genetic algorithms and the optimal allocation of trials, SIAM J. Comput. (1973)...
[8] J.H. Holland, Adaptation in Natural and Artificial Systems (1975)
[9] D.C. Ince, The automatic generation of test data, Computer J. (1987)
[10] B.F. Jones, H.-H. Sthamer, X. Yang, D.E. Eyres, The automatic generation of software test data sets using adaptive...

There are more references available in the full text version of this article.
