Automatic test data generation for path testing using GAs

https://doi.org/10.1016/S0020-0255(00)00093-1

Abstract

Genetic algorithms (GAs) are inspired by Darwin's theory of the survival of the fittest. This paper discusses a genetic algorithm that automatically generates test cases to test a selected path. The algorithm takes a selected path as its target and iteratively applies sequences of operators so that the test cases evolve. The evolved test case leads program execution along the target path. To determine which test cases should survive to produce the next generation of fitter test cases, a metric named normalized extended Hamming distance (NEHD) is developed; NEHD is also used to determine whether the final test case has been found. Based on NEHD, a fitness function named SIMILARITY is defined to determine which test cases should survive if the final test case has not yet been found. Even when the target path contains loops, SIMILARITY can help the algorithm lead execution along the target path.

Introduction

The basic concepts of genetic algorithms (GAs) were developed by Holland [7], [8]. To date, genetic algorithms are well developed and widely used in the area of artificial intelligence and have been successfully applied in many fields of optimization [17], [18], [19], [20], [21].

It is understood that all occurrences of pre-selected features in a tested program should be executed (e.g. all-branch testing requires that each branch in a program be tested at least once). It is also understood that manually generating a large number of test cases to fulfill the testing criteria takes too much effort. As a result, a certain degree of automation is necessary. As Ould [14] suggested, the automation of test case generation is the most important aspect of automatic testing. Automatic test case generation can also keep testers from introducing bias when preparing test cases.

Many automatic test case generators have been developed, and many classical search methods have been used to derive test cases [1], [2], [3], [4], [9]. However, these techniques work only for continuous functions, while most program domains are discontinuous spaces. This is where GAs step in: they are a class of adaptive search techniques that are able to search a discontinuous space.

Genetic algorithms have been used to automatically find a program's longest or shortest execution times. In their paper on testing real-time systems using genetic algorithms [16], Wegener et al. investigated how effectively GAs validate the temporal correctness of real-time systems by establishing the longest and the shortest execution times. The authors stated that they had found an appropriate fitness function for the genetic algorithms, one that measures the execution time in processor cycles. Their experiments with GAs on a number of programs identified longer and shorter execution times than those found using random testing or systematic testing. However, they did not show the fitness function in their work.

Genetic algorithms have also been used to search program domains for test cases that satisfy all-branch testing [6], [10], [11]. In their papers on automatic structural testing using genetic algorithms, Jones et al. showed that appropriate fitness functions can be derived automatically for each branch predicate. All branches were covered with two orders of magnitude fewer test cases than random testing required. The adequacy of the test cases is improved by the genetic algorithm's ability to generate test cases that are at or close to the branch predicates.

Consider a predicate such as

If A=B then ⋯

The fitness function for this predicate is based on the Hamming distance between A and B. The authors declare that the reciprocal of Hamming distance leads to a suitable fitness function.

The Hamming distance between two bit patterns of equal length is the number of bit positions in which they differ; for A and B it is the number of 1 bits in A ⊕ B. For example, the distance between 000 and 111 is three, while the distance between 010 and 011 is one.
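The two examples above can be checked with a minimal Python sketch. The function names are hypothetical, the inputs are assumed to be non-negative integers, and the handling of a zero distance (the predicate already holds) is an assumption, since the text does not say how the reciprocal is defined in that case.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ (1 bits of a XOR b)."""
    return bin(a ^ b).count("1")


def reciprocal_fitness(a: int, b: int) -> float:
    """Fitness for the predicate A = B as the reciprocal of the Hamming distance.

    Assumption: a distance of 0 (A equals B) maps to infinity so that the
    division is always defined; the source does not specify this case.
    """
    distance = hamming_distance(a, b)
    return float("inf") if distance == 0 else 1.0 / distance


print(hamming_distance(0b000, 0b111))    # 3
print(hamming_distance(0b010, 0b011))    # 1
print(reciprocal_fitness(0b010, 0b011))  # 1.0
```

Under this sketch, test cases whose values of A and B differ in fewer bits receive a higher fitness, which is what pushes the search toward inputs that satisfy the predicate.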

The results of the papers above confirm that genetic algorithms are more efficient and effective than random testing for all-branch testing and for finding the longest execution time. In this paper, we develop a new metric, which serves as a fitness function, to determine the distance between the exercised path and the target path. A genetic algorithm using this metric generates test cases for testing the target path. The path-testing algorithm is designed to:

  1. select a target path from the program under test,
  2. sample test cases from the whole domain of the program,
  3. execute the program with the test cases to exercise paths, and calculate the fitness between the target path and each exercised path,
  4. evolve a new generation of fitter test cases, and
  5. repeat steps 3 and 4 until the fittest test case is found.
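The five steps can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the paper's algorithm: `program_under_test` is a hypothetical instrumented toy program, the branch labels are invented, and `similarity` is a simple stand-in (the fraction of positions where two paths agree) rather than the NEHD-based SIMILARITY function, whose definition is not given in this excerpt.

```python
import random


def program_under_test(x: int, y: int) -> tuple:
    """Hypothetical instrumented program: returns the branch labels it takes."""
    path = []
    path.append("B1T" if x > y else "B1F")
    path.append("B2T" if (x + y) % 2 == 0 else "B2F")
    path.append("B3T" if x == 2 * y else "B3F")
    return tuple(path)


def similarity(path: tuple, target: tuple) -> float:
    """Stand-in fitness (NOT the paper's SIMILARITY): share of matching positions."""
    matches = sum(p == t for p, t in zip(path, target))
    return matches / max(len(path), len(target))


def evolve(population, scores, rng, lo, hi, mutation_rate=0.2):
    """Step 4: fitness-weighted selection, component-swap crossover, mutation."""
    weights = [s + 1e-6 for s in scores]  # epsilon keeps all weights positive
    children = []
    while len(children) < len(population):
        (x1, _), (_, y2) = rng.choices(population, weights=weights, k=2)
        child = [x1, y2]
        if rng.random() < mutation_rate:
            child[rng.randrange(2)] = rng.randint(lo, hi)
        children.append(tuple(child))
    return children


def generate_test_case(target, lo=0, hi=63, pop_size=20,
                       max_generations=100, seed=0):
    """Steps 2 to 5 for a target path chosen in step 1; returns (case, fitness)."""
    rng = random.Random(seed)
    # Step 2: sample test cases from the whole input domain.
    population = [(rng.randint(lo, hi), rng.randint(lo, hi))
                  for _ in range(pop_size)]
    best, best_score = None, -1.0
    for _ in range(max_generations):
        # Step 3: exercise paths and score each against the target path.
        scores = [similarity(program_under_test(x, y), target)
                  for x, y in population]
        for case, score in zip(population, scores):
            if score > best_score:
                best, best_score = case, score
        if best_score == 1.0:  # the exercised path equals the target path
            break
        population = evolve(population, scores, rng, lo, hi)
    return best, best_score


target_path = ("B1T", "B2T", "B3T")  # step 1: the selected target path
print(generate_test_case(target_path))
```

The generation cap reflects that the search may stop before an exact match is found, in which case the function returns the closest test case seen so far together with its fitness.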

Section snippets

Genetic algorithms

Genetic algorithms begin with a set of initial individuals as the first generation, which are sampled at random from the problem domain. The algorithms are developed to perform a series of operations that transform the present generation into a new, fitter generation.

Each individual in each generation is evaluated with a fitness function. Based on the evaluation, the evolution of the individuals may approach the optimal solution.
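As a hedged illustration of how one generation might be transformed into the next, the sketch below uses bit-string individuals with fitness-weighted selection, one-point crossover, and bit-flip mutation. These are conventional GA operators assumed here for illustration, not operators taken from this text, and the fitness function (counting 1 bits) is a hypothetical placeholder.

```python
import random


def one_point_crossover(a: str, b: str, rng: random.Random) -> str:
    """Child takes a's bits before a random cut point and b's bits after it."""
    cut = rng.randrange(1, len(a))  # requires strings of length >= 2
    return a[:cut] + b[cut:]


def mutate(bits: str, rate: float, rng: random.Random) -> str:
    """Flip each bit independently with probability `rate`."""
    return "".join(
        ("1" if c == "0" else "0") if rng.random() < rate else c
        for c in bits
    )


def next_generation(population, fitness, rng, mutation_rate=0.05):
    """One GA step: evaluate, select by fitness, recombine, then mutate."""
    weights = [fitness(ind) + 1e-9 for ind in population]  # keep weights positive
    children = []
    for _ in population:
        p1, p2 = rng.choices(population, weights=weights, k=2)
        child = one_point_crossover(p1, p2, rng)
        children.append(mutate(child, mutation_rate, rng))
    return children


def count_ones(bits: str) -> int:
    """Hypothetical fitness: the number of 1 bits in the individual."""
    return bits.count("1")


rng = random.Random(0)
# First generation: individuals sampled at random from the problem domain.
population = ["".join(rng.choice("01") for _ in range(8)) for _ in range(10)]
for _ in range(20):
    population = next_generation(population, count_ones, rng)
print(max(population, key=count_ones))
```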

The most common operations of a genetic algorithm are designed to

Apply genetic algorithms to path testing

Path testing is a software testing methodology that searches the program domain for test cases such that executing the program with them follows a predefined path. Since a program may contain an infinite number of paths, it is only practical to select a specific subset of paths for path testing. Genetic algorithms can be applied to path testing if the target paths are clearly defined and an appropriate fitness function related to this goal is

Conclusion

In this paper, genetic algorithms are used to automatically generate test cases for path testing. The greatest merit of using a genetic algorithm in program testing is its simplicity. Each iteration of the genetic algorithm produces a new generation of individuals. In practice, computation time cannot be infinite, so the number of iterations must be limited. Within a limited number of generations, solutions derived by genetic algorithms may be trapped around a local optimum and,

References (21)

  • J.F. Benders, Partitioning procedures for solving mixed variables programming problems, Numer. Math. (1962)
  • A. Bertolino, An overview of automated software testing, J. Systems Software (1991)
  • C.J. Burgess, The automated generation of test cases for compilers, J. Software Test. Verif. Reliab. (1994)
  • R. Ferguson, B. Korel, The chaining approach for software test data generation, ACM Trans. Software Engrg. and...
  • D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning (1989)
  • B.F. Jones, D.E. Eyres, H.-H. Sthamer, A strategy for using genetic algorithms to automate branch and fault-based testing, The...
  • J.H. Holland, Genetic algorithms and the optimal allocation of trials, SIAM J. Comput. (1973)...
  • J.H. Holland, Adaptation in Natural and Artificial Systems (1975)
  • D.C. Ince, The automatic generation of test data, Computer J. (1987)
  • B.F. Jones, H.-H. Sthamer, X. Yang, D.E. Eyres, The automatic generation of software test data sets using adaptive...
There are more references available in the full text version of this article.
