A novel method of grouping target paths for parallel programs
Introduction
Software testing is an important way to improve the quality of a software product, and a product often needs a large number of tests to reveal its defects [1]. Test data are usually required when testing a software product. As a result, seeking appropriate approaches to generating effective test data has become the core of software testing and one of the hot topics in the software engineering community. Before test data are generated, the corresponding coverage criterion must be specified. To date, a variety of coverage criteria exist, such as statement coverage, branch coverage, and path coverage. Given that many other software testing problems can be converted into the problem of generating test data for path coverage [2], generating test data for path coverage is one of the important tasks in software testing. However, traditional methods of generating test data generally require a lot of manpower, which is impractical for testing complex software products. In view of this, researchers have sought approaches to generating test data automatically, among which search-based methods are the most popular. What these search-based methods have in common is that they are all inspired by nature and do not require fixed rules. When solving a problem, they automatically explore and steer the search space, and adjust the search direction as the search proceeds. As a result, much of the manpower required by traditional methods is freed up. Many search-based methods, for example, genetic algorithms [3], particle swarm optimization [4], ant colony optimization [5] and indicator-based evolutionary algorithms [6], can be applied in software testing. Scholars have successfully employed search-based methods to solve various problems of generating test data.
Genetic algorithms (GAs) in particular have been successfully applied to a wide range of fields, such as combinatorial optimization, machine learning, adaptive control and artificial life. In view of this, GAs are selected in this paper to generate test data.
A program contains many paths, possibly infinitely many [7], so 100% path coverage is generally impossible. A reasonable way to alleviate this problem is to generate test data that cover multiple paths chosen in advance. When there are multiple paths to be covered, the problem of generating test data can be formulated as a multi-objective optimization problem [8]. Here, the paths chosen to be covered are called target paths. However, if the number of target paths is too large, the formulated optimization problem becomes very complicated, which increases the difficulty of solving it. Previous studies have demonstrated that existing test data can be used to rapidly generate other test data similar to them. For example, suppose there are two paths, p1 and p2, in a parallel program under test. One input covers p1 and another covers p2, and the two inputs are very similar. Once one is generated, the other can easily be obtained by changing the value of its input variable z. Similar test data generally cover similar paths, and the test data that cover similar paths will include some that are similar to each other. Based on this observation, a method of grouping target paths was proposed to improve the efficiency of generating test data for path coverage [9]. This method divides the target paths into a number of groups so that the paths in the same group are highly similar and, consequently, so are the test data that cover them. Once test data covering some of the paths in a group have been generated, they can be used to quickly generate similar test data that cover the remaining paths in the group. The key to the method is that all the paths in the same group should be highly similar. To this end, the method groups target paths according to the similarity between the benchmark path of each group and each of the other paths.
Here, the benchmark path of a group is the reference path used to decide whether a path belongs to the group. However, if only one benchmark path is selected for each group, the paths in the same group other than the benchmark may have low similarities to one another.
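To make this limitation concrete, the following is a minimal Python sketch, not the method of [9] itself. It assumes a hypothetical similarity measure (the fraction of positions at which two node sequences agree) and a hypothetical threshold rule for joining a group; the names `similarity` and `group_by_single_benchmark` are illustrative only.

```python
def similarity(p, q):
    """Hypothetical measure: share of positions where both paths hold the same node."""
    if not p and not q:
        return 1.0  # two empty paths are treated as identical
    matches = sum(a == b for a, b in zip(p, q))
    return matches / max(len(p), len(q))


def group_by_single_benchmark(paths, threshold):
    """Assumed rule: a path joins the first group whose benchmark (its first
    path) is at least `threshold` similar to it; otherwise it opens a new group."""
    groups = []
    for path in paths:
        for group in groups:
            if similarity(group[0], path) >= threshold:
                group.append(path)
                break
        else:
            groups.append([path])  # the path becomes the benchmark of a new group
    return groups


# Each character stands for one node of a path.
paths = ["abcd", "abxy", "zwcd"]
groups = group_by_single_benchmark(paths, threshold=0.5)
print(groups)                      # one group holding all three paths
print(similarity("abxy", "zwcd"))  # 0.0
```

Under these assumptions, both "abxy" and "zwcd" are similar enough to the benchmark "abcd" to join its group, yet they agree with each other at no position, which is the situation the paragraph above describes.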
In this paper, we investigate the problem of grouping target paths of a parallel program. Among the various kinds of parallel programs, MPI parallel programs are developed by extending serial programs in conformance with the Message-Passing Interface [10] standard, and have gained great popularity in parallel programming [11]. An MPI parallel program contains several processes that run concurrently and work together to fulfill a specific task. Each process tackles a part of the task and communicates with other processes. Nondeterminism is a typical characteristic of parallel programs. For an MPI parallel program, nondeterministic execution derives mainly from the use of wildcards in receiving statements and from the non-blocking communication mode. Deterministic parallel programs can be developed by avoiding these two cases and have a wide range of applications [12]. This study focuses on testing deterministic MPI parallel programs.
A novel Grouping method based on Dynamically Increasing benchmark paths (GDI) is proposed. This method regards all the paths currently in a group as benchmark paths, whose number increases dynamically during grouping. Whether a path belongs to a group is determined by the sum of the similarities between the path and each of these benchmark paths, so that every pair of paths in the same group has a high similarity. In addition, we propose that sub-paths in different processes should contribute differently to the similarity between paths, and therefore assign different weights to different processes.
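The idea can be illustrated with a short Python sketch. It is only a sketch under stated assumptions and should not be read as the authors' algorithm: the per-process similarity (share of matching node positions), the seeding of groups with the first paths, the capacity cap that keeps groups balanced, and all function names are hypothetical. What it takes from the text is that every path already in a group acts as a benchmark, that membership is decided by the summed similarity to those benchmarks, and that each process carries its own weight.

```python
from math import ceil


def sub_path_similarity(p, q):
    """Hypothetical per-process measure: share of positions with the same node."""
    if not p and not q:
        return 1.0
    matches = sum(a == b for a, b in zip(p, q))
    return matches / max(len(p), len(q))


def path_similarity(path_a, path_b, weights):
    """Weighted sum over processes; a path is a tuple with one sub-path per process."""
    return sum(
        w * sub_path_similarity(a, b)
        for w, a, b in zip(weights, path_a, path_b)
    )


def group_paths_gdi(paths, num_groups, weights):
    """Assign each path to the non-full group with the largest summed similarity
    to all paths already in it (the dynamically growing set of benchmarks)."""
    if not 1 <= num_groups <= len(paths):
        raise ValueError("num_groups must be between 1 and the number of paths")
    capacity = ceil(len(paths) / num_groups)    # assumed balancing rule
    groups = [[p] for p in paths[:num_groups]]  # assumed seeding: first paths
    for path in paths[num_groups:]:
        open_groups = [g for g in groups if len(g) < capacity]
        best = max(
            open_groups,
            key=lambda g: sum(path_similarity(path, m, weights) for m in g),
        )
        best.append(path)  # the new member now also acts as a benchmark
    return groups


# Two processes; each character stands for one node of a sub-path.
p1, p2 = ("abc", "xy"), ("abd", "xz")
p3, p4 = ("abc", "xz"), ("abd", "xy")
print(group_paths_gdi([p1, p2, p3, p4], 2, weights=(0.7, 0.3)))
print(group_paths_gdi([p1, p2, p3, p4], 2, weights=(0.3, 0.7)))
```

With weights (0.7, 0.3), the sketch groups p1 with p3 and p2 with p4, following the sub-paths of the first process. With the weights swapped, it groups p1 with p4 and p2 with p3, which shows how the process weights change the grouping.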
The contributions of this paper are twofold: (1) a refined measure of path similarity that evaluates the similarity between paths more reasonably when grouping target paths; (2) a method of grouping target paths that achieves a high similarity between any pair of paths in each group.
The remainder of this paper is organized as follows. Section 2 reviews the related work. Section 3 introduces preliminary knowledge. Section 4, the focus of this paper, details the proposed method of grouping target paths for an MPI parallel program and gives a case study. In Section 5, the proposed method is applied to test nine parallel programs and compared with an existing method and the random grouping method. Finally, Section 6 concludes the paper and points out several topics for further research.
The testing of serial programs
In recent years, search-based methods, especially genetic algorithms (GAs), have achieved fruitful results in generating test data for serial programs that meet specific structural coverage criteria. Among the various structural coverage criteria, branch coverage, condition coverage, and path coverage are commonly used.
Xanthakis et al. were the first to apply the genetic algorithm to the problem of test data generation, pioneering the era of using genetic algorithms
MPI parallel program
For convenience of illustration, the concepts used later are first given in this section [11], [36].
A parallel program is denoted as S and consists of n processes. The input vector of S consists of L components, where xi is the i-th component. The domain of the input vector is the Cartesian product of the domains of its components.
A node in a process may be one or more consecutive
The proposed method
This section proposes a novel method of grouping target paths for parallel programs based on dynamically increasing benchmark paths. The target paths are divided into as many groups as there are computing resources available, so as to make full use of them. Here, a computing resource refers to all the computing units running a program under test. Given that a parallel program generally contains multiple processes, a computing resource refers to all the
Research questions
GDI is used to group the target paths of an MPI parallel program, with the purpose of generating test data efficiently. The following questions therefore need to be answered in this section:
RQ1 Is the proposed method effective?
The purpose of grouping paths is to add similar paths to the same group. Therefore, the effectiveness is reflected by the average similarity of paths in each group and the average similarity of paths in all the groups. If a method has a larger average similarity of paths in
Conclusions and future work
This paper addresses the problem of multiple-path coverage for message-passing parallel programs, and aims to improve the path coverage rate and the efficiency of generating test data by grouping target paths reasonably.
In view of the limitation of the existing method in grouping target paths for parallel programs, a novel grouping method based on dynamically increasing benchmark paths is proposed in this paper. The distinct characteristic of GDI is that the number of benchmark
Declaration of Competing Interest
The authors declare that they have no conflicts of interest in this study.
Acknowledgment
This paper is jointly supported by the National Natural Science Foundation of China under grant Nos. 61773384 and 61503220, and the National Key R&D Program of China under grant No. 2018YFB1003802-01.
References (47)
- et al. Parallel ant colony optimization on multi-core SIMD CPUs. Future Generation Computer Systems (2018)
- et al. Advances in software model checking. Advances in Computers (2018)
- et al. GA-Based multiple paths test data generator. Computers & Operations Research (2008)
- et al. Grouping target paths for evolutionary generation of test data in parallel. Journal of Systems and Software (2012)
- et al. Automated test data generation for branch testing using genetic algorithm: an improved approach using branch ordering, memory and elitism. Journal of Systems and Software (2013)
- et al. Evolutionary generation of test data for many paths coverage based on grouping. Journal of Systems and Software (2011)
- et al. Detection of asynchronous message passing errors using static analysis. Proceedings of International Symposium on Practical Aspects of Declarative Languages (2011)
- Software quality, software process, and software testing. Advances in Computers (1994)
- et al. Research progress in software testing. Acta Scientiarum Naturalium Universitatis Pekinensis (2005)
- et al. Application of genetic algorithms to software testing. Proceedings of the 5th International Conference on Software Engineering and its Applications (1992)