Elsevier

Parallel Computing

Volume 97, September 2020, 102665

A novel method of grouping target paths for parallel programs

https://doi.org/10.1016/j.parco.2020.102665

Highlights

  • The problem of grouping target paths for parallel programs is addressed.

  • The measurement of the path similarity is refined when grouping target paths.

  • The number of benchmark paths in a group is dynamically increased.

Abstract

Genetic algorithms can be employed to automatically generate desired test data, with the advantage of freeing up manpower. For the path coverage criterion, the problem of test data generation needs to be transformed into an optimization problem before genetic algorithms can be applied. However, when the number of paths to be covered is large, the transformed optimization problem will be very complicated, and the difficulty of solving it will increase greatly. In view of this, the complex optimization problem is divided into a number of sub-optimization problems by grouping paths. However, the existing method of grouping paths does not fully take into account the multiple processes in a parallel program. As a result, inappropriate paths will be put into the same group, which heavily restricts the efficiency of test data generation. To overcome this drawback, this study proposes a novel method of grouping paths. The method refines the measurement of path similarity when grouping target paths, dynamically increases the number of benchmark paths in a group, and groups the remaining paths based on the similarity between a path and each of these benchmark paths, so that each pair of paths in the same group has a large similarity. The proposed method is applied to nine typical programs and compared with the method of randomly grouping paths and the existing method of grouping paths. The experimental results show that paths in the same group obtained by the proposed method have a larger similarity, which is beneficial for efficiently generating test data that satisfy the path coverage criterion.

Introduction

Software testing is an important way to improve the quality of a software product, and a software product often needs a large number of tests to reveal its defects [1]. Test data are usually required when testing a software product. As a result, seeking appropriate approaches to generating effective test data is at the core of software testing and has been one of the hot topics in the software engineering community. Before generating test data, the corresponding coverage criterion must be specified. To date, a variety of coverage criteria exist, such as statement coverage, branch coverage, and path coverage. Given that many other software testing problems can be converted into the problem of test data generation based on path coverage [2], generating test data for path coverage is one of the important tasks in software testing. However, traditional methods of generating test data generally require a lot of manpower, which is impractical for testing complex software products. In view of this, researchers have sought approaches to automatically generating test data, among which search-based methods are the most popular. What these search-based methods have in common is that they are inspired by nature and do not require specific rules. When solving a problem, they automatically acquire and exploit information about the search space to guide the optimization, and adjust the search direction adaptively. As a result, much of the manpower required by traditional methods is freed up. Many search-based methods, for example, genetic algorithms [3], particle swarm optimization [4], ant colony optimization [5], and indicator-based evolutionary algorithms [6], can be applied in software testing, and scholars have successfully employed them to solve various problems of test data generation.
Genetic algorithms (GAs), in particular, have been successfully applied to a wide range of fields, such as combinatorial optimization, machine learning, adaptive control, and artificial life. In view of this, GAs are used in this paper to generate test data.

A program may contain many paths, or even an infinite number of paths [7], so 100% path coverage is generally impossible. A reasonable way to alleviate this problem is to generate test data that cover multiple paths chosen in advance; these paths are called target paths. When there are multiple target paths, the problem of generating test data can be formulated as a multi-objective optimization problem [8]. However, if the number of target paths is too large, the formulated optimization problem becomes very complicated, which increases the difficulty of solving it. Previous studies have demonstrated that existing test data can be used to rapidly generate other, similar test data. For example, suppose there are two paths, $p_1$ and $p_2$, in a parallel program under test. One input, $x_1 = (x, y, z)$, covers $p_1$, and another, $x_2 = (x, y, z+1)$, covers $p_2$. Clearly, $x_1$ and $x_2$ are very similar: once one of them is generated, the other can easily be obtained by changing the input variable $z$ to $z+1$. Similar test data generally cover similar paths, and among the test data that cover similar paths there will be some similar ones. Based on this observation, a method of grouping target paths was proposed to improve the efficiency of generating test data for path coverage [9]. This method divides all the target paths into a number of groups so that the paths in the same group have high similarities, and hence the test data that cover them also have high similarities. Once test data covering some of the paths in a group have been generated, they can be used to quickly generate similar test data that cover the other paths in that group. The key to this method is that all the paths in the same group should have high similarities. To this end, the method groups target paths according to the similarity between the benchmark path of each group and each of the other paths.
Here, the benchmark path of a group is the reference path used to decide whether a path belongs to that group. However, if only one benchmark path is selected for each group, the paths in the same group other than the benchmark path may have low similarities to each other.
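The limitation of a single benchmark path can be seen in a minimal sketch of that style of grouping. Everything in it is an assumption for illustration, not the definition used in [9]: a path is a plain sequence of node IDs, path similarity is the share of positions holding the same node (relative to the longer path), and each remaining path simply joins the group whose benchmark path is most similar to it. The names `similarity` and `group_single_benchmark` are hypothetical.

```python
from typing import List, Sequence

Path = Sequence[int]  # hypothetical representation: node IDs along the path


def similarity(a: Path, b: Path) -> float:
    """Assumed measure: share of positions holding the same node,
    relative to the length of the longer path."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return sum(1 for u, v in zip(a, b) if u == v) / longest


def group_single_benchmark(paths: List[Path], benchmarks: List[int]) -> List[List[int]]:
    """Each group has exactly one benchmark path (given by index); every other
    path joins the group whose benchmark path is most similar to it."""
    groups = [[b] for b in benchmarks]
    for i, p in enumerate(paths):
        if i in benchmarks:
            continue
        best = max(range(len(benchmarks)),
                   key=lambda g: similarity(p, paths[benchmarks[g]]))
        groups[best].append(i)
    return groups


if __name__ == "__main__":
    paths = [[1, 2, 3, 4], [5, 6, 7, 8], [1, 2, 9, 9], [9, 9, 3, 4]]
    print(group_single_benchmark(paths, [0, 1]))  # [[0, 2, 3], [1]]
    print(similarity(paths[2], paths[3]))         # 0.0
```

With benchmark paths `[1, 2, 3, 4]` and `[5, 6, 7, 8]`, both `[1, 2, 9, 9]` and `[9, 9, 3, 4]` are half-similar to the first benchmark and therefore land in the same group, even though their mutual similarity under this measure is 0.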

In this paper, we investigate the problem of grouping the target paths of a parallel program. Among the various kinds of parallel programs, MPI parallel programs, which are developed by extending serial programs in conformance with the Message-Passing Interface (MPI) standard [10], have gained great popularity in parallel programming [11]. An MPI parallel program contains several processes that run concurrently and cooperate to fulfill a specific task: each process handles a part of the task and communicates with the other processes. Nondeterminacy is a typical characteristic of parallel programs. For an MPI parallel program, nondeterministic execution derives mainly from the use of wildcards in receive statements and from the non-blocking communication mode. Parallel programs without nondeterminacy can be developed by avoiding these two cases, and such programs have a wide range of applications [12]. This study focuses on testing deterministic MPI parallel programs.

A novel Grouping method based on Dynamically Increasing benchmark paths (GDI) is proposed. This method regards all the paths currently in a group as benchmark paths, so their number increases dynamically during grouping. Whether a path belongs to a group is determined by the sum of the similarities between the path and each of these benchmark paths, so that each pair of paths in the same group has a high similarity. In addition, we propose that sub-paths in different processes should contribute differently when calculating the similarity between paths, and we therefore assign different weights to different processes.
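The two ideas behind GDI can be illustrated with a minimal sketch, written under stated assumptions and not as the authors' implementation. A path of the parallel program is modeled as a tuple of sub-paths, one per process. The sub-path similarity (share of matching positions), the per-process weights (assumed to sum to 1), the single seed path that starts each group, the order in which remaining paths are processed, and the default cap that keeps group sizes balanced are all assumptions; all function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

SubPath = Sequence[int]     # hypothetical: node IDs traversed by one process
Path = Tuple[SubPath, ...]  # one sub-path per process S_0, ..., S_{n-1}


def subpath_similarity(a: SubPath, b: SubPath) -> float:
    """Assumed measure: share of positions with the same node (w.r.t. the longer sub-path)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return sum(1 for u, v in zip(a, b) if u == v) / longest


def path_similarity(p: Path, q: Path, weights: Sequence[float]) -> float:
    """Weighted sum of per-process sub-path similarities; weights assumed to sum to 1."""
    return sum(w * subpath_similarity(a, b) for w, a, b in zip(weights, p, q))


def group_gdi(paths: List[Path], seeds: List[int], weights: Sequence[float],
              capacity: Optional[int] = None) -> List[List[int]]:
    """Every path already in a group serves as a benchmark path. A new path joins the
    non-full group with the largest SUM of similarities to all its current members,
    and then becomes a benchmark path of that group itself."""
    # Assumption: by default, cap group sizes so the groups stay balanced.
    cap = capacity if capacity is not None else math.ceil(len(paths) / len(seeds))
    groups = [[s] for s in seeds]
    for i in range(len(paths)):
        if i in seeds:
            continue
        open_groups = [g for g, members in enumerate(groups) if len(members) < cap]
        best = max(open_groups, key=lambda g: sum(
            path_similarity(paths[i], paths[m], weights) for m in groups[g]))
        groups[best].append(i)
    return groups


if __name__ == "__main__":
    paths = [((1, 2, 3), (10, 11)), ((4, 5, 6), (10, 11)),
             ((1, 2, 7), (12, 13)), ((4, 5, 8), (12, 13))]
    print(group_gdi(paths, [0, 1], (0.7, 0.3)))              # [[0, 2], [1, 3]]
    print(group_gdi(paths, [0, 1], (0.5, 0.5), capacity=4))  # [[0, 2, 3], [1]]
```

With weights `(0.7, 0.3)`, path 3 joins the group seeded by path 1 on similarity alone (passing `capacity=4` gives the same result), because its sub-path in process 0 is closer to that of path 1. With equal weights and no effective cap, the same path instead prefers the group holding paths 0 and 2, which shows how the per-process weights can change the grouping.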

The main contributions of this paper are twofold: (1) a refined measurement of path similarity that evaluates the similarity between paths more reasonably when grouping target paths; and (2) a method of grouping target paths that yields a high similarity between any pair of paths in each group.

The remainder of this paper is organized as follows. Section 2 comprehensively reviews related work. Section 3 introduces preliminary knowledge. Section 4, the core of this paper, details the proposed method of grouping target paths for an MPI parallel program and gives a case study. In Section 5, the proposed method is applied to test nine parallel programs and compared with an existing method and the random grouping method. Finally, Section 6 concludes the paper and outlines several topics for future research.

Section snippets

The testing of serial programs

In recent years, search-based methods, especially genetic algorithms (GAs), have achieved fruitful results in generating test data for serial programs that meet specific structural coverage criteria. Among the various structural coverage criteria, branch coverage, condition coverage, and path coverage are commonly used.

Xanthakis et al. were the first to apply a genetic algorithm to the problem of test data generation, pioneering the era of using genetic algorithms

MPI Parallel program

For convenience of illustration, the concepts used later are first introduced in this section [11], [36].

A parallel program is denoted as $S$. It consists of $n$ processes, $S_0, S_1, S_2, \ldots, S_{n-1}$, and is represented as $S = \{S_0, S_1, S_2, \ldots, S_{n-1}\}$. In addition, the input vector of $S$ is denoted as $x = (x_1, x_2, \ldots, x_L)$, which consists of $L$ components, where $x_i$ $(i = 1, 2, \ldots, L)$ is the $i$-th component. If the domain of $x_i$ is $D_{x_i}$, the domain of $x$ can be represented as $D = D_{x_1} \times D_{x_2} \times \cdots \times D_{x_L}$.
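Because $D$ is a Cartesian product, $|D|$ is the product of the sizes of the component domains, so the input space grows multiplicatively with $L$. A small sketch, assuming finite integer component domains chosen purely for illustration (real domains may be continuous or very large); `DOMAINS`, `domain_size`, and `enumerate_inputs` are hypothetical names:

```python
from itertools import product
from math import prod
from typing import Iterator, Sequence, Tuple

# Hypothetical finite domains D_{x_1}, D_{x_2}, D_{x_3} (so L = 3), for illustration only.
DOMAINS: Sequence[Sequence[int]] = [range(0, 4), range(-2, 3), (0, 1)]


def domain_size(domains: Sequence[Sequence[int]]) -> int:
    """|D| = |D_{x_1}| * |D_{x_2}| * ... * |D_{x_L}| for finite component domains."""
    return prod(len(d) for d in domains)


def enumerate_inputs(domains: Sequence[Sequence[int]]) -> Iterator[Tuple[int, ...]]:
    """Yield every input vector x = (x_1, ..., x_L) in D, in lexicographic order."""
    return product(*domains)


if __name__ == "__main__":
    print(domain_size(DOMAINS))              # 4 * 5 * 2 = 40
    print(next(enumerate_inputs(DOMAINS)))   # (0, -2, 0)
```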

A node in a process may be one or more consecutive

The proposed method

This section proposes a novel method of grouping target paths for parallel programs based on dynamically increasing benchmark paths. To make full use of the available computing resources, the target paths are divided into as many groups as there are computing resources. Here, a computing resource refers to all the computing units that run a program under test. Given that a parallel program generally contains multiple processes, a computing resource refers to all the

Research questions

GDI is used to group the target paths of an MPI parallel program so that test data can be generated efficiently. This section therefore addresses the following questions:

RQ1 Is the proposed method effective?

The purpose of grouping paths is to put similar paths into the same group. Therefore, effectiveness is reflected by the average similarity of paths in each group and the average similarity of paths over all the groups. If a method has a larger average similarity of paths in
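The effectiveness measure can be sketched as follows. The exact averaging is not spelled out here, so this sketch assumes that the average similarity of a group is the mean over all unordered pairs of its paths, that the overall figure is the mean of the per-group averages, and that groups with fewer than two paths are skipped. The positional similarity used in the example is likewise an assumption, and all function names are hypothetical.

```python
from itertools import combinations
from statistics import mean
from typing import Callable, List, Sequence

Path = Sequence[int]  # hypothetical: node IDs along a path
Similarity = Callable[[Path, Path], float]


def positional_similarity(a: Path, b: Path) -> float:
    """Assumed measure for the example: share of positions with the same node."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return sum(1 for u, v in zip(a, b) if u == v) / longest


def group_average_similarity(group: Sequence[Path], sim: Similarity) -> float:
    """Mean similarity over all unordered pairs of paths in one group (needs >= 2 paths)."""
    pairs = list(combinations(group, 2))
    if not pairs:
        raise ValueError("a group needs at least two paths to have a pairwise average")
    return mean(sim(a, b) for a, b in pairs)


def overall_average_similarity(groups: List[Sequence[Path]], sim: Similarity) -> float:
    """Mean of the per-group averages, skipping groups with fewer than two paths."""
    return mean(group_average_similarity(g, sim) for g in groups if len(g) >= 2)


if __name__ == "__main__":
    groups = [[(1, 2, 3, 4), (1, 2, 3, 5), (1, 2, 9, 9)],
              [(5, 6, 7, 8), (5, 6, 7, 9)]]
    for g in groups:
        print(round(group_average_similarity(g, positional_similarity), 4))  # 0.5833, then 0.75
    print(round(overall_average_similarity(groups, positional_similarity), 4))  # 0.6667
```

Averaging per group first gives every group equal weight regardless of its size; averaging over all pairs pooled together would instead let large groups dominate, so the choice matters when group sizes differ.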

Conclusions and future work

This paper addresses the problem of multiple-path coverage of message-passing parallel programs, and aims to improve the path coverage rate and the efficiency of test data generation by reasonably grouping target paths.

In view of the limitation of the existing method in grouping target paths for parallel programs, a novel grouping method based on dynamically increasing benchmark paths is proposed in this paper. The distinct characteristic of GDI is that the number of benchmark

Declaration of Competing Interest

The authors declare that they have no conflicts of interest in this study.

Acknowledgment

This work was jointly supported by the National Natural Science Foundation of China under Grant Nos. 61773384 and 61503220, and the National Key R&D Program of China under Grant No. 2018YFB1003802-01.

References (47)

  • X. Yan et al.

    An efficient particle swarm optimization for large-scale hardware/software co-design system

    International Journal of Cooperative Information Systems

    (2018)
  • H. Li et al.

    IBEA-SVM: An indicator-based evolutionary algorithm based on pre-selection with classification guided by SVM

    Applied Mathematics-A Journal of Chinese Universities

    (2019)
  • M. Snir et al.

    MPI-The Complete reference: The MPI core

    (1998)
  • S.R.S. Souza et al.

    Structural testing criteria for message-passing parallel programs

    Concurrency and Computation Practice and Experience

    (2008)
  • G. Chen et al.

    Parallel algorithm practice

    (2004)
  • R.P. Pargas et al.

    Test data generation using genetic algorithms

    Journal of Software Testing, Verification, and Reliability

    (1999)
  • M. Roper et al.

    Genetic algorithms and the automatic generation of test data

    Technical report, Semin.Arthr.Rheum, 1995

    (1995)
  • B.F. Jones et al.

    Automatic structural testing using genetic algorithms

    Software Engineering Journal

    (1996)
  • J.C. Lin et al.

    Using genetic algorithms for test case generation in path testing

    Proceedings of the 9th Asian Test Symposium

    (2000)
  • P.M.S. Bueno et al.

    Identification of potentially infeasible program paths by monitoring the search for test data

    Proceedings of the 15th IEEE International Conference on Automated Software Engineering

    (2000)
  • J. Li et al.

    Path test data generation based on adaptive genetic algorithm

    Computer Engineering

    (2009)
  • P.M.S. Bueno et al.

    Automatic test data generation for program paths using genetic algorithms

    Int. J. Software Eng. Knowl. Eng.

    (2002)
  • J. Wegener et al.

    Evolutionary test environment for automatic structural testing

    Inf. Softw. Technol.

    (2001)