Dynamic random testing with test case clustering and distance-based parameter adjustment

https://doi.org/10.1016/j.infsof.2020.106470

Abstract

Context

Software testing is essential in software engineering to improve software reliability. One goal of software testing strategies is to detect faults faster. The Dynamic Random Testing (DRT) strategy uses testing results to guide the selection of test cases, which has been shown to be effective in the fault detection process.

Objective

Previous studies have demonstrated that DRT is greatly affected by the test case classification and the process of adjusting the testing profile. In this paper, we propose Distance-based DRT (D-DRT) strategies, aiming at enhancing the fault detection effectiveness of DRT.

Method

D-DRT strategies incorporate distance information of inputs into the test case classification and the testing profile adjustment process. The test cases are vectorized based on their input parameters and classified into disjoint subdomains through clustering methods. The distance information of the subdomains, along with the testing results, is then used to adjust the testing profile, so that test cases closer to failure-causing subdomains are more likely to be selected.

Results

We conduct empirical studies to evaluate the performance of the proposed algorithms using 12 versions of 4 open-source programs. The experimental results show that, compared with Random Testing (RT), Random Partition Testing (RPT), DRT and Adaptive Testing (AT), our strategies achieve greater fault detection effectiveness with a low computational cost. Moreover, the distance-based testing profile adjustment method is the dominant factor in the improvement of the D-DRT strategy.

Conclusion

D-DRT strategies are effective testing strategies, and the distance-based testing profile adjustment method plays a crucial role.

Introduction

Software testing is indispensable in software engineering to ensure software quality and reliability. Generally, the input domain of software is extremely large, while testing resources are limited, so only a limited number of test cases can be executed during the testing process. Therefore, the quality of software testing depends directly on the testing strategies that guide the selection of test cases [1]. Random Testing (RT) and Partition Testing (PT) are two well-known testing strategies [2]. RT [3] selects test cases in accordance with some given probability distribution. In particular, when the testing purpose is to detect software faults, test case selection follows a uniform profile; when the purpose is reliability assessment, RT follows a so-called operational profile, in which the probability distribution is derived from the user profile. RT has been extensively deployed in tests of various systems, such as network protocol implementations [4], Windows NT programs [5], and embedded systems [6]. PT is another primary software testing technique; it partitions the input domain into several disjoint subdomains. The subdomains are equivalence classes, and representative test cases are selected from each subdomain to form the desired test suite [7], [8]. In PT, each partition is expected to have a certain degree of homogeneity; that is, inputs in the same partition should cause similar software execution behaviors. In the ideal case, if one input is a failure-causing (or non-failure-causing) input, all other inputs in the same partition will also be failure-causing (or non-failure-causing) inputs. The fault detection effectiveness of RT and PT has been investigated in many studies [2], [3], [9], [10]. PT may be better than RT in some sense when each partition is homogeneous.
However, such homogeneity is hard to guarantee in practice, and hence PT may be ineffective. PT, though considered a more systematic testing strategy, does not substantially outperform RT, and RT is even more effective under some circumstances.

Since RT and PT are based on different intuitions, it is likely that they can complement each other. Several studies have been conducted to develop advanced testing strategies through the integration of RT and PT. Random Partition Testing (RPT) [11] may be the most straightforward integration: it selects a subdomain according to a predefined probability profile and then selects a test case from that subdomain. Suppose that the test suite of the software under test (SUT) is partitioned into m subdomains {C1, C2, ..., Cm}, in which the i-th subdomain Ci consists of ni distinct test cases. RPT selects a test case in two steps. First, a subdomain is selected according to a given testing profile {p1, p2, ..., pm}, where pi is the selection probability of the i-th subdomain. Second, a test case is randomly selected from the chosen subdomain according to a uniform probability distribution.
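The two-step RPT selection described above can be sketched in a few lines of Python; the function and variable names here are ours, for illustration only:

```python
import random

def rpt_select(subdomains, profile, rng=random):
    """Random Partition Testing: pick a subdomain according to the
    testing profile, then pick a test case uniformly within it."""
    # Step 1: choose subdomain C_i with probability p_i.
    i = rng.choices(range(len(subdomains)), weights=profile, k=1)[0]
    # Step 2: choose a test case uniformly from the chosen subdomain.
    return i, rng.choice(subdomains[i])

# Example: three subdomains with a fixed profile (fixed is what
# distinguishes RPT from DRT, which updates the profile online).
subdomains = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
profile = [0.5, 0.3, 0.2]
i, tc = rpt_select(subdomains, profile)
assert tc in subdomains[i]
```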

A number of researchers [7], [12], [13] from different areas have independently investigated the behavior and patterns of software failures, and they have reported that failure-causing inputs (the inputs that can reveal software failures) are normally clustered into contiguous failure regions. Some studies have also illustrated that PT is usually better than RT when test cases in subdomains with higher fault detection rates are selected with higher probabilities [14], [15], [16]. Based on these intuitions, particularly the empirical observation that software faults tend to cluster, Cai et al. proposed Adaptive Testing (AT) [17] to control the testing process. In AT, software testing is treated as an adaptive control problem, and the SUT is modeled as a controlled Markov chain. Then, by estimating the parameters based on the historical testing data, the testing strategy (treated as the corresponding controller) can be adjusted online to make optimal testing decisions. However, AT's decision-making incurs additional computational overhead. To alleviate this, Dynamic Random Testing (DRT) was proposed by Cai et al. [18]. The fault detection effectiveness of DRT is close to that of AT, while the computational cost is largely reduced. DRT aims at improving RPT. Different from RPT, in which the values of pi are fixed throughout the testing process, DRT changes the testing profile according to the testing results so that test cases with higher fault detection rates can be selected earlier.
Suppose that the test suite is divided into m subdomains. If a failure is triggered by a test case in subdomain Ci, the selection probability of Ci increases from pi to pi + ε (with an upper bound of 1), and the selection probabilities of the other subdomains each decrease by ε/(m−1); otherwise, the selection probability decreases to pi − δ (with a lower bound of 0), and the selection probabilities of the other subdomains each increase by δ/(m−1). The adjusting parameters ε and δ, which can be set in the range [0, 1], denote the increment and decrement of the selection probabilities of subdomains after the execution of test cases. Some studies [19], [20], [21], [22] have shown that the fault detection effectiveness of DRT is better than that of RT and RPT, as DRT requires fewer test cases to detect a fixed number of faults.
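The DRT profile update described above can be sketched as follows. This is a minimal sketch: the function name is ours, and the handling of the bounds (clipping pi at 0 and 1, with no clipping of the other entries) is our assumption, since the paper's summary does not specify how boundary cases interact with the remaining subdomains:

```python
def drt_update(profile, i, failed, eps=0.05, delta=0.05):
    """Adjust the DRT testing profile after executing a test case
    from subdomain i. On failure, p_i grows by eps (capped at 1) and
    the others shrink equally; on a pass, p_i shrinks by delta
    (floored at 0) and the others grow equally."""
    m = len(profile)
    p = profile[:]
    if failed:
        inc = min(eps, 1.0 - p[i])   # respect the upper bound of 1
        p[i] += inc
        for j in range(m):
            if j != i:
                p[j] -= inc / (m - 1)
    else:
        dec = min(delta, p[i])       # respect the lower bound of 0
        p[i] -= dec
        for j in range(m):
            if j != i:
                p[j] += dec / (m - 1)
    return p                          # entries still sum to 1
```

After every update the profile remains a probability distribution, since whatever pi gains (or loses) is redistributed equally among the other m−1 subdomains.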

According to our previous work [23], [24], the performance of DRT is greatly affected by the test case classification and the testing profile adjustment process. On the one hand, a good classification can distinguish failure-causing inputs from non-failure-causing inputs, and the classification method used in DRT might not be desirable in some cases; for instance, failure-causing test cases cannot be clustered together when the classification criteria are obscure. On the other hand, the subdomains are treated as equivalent in the original DRT; that is, the increase or decrease in the selection probabilities of the unselected subdomains is the same after the execution of a test case. Since test cases that are similar to failure-causing inputs and different from non-failure-causing inputs are more likely to trigger failures, the fault detection capacities of the subdomains may vary in each test, and giving equal increments or decrements to unselected subdomains ignores this discrepancy among the subdomains. Therefore, the differential information of test cases can be utilized in the testing profile adjustment process to help determine the increment or decrement of each subdomain individually.

Based on the problems mentioned above, we propose distance-based DRT (D-DRT) strategies, which improve DRT by applying clustering methods to classify test cases and by utilizing distance information of test cases to adjust the testing profile. Usually, test cases with similar inputs have similar execution traces; for example, the functions or structures they exercise are identical or similar. Thus, the failure detection capacities of these test cases are similar to each other. The purpose of clustering is to partition similar test cases into the same subdomain. Our proposed D-DRT strategy uses Euclidean distance to measure the similarity of test cases: a shorter distance between test cases indicates that they have similar fault-detecting capacity. In this paper, we use three kinds of clustering methods, namely, the k-means, k-medoids, and hierarchical clustering methods, to classify the vectorized test cases into disjoint subdomains. Additionally, the distance information among subdomains is used in the testing profile adjustment process, along with the testing results. If a failure is triggered by a test case selected from a subdomain, then the decrease in the selection probabilities of subdomains near the failure-causing subdomain is smaller than that of subdomains far away from it.
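One possible realization of this distance-based adjustment is sketched below. The proportional weighting by centroid distance is our illustrative choice, not necessarily the paper's exact formula; the function names and the use of cluster centroids as subdomain representatives are likewise our assumptions:

```python
import math

def euclid(a, b):
    """Euclidean distance between two vectorized test-case centroids."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ddrt_update_on_failure(profile, centroids, i, eps=0.05):
    """After a failure in subdomain i, raise p_i by eps and split the
    total decrement eps among the other subdomains in proportion to
    their distance from the failing centroid: nearby subdomains,
    which are presumed more failure-prone, lose less probability."""
    p = profile[:]
    dists = [euclid(centroids[j], centroids[i]) if j != i else 0.0
             for j in range(len(p))]
    total = sum(dists)
    p[i] = min(1.0, p[i] + eps)
    for j in range(len(p)):
        if j != i:
            p[j] -= eps * dists[j] / total  # farther -> larger decrement
    return p
```

With centroids at (0,0), (1,0) and (3,0) and a failure in the first subdomain, the second subdomain (nearer) loses a quarter of the decrement and the third (farther) loses three quarters, while the profile still sums to 1.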

Because the proposed approach uses three clustering algorithms, three D-DRT strategies are constructed: the D-DRT strategy with the k-means clustering method is denoted as D-DRTK, the D-DRT strategy with the k-medoids clustering method as D-DRTKM, and the D-DRT strategy with the hierarchical clustering method as D-DRTH. The three D-DRT strategies utilize distance information of test cases to obtain the classification and optimize the adjustment process, aiming at detecting faults faster. The major contributions of our study are as follows.

  • To the best of our knowledge, this is the first work to utilize the distance measure among test cases in the DRT strategy, which enables the test case information to be used effectively in the field of software cybernetics.

  • We develop three clustering algorithms for D-DRT, namely, the k-means, k-medoids, and hierarchical clustering methods, which make use of distance information to classify the test cases and optimize the adjustment process.

  • The performance of the three proposed strategies is evaluated through a series of empirical studies on open-source programs. It is shown that the proposed strategies have higher fault-detection effectiveness with a relatively low computational cost.

The remainder of this paper is organized as follows: In Section 2, the D-DRT strategies with three kinds of clustering methods are introduced. The experimental setup is described in Section 3. The experimental results are analyzed in Section 4. Threats to validity are summarized in Section 5. Related works on testing strategies are presented in Section 6. Conclusions and future works are summarized in Section 7.

Section snippets

Motivations

Let us first look at how DRT adjusts the selection probabilities of subdomains during the testing process. Suppose that the test suite is partitioned into m disjoint subdomains with the testing profile {p1, p2, ..., pm}. A subdomain is selected according to the testing profile, and a test case is chosen from the selected subdomain according to a uniform distribution. The testing result (Pass/Fail) is used to adjust the testing profile. For example, if a failure is triggered by a test case of

Experimental setup

Some experimental studies are conducted in this paper to evaluate the performance of the proposed D-DRT strategies. We aim to answer the following questions.

RQ1: How do D-DRT strategies perform in terms of the fault detection effectiveness compared with RT, RPT and AT?

RQ2: How do D-DRT strategies perform in terms of the time cost compared with RT, RPT and AT?

RQ3: How do D-DRT strategies perform in terms of the fault detection effectiveness compared with the clustering-based RPT when the same

RQ1: Fault detection effectiveness

To evaluate the fault detection effectiveness of our approaches, we apply RT, RPT, DRT, AT, D-DRTK, D-DRTKM and D-DRTH to 12 versions of the subject programs. The fault detection effectiveness is measured by the T-measure. We use boxplots to display the comparison results, as shown in Fig. 5. The x-axis in each boxplot represents the testing strategy, and the y-axis represents the average number of test cases executed to detect all seeded faults over the 100 repetitions. The boxplot
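Reading the T-measure as described here (the number of test cases executed until every seeded fault has been detected), it can be computed from a run log as in the sketch below; the function name and log format are ours, and the paper's formal definition may differ in detail:

```python
def t_measure(detected_fault_sets, total_faults):
    """Number of test-case executions needed until every seeded fault
    has been detected at least once. `detected_fault_sets[k]` is the
    set of fault ids revealed by the (k+1)-th executed test case.
    Returns None if some fault is never detected in this run."""
    seen = set()
    for k, faults in enumerate(detected_fault_sets, start=1):
        seen |= set(faults)
        if len(seen) == total_faults:
            return k
    return None

# Example run log: fault 1 found by the 1st test, faults 2 and 3 by
# the 3rd, so all 3 seeded faults are covered after 3 executions.
assert t_measure([{1}, set(), {2, 3}, {1}], 3) == 3
```

Averaging this value over repeated runs (100 in our experiments) yields the per-strategy numbers plotted on the y-axis; a lower T-measure means faults are detected faster.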

Threats to validity

Some potential threats to the validity of our experimental study are discussed in this section.

First, one obvious threat to validity is the selection of subject programs. In our study, 4 programs with 12 versions in SIR were selected as subject programs in our experiment. They cannot represent all software, since the number of programs is limited. It is hard to guarantee that our strategies will exhibit similar results on other programs. Nevertheless, these subject programs have been widely

Related works

In this section, some of the major state-of-the-art works on software testing strategies are introduced.

Conclusion

Software testing is an essential aspect of examining the quality and reliability of software. DRT uses the testing results to adjust the testing profile during the testing process, which achieves better fault detection effectiveness than RPT. Previous studies have shown that the test case classification and the testing profile adjustment process have great effects on the performance of DRT. In this paper, we proposed three distance-based DRT strategies with different clustering methods: D-DRT with

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This work is supported in part by the National Natural Science Foundation of China under Grant 61772055 and Grant 61872169, in part by the Technical Foundation Project of Ministry of Industry and Information Technology of China under Grant JSZL2016601B003, and in part by the Equipment Preliminary R&D Project of China under Grant 41402020102.

References (52)

  • R. Huang et al. (2015)
  • J. Chen et al., Test case prioritization for object-oriented software: An adaptive random sequence approach based on clustering, J. Syst. Softw. (2018)
  • J. Lv et al., Adaptive and random partition software testing, IEEE Trans. Syst. Man Cybernet. Syst. (2014)
  • W.J. Gutjahr, Partition testing vs. random testing: The influence of uncertainty, IEEE Trans. Softw. Eng. (1999)
  • T.Y. Chen et al., On the relationship between partition and random testing, IEEE Trans. Softw. Eng. (1994)
  • E. Hoque et al., Building robust distributed systems and network protocols by using adversarial testing and behavioral analysis
  • J.E. Forrester et al., An empirical study of the robustness of Windows NT applications using random testing
  • J. Regehr, Random testing of interrupt-driven software
  • J. Zeng et al., Evaluating the effectiveness of random and partition testing by delivered reliability
  • D. Hamlet et al., Partition testing does not inspire confidence
  • X. Bai et al., Ontology-based test modeling and partition testing of web services
  • P. Bernardi et al., A hybrid approach for detection and correction of transient faults in SoCs, IEEE Trans. Depend. Secure Comput. (2010)
  • K.Y. Cai et al., Partition testing with dynamic partitioning
  • P.E. Ammann et al., Data diversity: An approach to software fault tolerance, IEEE Trans. Comp. (1988)
  • A. Arcuri et al., Formal analysis of the probability of interaction fault detection using random testing, IEEE Trans. Softw. Eng. (2011)
  • P.J. Boland et al., Comparing partition and random testing via majorization and Schur functions, IEEE Trans. Softw. Eng. (2003)