An experimental study of adaptive testing for software reliability assessment

https://doi.org/10.1016/j.jss.2007.11.721

Abstract

Adaptive testing is a new form of software testing that is based on the feedback and adaptive control principle and can be treated as the software testing counterpart of adaptive control. Our previous work has shown that adaptive testing can be formulated and guided in theory to minimize the variance of an unbiased software reliability estimator and to achieve optimal software reliability assessment. In this paper, we present an experimental study of adaptive testing for software reliability assessment, where the adaptive testing strategy, the random testing strategy and the operational profile based testing strategy were applied to the Space program in four experiments. The experimental results demonstrate that the adaptive testing strategy can really work in practice and may noticeably outperform the other two. Therefore, the adaptive testing strategy can serve as a preferable alternative to the random testing strategy and the operational profile based testing strategy if high confidence in the reliability estimates is required or the real-world operational profile of the software under test cannot be accurately identified.

Introduction

Software testing is a major paradigm for software quality assurance and is extensively carried out in nearly every software development project (Pressman, 2000). It can be used for software reliability improvement by detecting and removing defects from the software under test, or for software reliability assessment by freezing the code of the software under test without removing detected defects during testing (Binder, 2000, Frankl et al., 1998). There are also techniques such as mutation testing that can be used to assess the defect detection capability of a given test suite of the software under test (DeMillo et al., 1978, Offutt and Untch, 2001). Many software testing techniques have been proposed, including control flow testing, data flow testing, state based testing, functional testing, boundary value testing, random testing, Markov usage model based testing, and so on (Beizer, 1990, Whittaker and Poore, 1993, Whittaker and Thomason, 1994, Zhu et al., 1997).

Adaptive testing is a new software testing technique which results from the application of feedback and adaptive control principles in software testing (Cai, 2002, Cai et al., 2005a). It is a form of adaptive control, or can be treated as the software testing counterpart of adaptive control. As shown in Fig. 1.1, there are two feedback loops in the adaptive testing strategy. The first feedback loop comprises the software under test, the database (history of testing data) and the testing strategy, in which the history of testing data is used to generate the next test cases according to a given testing policy or test data adequacy criterion. The second feedback loop comprises the software under test, the database, the parameter estimation scheme and the testing strategy, in which the history of testing data is also used to improve or change the underlying testing policy or test data adequacy criterion. The improvement may lead random testing to switch from one test distribution (e.g., a uniform distribution) to another (e.g., a non-uniform distribution). It may lead partition testing to refine the partitioning of the input domain of the software under test. It may also lead data flow testing to switch to boundary value testing. This is because at the beginning of software testing, the software tester has limited knowledge of the software under test and of the capability of the test suite; as testing proceeds, the understanding of both improves. In this paper the improvement leads to a better test case selection scheme as a result of better estimates of defect detection rates.
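As a purely illustrative reading of these two loops, the Python sketch below selects each test case under a weighted policy, records the outcome, re-estimates parameters from the accumulated testing history, and then updates the policy. The functions estimate_parameters and update_policy are hypothetical placeholders standing in for the paper's parameter estimation scheme and testing policy; this is not the strategy defined in Section 2.

```python
import random


def adaptive_testing_loop(program, test_suite, budget, seed=None):
    """Illustrative two-loop adaptive testing session.

    program:    callable taking a test case and returning True on failure
    test_suite: list of available test cases
    budget:     total number of tests to execute
    """
    rng = random.Random(seed)
    history = []                              # (test case, outcome) pairs
    policy = {tc: 1.0 for tc in test_suite}   # selection weights, uniform start

    for _ in range(budget):
        # Feedback loop 1: the history-driven policy selects the next test case.
        cases, weights = zip(*policy.items())
        test_case = rng.choices(cases, weights=weights, k=1)[0]
        failed = program(test_case)
        history.append((test_case, failed))

        # Feedback loop 2: re-estimate parameters from the history and use them
        # to improve or change the underlying testing policy.
        estimates = estimate_parameters(history)
        policy = update_policy(policy, estimates)

    return history, policy


def estimate_parameters(history):
    """Hypothetical estimator: observed failure frequency per test case."""
    runs, fails = {}, {}
    for tc, failed in history:
        runs[tc] = runs.get(tc, 0) + 1
        fails[tc] = fails.get(tc, 0) + int(failed)
    return {tc: fails[tc] / runs[tc] for tc in runs}


def update_policy(policy, estimates):
    """Hypothetical policy update: weight test cases by the evidence gathered."""
    return {tc: 1.0 + estimates.get(tc, 0.0) for tc in policy}


# Usage with a hypothetical program that fails on one specific input.
history, final_policy = adaptive_testing_loop(
    program=lambda tc: tc == 7,          # fails only on test case 7
    test_suite=list(range(10)),
    budget=50,
    seed=0,
)
```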

Our previous work has shown that adaptive testing can be carried out for software reliability improvement (Cai, 2002, Cai et al., 2005a, Cai et al., 2007) as well as for software reliability assessment (Cai et al., 2004b). Experimental studies reveal that adaptive testing can noticeably outperform random testing in terms of using fewer tests to detect more defects (Cai et al., 2007). As far as software reliability assessment is concerned, however, Cai et al. (2004b) presents only simulation results, and no experimental results are available. This raises several research questions:

  • (1)

    Is the adaptive testing strategy for software reliability assessment effective? Although simulation results are very useful for justifying the possible behavior of adaptive testing, they may not be sufficient to draw convincing conclusions. The proposed adaptive testing strategy must be validated or invalidated against real software programs.

  • (2)

    What cost does the adaptive testing strategy incur? Supposing that the adaptive testing strategy is effective in the sense that it outperforms existing testing strategies in terms of the variance of software reliability estimates, what is its possible disadvantage?

  • (3)

    How may various factors affect the performance of the adaptive testing strategy? These factors may include the defects remaining in the software under test, the partitioning of the given test suite, the number of tests that can be applied, and the inaccuracy of the given operational profile.

In response to the above general research questions, in this paper we present an experimental study to evaluate the effectiveness or advantage of adaptive testing in comparison with random testing and operational profile based testing for software reliability assessment. This experimental study comprised four experiments, which accounted for distinct partitionings of the given test suite and distinct numbers of tests applied to the software under test. The main purpose of this experimental study was to answer two more specific questions to be identified in Section 3.5.

The rest of this paper is organized as follows: Section 2 reviews the adaptive testing strategy for software reliability assessment. Section 3 describes the set-up of the four experiments presented in this paper. Section 4 presents the results of the four experiments. Section 5 presents an analysis of the experimental results. Section 6 considers the testing scenarios where the expected operational profile of the software under test is inaccurately known a priori. Section 7 gives a general discussion of the experimental study. Concluding remarks are contained in Section 8.

Section snippets

Adaptive testing for software reliability assessment

The central idea of adaptive testing is to treat software testing as a feedback and adaptive control problem, where the software under test serves as the controlled object and the testing strategy serves as the corresponding controller. As shown in Fig. 1.1, the testing strategy or the adaptive testing strategy is adjusted or updated on-line during testing. For the purpose of software reliability assessment, it is assumed that no failure-causing defects, if any, are removed from the software
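For concreteness, the sketch below shows a standard partition-based reliability estimate under an operational profile, of the general weighted-average form R̂ = Σᵢ pᵢ(1 − θ̂ᵢ). This form is an assumption used here for illustration only and is not a reproduction of the estimator of Eq. (2.5).

```python
def estimate_reliability(profile, failures, executions):
    """Weighted-average reliability estimate R = sum_i p_i * (1 - theta_i),
    where p_i is the operational-profile probability of input class i and
    theta_i = failures_i / executions_i is the observed per-class failure
    rate (the code is frozen, so no defects are removed while testing)."""
    r_hat = 0.0
    for cls, p in profile.items():
        n = executions.get(cls, 0)
        theta_hat = failures.get(cls, 0) / n if n > 0 else 0.0
        r_hat += p * (1.0 - theta_hat)
    return r_hat


# Usage with a hypothetical operational profile over four input classes
# (the experiments below also partition the test suite into four classes).
profile = {"c1": 0.4, "c2": 0.3, "c3": 0.2, "c4": 0.1}
failures = {"c1": 2, "c2": 0, "c3": 1, "c4": 0}
executions = {"c1": 200, "c2": 150, "c3": 100, "c4": 50}
print(f"{estimate_reliability(profile, failures, executions):.3f}")  # 0.994
```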

Experiment set-up

In order to validate or invalidate the relative effectiveness of the adaptive testing strategy presented in Section 2.2, we need to apply it to real software and see if it can reveal the true value of the reliability of the software under test. Preferably, the testing process should be automated to avoid human biases. In this section, we describe how the adaptive testing strategy was applied in our experimental study.

Experimental results

Note that in test suite 1 the test cases are not evenly distributed across the four classes, whereas in test suite 2 the test cases are roughly evenly distributed. Also note that a testing process with x0 = 500 can be treated as a short-term process in comparison with a testing process with x0 = 3000, which can be treated as a long-term process. We considered four experiments in terms of the test suite used and x0, and applied all three testing strategies to the various test scenarios.
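The resulting design, two test suites crossed with two testing lengths, each combination run under the three strategies, can be enumerated as in the sketch below. The loop body is only a placeholder print, since the actual automated testing harness is not reproduced here.

```python
from itertools import product

# Experimental grid: two test suites crossed with two testing lengths x0,
# each combination exercised under the three strategies AT, RT and ORT.
test_suites = ["suite 1 (uneven across the four classes)",
               "suite 2 (roughly even across the four classes)"]
testing_lengths = [500, 3000]        # x0: short-term vs. long-term process
strategies = ["AT", "RT", "ORT"]     # adaptive, random, operational profile based

for exp_id, (suite, x0) in enumerate(product(test_suites, testing_lengths), 1):
    for strategy in strategies:
        # Placeholder for invoking the automated testing harness.
        print(f"Experiment {exp_id}: {strategy} on {suite}, x0 = {x0}")
```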

Analysis of experimental results

Now we need to analyze the experimental results and see if the testing strategy AT is effective or advantageous in comparison with testing strategies RT and ORT. Since the goal of the testing strategy AT is to minimize var(ρ̂) or var(R̂) (refer to Eq. (2.5)), the comparison should be based on the measure SD (refer to Eq. (4.1)). Note that (SD)² is an unbiased estimator of var(R̂). Δ can serve as an auxiliary measure for comparison since it measures the difference between the true value of
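As an illustration of how these two measures can be computed from repeated testing runs, the sketch below uses the usual definitions: SD is the sample standard deviation of the reliability estimates (so (SD)² is an unbiased estimate of var(R̂)) and Δ is the deviation of the average estimate from the true reliability. The exact Eqs. (2.5) and (4.1) are not reproduced here, and the listed estimates are hypothetical.

```python
import statistics


def comparison_measures(estimates, true_reliability):
    """Return (SD, Delta) for reliability estimates from repeated runs."""
    sd = statistics.stdev(estimates)      # sample SD; SD**2 is an unbiased
                                          # estimate of var(R-hat)
    delta = statistics.mean(estimates) - true_reliability
    return sd, delta


# Usage with hypothetical estimates from five repeated testing runs.
sd, delta = comparison_measures([0.991, 0.987, 0.993, 0.989, 0.990], 0.990)
print(f"SD = {sd:.4f}, Delta = {delta:+.4f}")
```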

Question of concern

Section 5 shows that the testing strategy AT was basically comparable to the testing strategies RT and ORT in terms of |Δ|. Further, the testing strategy AT certainly outperformed the testing strategies RT and ORT in terms of SD, although the advantage of the former over the latter might or might not be noticeable. However, a prerequisite for applying the testing strategies ORT and AT is that the operational profile is given. Unfortunately, the real-world operational profile can hardly be precisely
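One simple way to model such profile uncertainty, purely as an illustration and not the perturbation scheme used in the experiments of Section 6, is to perturb each class probability of a nominal profile and renormalize, as sketched below.

```python
import random


def perturb_profile(profile, max_relative_error=0.3, seed=None):
    """Return a perturbed copy of an operational profile, with each class
    probability scaled by a random factor in [1 - e, 1 + e] and the result
    renormalized to sum to 1."""
    rng = random.Random(seed)
    noisy = {cls: p * (1 + rng.uniform(-max_relative_error, max_relative_error))
             for cls, p in profile.items()}
    total = sum(noisy.values())
    return {cls: p / total for cls, p in noisy.items()}


# Usage with a hypothetical four-class profile.
true_profile = {"c1": 0.4, "c2": 0.3, "c3": 0.2, "c4": 0.1}
assumed_profile = perturb_profile(true_profile, max_relative_error=0.3, seed=1)
print(assumed_profile)
```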

Main conclusions

Up to this point the answers to the two questions identified in Section 3.5 can be stated as follows:

  • (1)

    The adaptive testing strategy AT can really work and perform well in the sense that the variance of its resulting software reliability estimate is generally smaller than those corresponding to the testing strategies RT and ORT. The testing strategy AT achieves its testing goal given a priori.

  • (2)

    While the testing strategy AT may noticeably outperform the random testing strategy RT and the

Concluding remarks

We have presented an experimental study with the Space program for the adaptive testing strategy and empirically compared it with the random testing strategy and the operational profile based testing strategy. The experimental results show that the adaptive testing strategy may noticeably outperform the other two testing strategies in terms of the variance of the software reliability estimate, at the expense of an increase in the average inaccuracy of the estimate. Since the adaptive testing

Acknowledgement

The helpful comments of the anonymous reviewers led to improved readability of the paper.

References (36)

  • K.Y. Cai et al. An overview of software cybernetics.

  • K.Y. Cai et al. On the test case definition of GUI testing.

  • R.A. DeMillo et al., 1978. Hints on test data selection: help for the practicing programmer. IEEE Computer.

  • N. Fenton et al. Applying Bayesian belief network in systems dependability assessment.

  • P.G. Frankl et al., 1998. Evaluating testing methods by delivered reliability. IEEE Transactions on Software Engineering.

  • D.S. Herrmann, 1998. Sample implementation of the Littlewood holistic model for assessing software quality, safety and...

  • O. Hernandez-Lerma, 1989. Adaptive Markov Control Processes.

  • K. Kanoun et al., 1997. Qualitative and quantitative reliability assessment. IEEE Software.

Kai-Yuan Cai is a Cheung Kong Scholar (Chair Professor), jointly appointed by the Ministry of Education of China and the Li Ka Shing Foundation of Hong Kong in 1999. He has been a full professor at Beihang University (Beijing University of Aeronautics and Astronautics) since 1995. He was born in April 1965 and entered Beihang University as an undergraduate student in 1980. He received his B.S. degree in 1984, M.S. degree in 1987, and Ph.D. degree in 1991, all from Beihang University. He was a research fellow at the Centre for Software Reliability, City University, London, and a visiting scholar at City University of Hong Kong, Swinburne University of Technology (Australia), the University of Technology, Sydney (Australia), and Purdue University (USA). Dr. Cai has published over 40 research papers in international journals and is the author of three books: Software Defect and Operational Profile Modeling (Kluwer, Boston, 1998); Introduction to Fuzzy Reliability (Kluwer, Boston, 1996); Elements of Software Reliability Engineering (Tsinghua University Press, Beijing, 1995, in Chinese). He serves on the editorial board of the international journal Fuzzy Sets and Systems and is the editor of the Kluwer International Series on Asian Studies in Computer and Information Science (http://www.wkap.nl/prod/s/ASIS). He served as program committee co-chair for the Fifth International Conference on Quality Software (Melbourne, Australia, September 2005), the First International Workshop on Software Cybernetics (Hong Kong, September 2004), and the Second International Workshop on Software Cybernetics (Edinburgh, UK, July 2005), and as general co-chair for the Third International Workshop on Software Cybernetics (Chicago, September 2006) and the Second International Symposium on Service-Oriented System Engineering (Shanghai, October 2006). He also served as guest editor for Fuzzy Sets and Systems (1996), the International Journal of Software Engineering and Knowledge Engineering (2006), and the Journal of Systems and Software (2006). His main research interests include software reliability and testing, autonomous flight control, and software cybernetics.

Chang-Hai Jiang was born in September 1983. He received his B.S. degree from Beihang University (Beijing University of Aeronautics and Astronautics) in 2004. He is now a Ph.D. student at Beihang University under the supervision of Professor Kai-Yuan Cai. His main research topics are software testing and software cybernetics.

Hai Hu was born in August 1983. He received his B.S. degree from Beihang University (Beijing University of Aeronautics and Astronautics) in 2004. He worked as a visiting scholar at the University of Texas at Dallas from 2005 to 2006. He is now a Ph.D. student at Beihang University under the supervision of Professor Kai-Yuan Cai. His main research topics are software testing and software cybernetics.

Cheng-Gang Bai received the M.Sc. degree in Statistics from Nanjing University of Aeronautics and Astronautics, Nanjing, China in 1990 and the Ph.D. degree in Control Theory and Control Engineering from Zhejiang University, Zhejiang, China in 1999. From July 1990 to March 1996, he was a Lecturer with the Department of Computer Science and Engineering, Liaocheng University, China. From September 1999 to November 2001, he was a Postdoctoral Fellow with the Department of Automatic Control, Beihang University (Beijing University of Aeronautics and Astronautics), China. In November 2001, he joined the faculty of the Department of Automatic Control, Beihang University, China, and has been an Associate Professor at the same university since July 2002. His research interests include Bayesian statistics, software reliability, and software testing.

Cai was supported by the National Science Foundation of China and Microsoft Research Asia (Grant Nos. 60633010 and 60474006) and the 863 Programme of China (Grant No. 2006AA01Z174). Bai was supported by the National Science Foundation of China (Grant No. 60473067). Cai is also with the State Key Laboratory of Virtual Reality Technology and Systems, Beijing, China.
