An experimental study of adaptive testing for software reliability assessment

https://doi.org/10.1016/j.jss.2007.11.721

Abstract

Adaptive testing is a new form of software testing that is based on the feedback and adaptive control principle and can be treated as the software testing counterpart of adaptive control. Our previous work has shown that adaptive testing can be formulated and guided in theory to minimize the variance of an unbiased software reliability estimator and to achieve optimal software reliability assessment. In this paper, we present an experimental study of adaptive testing for software reliability assessment, where the adaptive testing strategy, the random testing strategy and the operational profile based testing strategy were applied to the Space program in four experiments. The experimental results demonstrate that the adaptive testing strategy can really work in practice and may noticeably outperform the other two. Therefore, the adaptive testing strategy can serve as a preferable alternative to the random testing strategy and the operational profile based testing strategy if high confidence in the reliability estimates is required or the real-world operational profile of the software under test cannot be accurately identified.

Introduction

Software testing is a major paradigm for software quality assurance and is extensively carried out in nearly every software development project (Pressman, 2000). It can be used for software reliability improvement by detecting and removing defects from the software under test, or for software reliability assessment by freezing the code of the software under test without removing detected defects during testing (Binder, 2000, Frankl et al., 1998). There are also techniques such as mutation testing that can be used to assess the defect detection capability of a given test suite of the software under test (DeMillo et al., 1978, Offutt and Untch, 2001). Many software testing techniques have been proposed, including control flow testing, data flow testing, state based testing, functional testing, boundary value testing, random testing, Markov usage model based testing, and so on (Beizer, 1990, Whittaker and Poore, 1993, Whittaker and Thomason, 1994, Zhu et al., 1997).

Adaptive testing is a new software testing technique which results from the application of feedback and adaptive control principles in software testing (Cai, 2002, Cai et al., 2005a). It is a form of adaptive control, or can be treated as the software testing counterpart of adaptive control. As shown in Fig. 1.1, there are two feedback loops in the adaptive testing strategy. The first feedback loop comprises the software under test, the database (history of testing data) and the testing strategy, in which the history of testing data is used to generate the next test cases according to a given testing policy or test data adequacy criterion. The second feedback loop comprises the software under test, the database, the parameter estimation scheme and the testing strategy, in which the history of testing data is also used to improve or change the underlying testing policy or test data adequacy criterion. The improvement may lead random testing to switch from one test distribution (e.g., a uniform distribution) to another (e.g., a non-uniform distribution). It may lead partition testing to refine the partitioning of the input domain of the software under test. It may also lead data flow testing to switch to boundary value testing. This is because at the beginning of software testing, the software tester has limited knowledge of the software under test and of the capability of the test suite; as testing proceeds, the understanding of both improves. In this paper the improvement leads to a better test case selection scheme as a result of better estimates of defect detection rates.
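As a purely illustrative reading of these two loops, the Python sketch below selects each test case under a weighted policy, records the outcome, re-estimates parameters from the accumulated testing history, and then updates the policy. The functions estimate_parameters and update_policy are hypothetical placeholders standing in for the paper's parameter estimation scheme and testing policy; this is not the strategy defined in Section 2.

```python
import random


def adaptive_testing_loop(program, test_suite, budget, seed=None):
    """Illustrative two-loop adaptive testing session.

    program:    callable taking a test case and returning True on failure
    test_suite: list of available test cases
    budget:     total number of tests to execute
    """
    rng = random.Random(seed)
    history = []                              # (test case, outcome) pairs
    policy = {tc: 1.0 for tc in test_suite}   # selection weights, uniform start

    for _ in range(budget):
        # Feedback loop 1: the history-driven policy selects the next test case.
        cases, weights = zip(*policy.items())
        test_case = rng.choices(cases, weights=weights, k=1)[0]
        failed = program(test_case)
        history.append((test_case, failed))

        # Feedback loop 2: re-estimate parameters from the history and use them
        # to improve or change the underlying testing policy.
        estimates = estimate_parameters(history)
        policy = update_policy(policy, estimates)

    return history, policy


def estimate_parameters(history):
    """Hypothetical estimator: observed failure frequency per test case."""
    runs, fails = {}, {}
    for tc, failed in history:
        runs[tc] = runs.get(tc, 0) + 1
        fails[tc] = fails.get(tc, 0) + int(failed)
    return {tc: fails[tc] / runs[tc] for tc in runs}


def update_policy(policy, estimates):
    """Hypothetical policy update: weight test cases by the evidence gathered."""
    return {tc: 1.0 + estimates.get(tc, 0.0) for tc in policy}


# Usage with a hypothetical program that fails on one specific input.
history, final_policy = adaptive_testing_loop(
    program=lambda tc: tc == 7,          # fails only on test case 7
    test_suite=list(range(10)),
    budget=50,
    seed=0,
)
```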

Our previous work has shown that adaptive testing can be carried out for software reliability improvement (Cai, 2002, Cai et al., 2005a, Cai et al., 2007) as well as for software reliability assessment (Cai et al., 2004b). Experimental studies reveal that adaptive testing can noticeably outperform random testing in terms of using fewer tests to detect more defects (Cai et al., 2007). As far as software reliability assessment is concerned, however, Cai et al. (2004b) presents only simulation results, and no experimental results are available. This raises several research questions:

  • (1)

    Is the adaptive testing strategy for software reliability assessment effective? Although simulation results are very useful for justifying the possible behavior of adaptive testing, they may not be sufficient to draw convincing conclusions. The proposed adaptive testing strategy must be validated or invalidated against real software programs.

  • (2)

    What cost does the adaptive testing strategy incur? Supposing that the adaptive testing strategy is effective in the sense that it outperforms existing testing strategies in terms of the variance of software reliability estimates, what is its possible disadvantage?

  • (3)

    How may various factors affect the performance of the adaptive testing strategy? These factors may include the defects remaining in the software under test, the partitioning of the given test suite, the number of tests that can be applied, and the inaccuracy of the given operational profile.

In response to the above general research questions, in this paper we present an experimental study to evaluate the effectiveness or advantage of adaptive testing in comparison with random testing and operational profile based testing for software reliability assessment. This experimental study comprised four experiments, which accounted for distinct partitionings of the given test suite and distinct numbers of tests applied to the software under test. The main purpose of this experimental study was to answer two more specific questions to be identified in Section 3.5.

The rest of this paper is organized as follows: Section 2 reviews the adaptive testing strategy for software reliability assessment. Section 3 describes the set-up of the four experiments presented in this paper. Section 4 presents the results of the four experiments. Section 5 presents an analysis of the experimental results. Section 6 considers the testing scenarios where the expected operational profile of the software under test is inaccurately known a priori. Section 7 gives a general discussion of the experimental study. Concluding remarks are contained in Section 8.

Section snippets

Adaptive testing for software reliability assessment

The central idea of adaptive testing is to treat software testing as a feedback and adaptive control problem, where the software under test serves as the controlled object and the testing strategy serves as the corresponding controller. As shown in Fig. 1.1, the testing strategy or the adaptive testing strategy is adjusted or updated on-line during testing. For the purpose of software reliability assessment, it is assumed that no failure-causing defects, if any, are removed from the software
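For concreteness, the sketch below shows a standard partition-based reliability estimate under an operational profile, of the general weighted-average form R̂ = Σᵢ pᵢ(1 − θ̂ᵢ). This form is an assumption used here for illustration only and is not a reproduction of the estimator of Eq. (2.5).

```python
def estimate_reliability(profile, failures, executions):
    """Weighted-average reliability estimate R = sum_i p_i * (1 - theta_i),
    where p_i is the operational-profile probability of input class i and
    theta_i = failures_i / executions_i is the observed per-class failure
    rate (the code is frozen, so no defects are removed while testing)."""
    r_hat = 0.0
    for cls, p in profile.items():
        n = executions.get(cls, 0)
        theta_hat = failures.get(cls, 0) / n if n > 0 else 0.0
        r_hat += p * (1.0 - theta_hat)
    return r_hat


# Usage with a hypothetical operational profile over four input classes
# (the experiments below also partition the test suite into four classes).
profile = {"c1": 0.4, "c2": 0.3, "c3": 0.2, "c4": 0.1}
failures = {"c1": 2, "c2": 0, "c3": 1, "c4": 0}
executions = {"c1": 200, "c2": 150, "c3": 100, "c4": 50}
print(f"{estimate_reliability(profile, failures, executions):.3f}")  # 0.994
```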

Experiment set-up

In order to validate or invalidate the relative effectiveness of the adaptive testing strategy presented in Section 2.2, we need to apply it to real software and see if it can reveal the true value of the reliability of the software under test. Preferably, the testing process should be automated to avoid human biases. In this section, we describe how the adaptive testing strategy was applied in our experimental study.

Experimental results

Note that in test suite 1 the test cases are not evenly distributed across the four classes, whereas in test suite 2 the test cases are roughly evenly distributed. Also note that a testing process with x0 = 500 can be treated as a short-term process in comparison with a testing process with x0 = 3000, which can be treated as a long-term process. We considered four experiments in terms of the test suite used and x0, and applied all three testing strategies to the various test scenarios.
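The resulting design, two test suites crossed with two testing lengths, each combination run under the three strategies, can be enumerated as in the sketch below. The loop body is only a placeholder print, since the actual automated testing harness is not reproduced here.

```python
from itertools import product

# Experimental grid: two test suites crossed with two testing lengths x0,
# each combination exercised under the three strategies AT, RT and ORT.
test_suites = ["suite 1 (uneven across the four classes)",
               "suite 2 (roughly even across the four classes)"]
testing_lengths = [500, 3000]        # x0: short-term vs. long-term process
strategies = ["AT", "RT", "ORT"]     # adaptive, random, operational profile based

for exp_id, (suite, x0) in enumerate(product(test_suites, testing_lengths), 1):
    for strategy in strategies:
        # Placeholder for invoking the automated testing harness.
        print(f"Experiment {exp_id}: {strategy} on {suite}, x0 = {x0}")
```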

Analysis of experimental results

Now we need to analyze the experimental results and see if the testing strategy AT is effective or advantageous in comparison with testing strategies RT and ORT. Since the goal of the testing strategy AT is to minimize var(ρ̂) or var(R̂) (refer to Eq. (2.5)), the comparison should be based on the measure SD (refer to Eq. (4.1)). Note that (SD)² is an unbiased estimator of var(R̂). Δ can serve as an auxiliary measure for comparison since it measures the difference between the true value of
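As an illustration of how these two measures can be computed from repeated testing runs, the sketch below uses the usual definitions: SD is the sample standard deviation of the reliability estimates (so (SD)² is an unbiased estimate of var(R̂)) and Δ is the deviation of the average estimate from the true reliability. The exact Eqs. (2.5) and (4.1) are not reproduced here, and the listed estimates are hypothetical.

```python
import statistics


def comparison_measures(estimates, true_reliability):
    """Return (SD, Delta) for reliability estimates from repeated runs."""
    sd = statistics.stdev(estimates)      # sample SD; SD**2 is an unbiased
                                          # estimate of var(R-hat)
    delta = statistics.mean(estimates) - true_reliability
    return sd, delta


# Usage with hypothetical estimates from five repeated testing runs.
sd, delta = comparison_measures([0.991, 0.987, 0.993, 0.989, 0.990], 0.990)
print(f"SD = {sd:.4f}, Delta = {delta:+.4f}")
```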

Question of concern

Section 5 shows that the testing strategy AT was basically comparable to the testing strategies RT and ORT in terms of |Δ|. Further, the testing strategy AT certainly outperformed the testing strategies RT and ORT in terms of SD, although the advantage of the former over the latter might or might not be noticeable. However, a prerequisite for applying the testing strategies ORT and AT is that the operational profile is given. Unfortunately, the real-world operational profile can hardly be precisely
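One simple way to model such profile uncertainty, purely as an illustration and not the perturbation scheme used in the experiments of Section 6, is to perturb each class probability of a nominal profile and renormalize, as sketched below.

```python
import random


def perturb_profile(profile, max_relative_error=0.3, seed=None):
    """Return a perturbed copy of an operational profile, with each class
    probability scaled by a random factor in [1 - e, 1 + e] and the result
    renormalized to sum to 1."""
    rng = random.Random(seed)
    noisy = {cls: p * (1 + rng.uniform(-max_relative_error, max_relative_error))
             for cls, p in profile.items()}
    total = sum(noisy.values())
    return {cls: p / total for cls, p in noisy.items()}


# Usage with a hypothetical four-class profile.
true_profile = {"c1": 0.4, "c2": 0.3, "c3": 0.2, "c4": 0.1}
assumed_profile = perturb_profile(true_profile, max_relative_error=0.3, seed=1)
print(assumed_profile)
```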

Main conclusions

Up to this point the answers to the two questions identified in Section 3.5 can be stated as follows:

  • (1)

    The adaptive testing strategy AT can really work and perform well in the sense that the variance of its resulting software reliability estimate is generally smaller than those corresponding to the testing strategies RT and ORT. The testing strategy AT achieves its testing goal given a priori.

  • (2)

    While the testing strategy AT may noticeably outperform the random testing strategy RT and the

Concluding remarks

We have presented an experimental study with the Space program for the adaptive testing strategy and empirically compared it with the random testing strategy and the operational profile based testing strategy. The experimental results show that the adaptive testing strategy may noticeably outperform the other two testing strategies in terms of the variance of the software reliability estimate, at the expense of an increase in the average inaccuracy of the estimate. Since the adaptive testing

Acknowledgement

The helpful comments of the anonymous reviewers led to improved readability of the paper.

References (36)

  • K.Y. Cai et al. An overview of software cybernetics.

  • K.Y. Cai et al. On the test case definition of GUI testing.

  • R.A. DeMillo et al., 1978. Hints on test data selection: help for the practicing programmer. IEEE Computer.

  • N. Fenton et al. Applying Bayesian belief network in systems dependability assessment.

  • P.G. Frankl et al., 1998. Evaluating testing methods by delivered reliability. IEEE Transactions on Software Engineering.

  • D.S. Herrmann, 1998. Sample implementation of the Littlewood holistic model for assessing software quality, safety and...

  • O. Hernandez-Lerma, 1989. Adaptive Markov Control Processes.

  • K. Kanoun et al., 1997. Qualitative and quantitative reliability assessment. IEEE Software.

Kai-Yuan Cai is a Cheung Kong Scholar (Chair Professor), jointly appointed by the Ministry of Education of China and the Li Ka Shing Foundation of Hong Kong in 1999. He has been a full professor at Beihang University (Beijing University of Aeronautics and Astronautics) since 1995. He was born in April 1965 and entered Beihang University as an undergraduate student in 1980. He received his B.S. degree in 1984, M.S. degree in 1987, and Ph.D. degree in 1991, all from Beihang University. He was a research fellow at the Centre for Software Reliability, City University, London, and a visiting scholar at City University of Hong Kong, Swinburne University of Technology (Australia), the University of Technology, Sydney (Australia), and Purdue University (USA). Dr. Cai has published over 40 research papers in international journals and is the author of three books: Software Defect and Operational Profile Modeling (Kluwer, Boston, 1998); Introduction to Fuzzy Reliability (Kluwer, Boston, 1996); Elements of Software Reliability Engineering (Tsinghua University Press, Beijing, 1995, in Chinese). He serves on the editorial board of the international journal Fuzzy Sets and Systems and is the editor of the Kluwer International Series on Asian Studies in Computer and Information Science (http://www.wkap.nl/prod/s/ASIS). He served as program committee co-chair for the Fifth International Conference on Quality Software (Melbourne, Australia, September 2005), the First International Workshop on Software Cybernetics (Hong Kong, September 2004), and the Second International Workshop on Software Cybernetics (Edinburgh, UK, July 2005), and as general co-chair for the Third International Workshop on Software Cybernetics (Chicago, September 2006) and the Second International Symposium on Service-Oriented System Engineering (Shanghai, October 2006). He also served as guest editor for Fuzzy Sets and Systems (1996), the International Journal of Software Engineering and Knowledge Engineering (2006), and the Journal of Systems and Software (2006). His main research interests include software reliability and testing, autonomous flight control, and software cybernetics.

Chang-Hai Jiang was born in September 1983. He received his B.S. degree from Beihang University (Beijing University of Aeronautics and Astronautics) in 2004. He is now a Ph.D. student at Beihang University under the supervision of Professor Kai-Yuan Cai. His main research topics are software testing and software cybernetics.

Hai Hu was born in August 1983. He received his B.S. degree from Beihang University (Beijing University of Aeronautics and Astronautics) in 2004. He worked as a visiting scholar at the University of Texas at Dallas from 2005 to 2006. He is now a Ph.D. student at Beihang University under the supervision of Professor Kai-Yuan Cai. His main research topics are software testing and software cybernetics.

Cheng-Gang Bai received the M.Sc. degree in Statistics from Nanjing University of Aeronautics and Astronautics, Nanjing, China in 1990 and the Ph.D. degree in Control Theory and Control Engineering from Zhejiang University, Zhejiang, China in 1999. From July 1990 to March 1996, he was a Lecturer with the Department of Computer Science and Engineering, Liaocheng University, China. From September 1999 to November 2001, he was a Postdoctoral Fellow with the Department of Automatic Control, Beihang University (Beijing University of Aeronautics and Astronautics), China. In November 2001, he joined the faculty of the Department of Automatic Control, Beihang University, China, and has been an Associate Professor at the same university since July 2002. His research interests include Bayesian statistics, software reliability, and software testing.

Cai was supported by the National Science Foundation of China and Microsoft Research Asia (Grant Nos. 60633010 and 60474006) and the 863 Programme of China (Grant No. 2006AA01Z174). Bai was supported by the National Science Foundation of China (Grant No. 60473067). Cai is also with the State Key Laboratory of Virtual Reality Technology and Systems, Beijing, China.
