Controversy corner
Robustness of spectrum-based fault localisation in environments with labelling perturbations
Introduction
Software systems are found everywhere in daily life, and software failures are frequently encountered. Many failures are due to faults (or bugs) that are introduced into programs during development, and debugging is an effective way of finding them. It is commonly recognized that debugging is important but resource-consuming in software engineering, and fault localisation is one of its essential activities. Because it involves a substantial amount of manual effort, fault localisation is a costly task in the software development lifecycle. Therefore, many researchers have proposed automatic techniques for fault localisation to decrease its cost and to increase software quality.
One promising approach towards fault localisation is spectrum-based fault localisation (referred to as SBFL in this article). SBFL refers to the automatic mechanism for predicting fault positions in a faulty program by analysing the dynamic program spectra that are captured in program runs. Typically, a program element that is always exercised in failed runs and never exercised in passed runs has a high chance of explaining the observed failures and is deemed very suspicious, in terms of relations to one or more faults. Many heuristics (Jones and Harrold, 2005, Abreu et al., 2007, Gong et al., 2012, Steimann et al., 2013, Neelofar et al., 2017) and mathematical models (Liblit et al., 2005, Liu et al., 2006, Zhang et al., 2011, Gore and Reynolds, 2012, Tang et al., 2017) have been proposed.
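To make the ranking step concrete, the following minimal Python sketch computes per-statement spectrum counts from a small coverage matrix and ranks the statements with the Jaccard formula. The coverage matrix, the test verdicts and the function names are hypothetical and chosen only for illustration; they are not taken from the article.

```python
def spectrum_counts(coverage, failed):
    """Return (a_ef, a_ep, a_nf, a_np) for every statement.

    coverage[t][s] is 1 if test t executed statement s, else 0.
    failed[t] is True if test t is labelled as failed.
    """
    counts = []
    for s in range(len(coverage[0])):
        a_ef = sum(1 for row, f in zip(coverage, failed) if row[s] and f)
        a_ep = sum(1 for row, f in zip(coverage, failed) if row[s] and not f)
        a_nf = sum(1 for row, f in zip(coverage, failed) if not row[s] and f)
        a_np = sum(1 for row, f in zip(coverage, failed) if not row[s] and not f)
        counts.append((a_ef, a_ep, a_nf, a_np))
    return counts


def jaccard(a_ef, a_ep, a_nf, a_np):
    """Jaccard suspiciousness: a_ef / (a_ef + a_nf + a_ep)."""
    denom = a_ef + a_nf + a_ep
    return a_ef / denom if denom else 0.0


# Hypothetical data: 5 tests (rows) over 4 statements (columns).
coverage = [
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
]
failed = [False, False, True, True, False]

counts = spectrum_counts(coverage, failed)
scores = [jaccard(*c) for c in counts]
# Statement indices ordered from most to least suspicious.
ranking = sorted(range(len(scores)), key=lambda s: -scores[s])
print(ranking)
```

In this toy input, statement 2 is executed by both failed tests and by only one passed test, so it is placed at the top of the ranked list.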
SBFL has received substantial attention due to its simplicity and effectiveness. It takes as inputs a faulty program and a test suite and produces as output a ranked list of suspicious code locations at which the program may be defective. Ideally, the input of SBFL would be free of perturbations, so that the output of an SBFL technique would not be distorted by errors in its input. Unfortunately, perturbations can occur in a real testing process. They may introduce errors into the collected test information, which are then propagated by SBFL techniques and cause unexpected results. This raises a question: will a small perturbation of the input cause a large variation in the output?
There could be various types of perturbations in the process of testing and debugging. As an attempt to study the above problem, we focus on labelling perturbations in this work. This type of perturbation is caused by the incorrect labelling of a small number of test cases in a test suite, for example mislabelling a test case as passed although it actually failed, or vice versa. It could be very common owing to factors such as human error, imperfect development of test systems, and differences between the test environment and the actual execution environment. We expect a fault localisation technique to be robust to such perturbations. However, it has been observed that even a small labelling perturbation may have a substantial impact on the results of fault localisation. In the example shown in Table 1, although only one test case is mislabelled as failed, the faulty statement is given the lowest suspiciousness degree by Naish1, which has been evaluated as one of the “maximal” risk evaluation formulas under the single-fault scenario. The ranking of the faulty statement drops from the first to almost the last. For other formulas, for example Jaccard, a similar situation can be seen in Table 1.
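The effect described above can be reproduced on a toy input. The sketch below is not the article's Table 1; it uses a hypothetical coverage matrix with a single fault in statement 2 and the commonly stated form of Naish1 (score -1 when a statement is not executed by every failed test, otherwise the number of passed tests that do not execute it). All names and data are assumptions made for illustration.

```python
def naish1_scores(coverage, failed):
    """Naish1: -1 if a_ef < F, else P - a_ep (F, P = numbers of failed, passed tests)."""
    total_failed = sum(failed)
    total_passed = len(failed) - total_failed
    scores = []
    for s in range(len(coverage[0])):
        a_ef = sum(1 for row, f in zip(coverage, failed) if row[s] and f)
        a_ep = sum(1 for row, f in zip(coverage, failed) if row[s] and not f)
        scores.append(-1 if a_ef < total_failed else total_passed - a_ep)
    return scores


def worst_case_rank(scores, fault):
    """Rank of the faulty statement when ties are resolved against it."""
    return sum(1 for v in scores if v >= scores[fault])


# Hypothetical data: 5 tests (rows) over 4 statements (columns); statement 2 is faulty.
coverage = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
]
FAULT = 2
true_labels = [False, False, True, True, False]
# The first test actually passed but is mislabelled as failed.
mislabelled = [True, False, True, True, False]

clean_scores = naish1_scores(coverage, true_labels)
noisy_scores = naish1_scores(coverage, mislabelled)
clean_rank = worst_case_rank(clean_scores, FAULT)
noisy_rank = worst_case_rank(noisy_scores, FAULT)
print(clean_scores, clean_rank)
print(noisy_scores, noisy_rank)
```

With the correct labels the faulty statement is the only one executed by every failed test and is ranked first. After one passed test that does not execute the fault is relabelled as failed, the condition a_ef < F holds for the faulty statement, its score falls to -1, and in the worst case it is examined last.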
In this paper, we first provide a theoretical investigation of the impacts of labelling perturbations on the accuracy of risk evaluation formulas. The preservation of three relations among the formulas, namely, strict equivalence relations (Naish et al., 2011), Xie and Chen's equivalence relations (Xie et al., 2013a), and Xie and Chen's order relations (Xie et al., 2013a), is proved under the scenario of labelling perturbations. In addition, we theoretically study how labelling perturbations influence the ranked lists output by the formulas in two scenarios: one in which all mislabelling activities are in the same direction, and one in which the mislabelling activities are in different directions.
To further explore the impacts of labelling perturbations on different risk evaluation formulas, we conducted controlled experiments using the Siemens suite, UNIX utility software, space and Defects4J. The robustness of 23 classes of risk evaluation formulas and the factors that affect it, including perturbation degree, number of faults and type of labelling perturbation, are empirically studied. We observe that (1) different risk evaluation formulas usually have different robustness values, and the robustness values of risk evaluation formulas are neither positively nor negatively correlated with their Expense; (2) the robustness of most risk evaluation formulas decreases as the perturbation degree increases; (3) most formulas show an increasing trend of robustness as the number of faults increases; (4) on average, the impact of mislabelling passed cases as failed cases is greater than that of mislabelling failed cases as passed cases. Based on these findings, a new metric is proposed for evaluating risk evaluation formulas by jointly considering their robustness and accuracy. Experiments support the rationality of the metric. In addition, we perform experiments to evaluate the robustness of two neural network-based fault localisation techniques.
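A perturbation experiment of this kind can be sketched in a few lines of Python. The sketch below flips a given fraction of test labels in one direction and compares Expense before and after, where Expense is taken here as the fraction of statements examined up to and including the fault, with ties resolved against the fault. This is only one common reading of Expense, the article's own robustness measure is not reproduced, and the data, the function names and the "p2f"/"f2p" direction codes are hypothetical.

```python
import random


def perturb_labels(failed, degree, direction, rng):
    """Flip round(degree * number of tests) labels in one direction.

    direction "p2f" relabels passed tests as failed; "f2p" does the opposite.
    """
    eligible_value = direction == "f2p"  # True selects failed tests, False passed tests
    candidates = [i for i, f in enumerate(failed) if f == eligible_value]
    k = min(len(candidates), round(degree * len(failed)))
    flipped = list(failed)
    for i in rng.sample(candidates, k):
        flipped[i] = not flipped[i]
    return flipped


def jaccard_scores(coverage, failed):
    """Jaccard suspiciousness per statement: a_ef / (a_ef + a_nf + a_ep)."""
    total_failed = sum(failed)  # equals a_ef + a_nf for every statement
    scores = []
    for s in range(len(coverage[0])):
        a_ef = sum(1 for row, f in zip(coverage, failed) if row[s] and f)
        a_ep = sum(1 for row, f in zip(coverage, failed) if row[s] and not f)
        denom = total_failed + a_ep
        scores.append(a_ef / denom if denom else 0.0)
    return scores


def expense(scores, fault):
    """Fraction of statements examined to reach the fault (worst case on ties)."""
    return sum(1 for v in scores if v >= scores[fault]) / len(scores)


# Hypothetical data: 5 tests (rows) over 4 statements (columns); statement 2 is faulty.
coverage = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
]
FAULT = 2
labels = [False, False, True, True, False]

rng = random.Random(0)  # fixed seed so that the run is repeatable
noisy = perturb_labels(labels, degree=0.2, direction="p2f", rng=rng)

clean_expense = expense(jaccard_scores(coverage, labels), FAULT)
noisy_expense = expense(jaccard_scores(coverage, noisy), FAULT)
print(clean_expense, noisy_expense)
```

Repeating the comparison over many seeds, perturbation degrees and both directions gives a rough picture of how sensitive a formula is on a given program. A toy matrix of this size says nothing about the trends reported in the article.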
The rest of this article is organized as follows: Section 2 provides the problem description and motivation of this work. Section 3 introduces the theoretical analysis from two aspects. Section 4 presents an empirical study on 18 programs with 3079 faulty versions from different domains. Section 5 discusses the threats to the validity of this work. In Section 6, a review of previous theoretical and empirical studies is presented. Finally, the conclusions of this work are presented in Section 7.
Section snippets
Spectrum-based fault localisation and its labelling perturbations
Spectrum-based fault localisation (SBFL) refers to the automatic mechanism for predicting potential fault positions in a faulty program by analysing the dynamic program spectra that are captured in program runs. With the approach, each structural element in the program is assigned a suspiciousness value that corresponds to the relative likelihood of the element containing one or more of the faults (Liblit et al., 2005, Jones and Harrold, 2005, Abreu et al., 2007). The concept of spectrum-based
Theoretical analysis
In this section, we present mathematical proofs for the cases in which the accuracy of formulas was observed to have deteriorated, improved or been preserved by considering the perturbations. We carry out the theoretical analysis from two main aspects. First, we analyse the influence of labelling perturbations on three relations among the formulas: strict equivalence relations (Naish et al., 2011), Xie and Chen's equivalence relations (Xie et al., 2013a), and Xie and Chen's order relations (
Controlled experiments
In order to verify the theoretical analysis results and further analyse the impacts of parameters, in this section we design a number of controlled experiments on 18 programs, 23 risk formulas and 2 neural network-based techniques.
Threats to validity
The discussion regarding threats to validity focuses on internal, external and construct validity. The primary threat to the internal validity involves the correctness of our techniques, which includes the implementation of the fault localisation techniques with or without considering labelling perturbations. The implementation of these techniques was manually evaluated by applying them to small programs; the data that were collected in the experiment, however, were manually evaluated by
Related work
Spectrum-based fault localisation (SBFL) has undergone long-term development and evolution and some work was carried out over a decade ago (Jones et al., 2005). To date, many SBFL techniques have been proposed based on various granularities of program components, including predicate-based techniques (Liblit et al., 2005, Liu et al., 2006), statement-based techniques (Jones and Harrold, 2005, Abreu et al., 2007), and path-oriented techniques (Chilimbi et al., 2009). Moreover, different
Conclusions and future work
No matter how careful testers and debuggers are, how effective and efficient a test system is, and how close the test environment is to reality, we cannot avoid the mislabelling of test cases. Therefore, for a fault localisation technique, it is necessary not only to have a high accuracy in a perfect environment but also to maintain high accuracy in environments with labelling perturbations. In this paper, we theoretically analyse and experimentally study the impacts of labelling perturbations
Yanhong Xu is an M.Sc. student at the School of Automation Science and Electrical Engineering, Beihang University. She obtained her bachelor's degree in 2016 from Beihang University. Her research interest is program debugging.
References (77)
- et al. A practical evaluation of spectrum-based fault localization. J. Syst. Softw. (2009)
- et al. On the adoption of MC/DC and control-flow adequacy for a tight integration of program testing and statistical fault localization. Inf. Softw. Technol. (2013)
- et al. How well does test case prioritization integrate with statistical fault localization? Inf. Softw. Technol. (2012)
- et al. HSFal: Effective fault localization using hybrid spectrum of full slices and execution slices. J. Syst. Softw. (2014)
- et al. A proposed index for measuring agreement in test-retest studies. J. Chronic. Dis. (1966)
- et al. Metamorphic slice: an application in spectrum-based fault localization. Inf. Softw. Technol. (2013)
- et al. A theoretical analysis on cloning the failed test cases to improve spectrum-based fault localization. J. Syst. Softw. (2017)
- et al. Exploring the usefulness of unlabelled test cases in software fault localization. J. Syst. Softw. (2018)
- et al. Fault localization based only on failed runs. Computer (2012)
- et al. Non-parametric statistical fault localization. J. Syst. Softw. (2011)
- An evaluation of similarity coefficients for software fault localization
- On the accuracy of spectrum-based fault localization
- Cluster analysis for applications. NY Publication: Probability and Mathematical Statistics
- Tester feedback driven fault localization
- Entropy-based test generation for improved fault localization
- SMOTE: synthetic minority over-sampling technique. J. Artificial Intelligence Res.
- Pinpoint: problem determination in large, dynamic internet services
- HOLMES: effective statistical debugging via efficient path profiling
- A coefficient of agreement for nominal scales. Educ. Psychol. Measur.
- Insights on fault interference for programs with multiple bugs
- Neural network training on unequally represented classes. Intell. Eng. Syst. Through Artif. Neural Netw
- Measures of the amount of ecologic association between species. Ecology
- Supporting controlled experimentation with testing techniques: an infrastructure and its potential impact. Emp. Softw. Eng.
- Comparison of similarity coefficients based on RAPD markers in the common bean. Gen. Mol. Biol.
- Graphical Techniques for Multivariate Data
- Estimating the accuracy of dichotomous judgments. Psychometrika
- Effects of class imbalance in test suites: an empirical study of spectrum-based fault localization
- Automatic error detection techniques based on dynamic invariants
- Measures of association for cross classifications. J. Am. Statist. Assoc.
- Reducing confounding bias in predicate-level statistical debugging metrics
- A Comprehensive Survey of Trends in Oracles for Software Testing
- Learning from imbalanced data. IEEE Trans. Knowl. Data Eng.
- Empirical evaluation of the tarantula automatic fault-localization technique
- Visualization of test information to assist fault localization
- Defects4J: A database of existing faults to enable controlled testing studies for Java programs
- Taxicab geometry. Math. Teacher
- Theory and practice, do they match? A case with spectrum-based fault localization
- Study of the relationship of bug consistency with respect to performance of spectra metrics
Beibei Yin is currently a lecturer at Beihang University, China. She received her Ph.D. degree from Beihang University, China, in 2010. She worked as a research scholar in the Department of Electrical and Computer Engineering at Duke University in 2016. Her research interests include software testing, software reliability, and software cybernetics. She has published research results in venues such as TSE, T-Rel, Inf. Sci. and ISSRE.
Zheng Zheng is a Professor at Beihang University, China. He received his Ph.D. degree in computer software and theory from the Chinese Academy of Sciences. In 2014 he was with the Department of Electrical and Computer Engineering at Duke University, working as a research scholar. His research interests include software fault localization and software dependability modeling. He has published research results in venues such as TDSC, TSC, T-Rel, JSS, COR and ISSRE.
Xiaoyi Zhang is a Ph.D. candidate at the School of Automation Science and Electrical Engineering, Beihang University. He obtained his bachelor's degree from Beihang University in 2011. His research interests include program debugging and program repair.
Chenglong Li is a Ph.D. candidate at the School of Automation Science and Electrical Engineering, Beihang University. He obtained his bachelor's degree in 2018 from Beihang University. His research interest is program debugging.
Shunkun Yang is an Associate Professor at Beihang University, China. He received his Ph.D. degree from Beihang University. His research interests include software testing and software reliability.
This work was supported by the National Natural Science Foundation of China (Grant Nos. 61772055, 61872169), Equipment Preliminary R&D Project of China (No. 41402020102), Technical Foundation Project of Ministry of Industry and Information Technology of China (JSZL2016601B003).