Simultaneous debugging of software faults

https://doi.org/10.1016/j.jss.2010.11.915

Abstract

(Semi-)automated diagnosis of software faults can drastically increase debugging efficiency, improving reliability and time-to-market. Current automatic diagnosis techniques are predominantly of a statistical nature and, despite typical defect densities, do not explicitly consider multiple faults, as also demonstrated by the popularity of the single-fault benchmark set of programs. We present a reasoning approach, called Zoltar-M(ultiple fault), that yields multiple-fault diagnoses, ranked in order of their probability. Although application of Zoltar-M to programs with many faults requires heuristics (trading off completeness) to reduce the inherent computational complexity, theory as well as experiments on synthetic program models and multiple-fault program versions available from the Software-artifact Infrastructure Repository (SIR) show that for multiple-fault programs this approach can outperform statistical techniques, notably spectrum-based fault localization (SFL). As a side-effect of this research, we present a new SFL variant, called Zoltar-S(ingle fault), that is optimal for single-fault programs, outperforming all other variants known to date.

Introduction

Automatic software fault localization (also known as fault diagnosis) techniques aid developers in pinpointing the root cause of detected failures, thereby reducing the debugging effort. Two approaches can be distinguished:

  • (1) the spectrum-based fault localization (SFL) approach, which correlates software component activity with program failures (a statistical approach) (Abreu et al., 2007, Gupta et al., 2005, Jones et al., 2002, Liu et al., 2005, Renieris and Reiss, 2003, Zeller, 2002), and

  • (2) the model-based diagnosis or debugging (MBD) approach, which deduces component failure through logic reasoning (de Kleer and Williams, 1987, Feldman et al., 2008, Mayer and Stumptner, 2008, Wotawa et al., 2002).

Because of its low computational complexity, SFL has gained considerable popularity. Although not inherently restricted to single faults, these statistical techniques are in most cases applied and evaluated in a single-fault context, as demonstrated by the benchmark set of programs widely used by researchers, which is seeded with only one fault per program (version). In practice, however, the defect density of even small programs typically amounts to multiple faults. Although the root cause of a particular program failure need not involve multiple faults acting simultaneously, many failures will be caused by different faults. Hence, the problem of multiple-fault localization (diagnosis) deserves detailed study.

Unlike SFL, MBD traditionally deals with multiple faults. However, apart from its much higher computational complexity, the logic models used in the diagnostic inference are typically based on static program analysis. Consequently, they do not exploit execution behavior, which, in contrast, is the essence of the SFL approach. In this paper, we combine the dynamic approach of SFL with the multiple-fault logic reasoning of MBD: we present a multiple-fault reasoning approach that is based on the dynamic, spectrum-based observations of SFL. Additional reasons to study the merits of this approach are the following.

  • Diagnoses are returned in terms of multiple faults, whereas statistical techniques return a one-dimensional list of single-fault locations only. The information on fault multiplicity is attractive from a parallel debugging point of view (Jones et al., 2007).

  • Unlike statistical approaches, multiple-fault diagnoses only include valid candidates, and are asymptotically optimal with increasing test information (Abreu et al., 2008).

  • The ranking of the diagnoses is based on probability instead of similarity. This implies that the quality of a diagnosis can be expressed in terms of information entropy or any other metric based on probability theory (Pietersma and van Gemund, 2006); a small illustration follows this list.

  • The reasoning approach naturally accommodates additional (model) information about component behavior, increasing diagnostic performance when such information is available.
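
For example, the residual uncertainty in a set of ranked diagnoses can be summarized as the Shannon entropy of their normalized probabilities. The sketch below only illustrates this idea; it is not code from the paper, and the probability values are made up. Lower entropy means a more decisive diagnosis.

```python
from math import log2

def diagnosis_entropy(probs):
    """Shannon entropy (bits) of a normalized list of candidate probabilities."""
    total = sum(probs)
    return -sum((p / total) * log2(p / total) for p in probs if p > 0)

# Illustration only: the probability values below are made up.
print(diagnosis_entropy([0.97, 0.02, 0.01]))        # ~0.22 bits: nearly certain
print(diagnosis_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.00 bits: uninformative
```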

To illustrate the difference between multiple-fault and the statistical approach, consider a triple-fault (sub)program with faulty components c1, c2, and c3. Whereas under ideal testing circumstances a traditional SFL approach would produce multiple single-fault diagnoses (in terms of the component indices) like {{1}, {2}, {3}, {4}, {5}, …} (ordered in terms of statistical similarity), a multiple-fault approach would simply produce one single multiple-fault diagnosis {{1, 2, 3}}. Although the statistical similarity of the first three items in the former diagnosis would be highest, the latter, single diagnosis unambiguously reveals the actual triple fault.
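
The contrast can be made concrete with a toy spectrum. The sketch below is ours, not the paper's: the activity matrix, the test outcomes, and the choice of the Ochiai coefficient (one of the similarity coefficients discussed later) are assumptions for illustration. The similarity ranking lists components one by one, whereas the single candidate {1, 2, 3} explains every failing run.

```python
# Toy example (not from the paper): components c1, c2, c3 are the faulty ones.
# Rows are runs, columns are components c1..c5; err[i] = 1 if run i failed.
A = [
    [1, 0, 0, 1, 0],  # fails: faulty c1 is active
    [0, 1, 0, 0, 1],  # fails: faulty c2 is active
    [0, 0, 1, 1, 0],  # fails: faulty c3 is active
    [0, 0, 0, 1, 1],  # passes: only healthy c4, c5 are active
    [1, 1, 1, 0, 0],  # fails: all three faulty components are active
]
err = [1, 1, 1, 0, 1]

def ochiai(j):
    """Ochiai similarity of component j with the error vector."""
    n11 = sum(1 for a, e in zip(A, err) if a[j] and e)      # active in failed runs
    n10 = sum(1 for a, e in zip(A, err) if a[j] and not e)  # active in passed runs
    n01 = sum(1 for a, e in zip(A, err) if not a[j] and e)  # inactive in failed runs
    denom = ((n11 + n01) * (n11 + n10)) ** 0.5
    return n11 / denom if denom else 0.0

# Statistical approach: a one-dimensional ranking of single components.
print([j + 1 for j in sorted(range(5), key=ochiai, reverse=True)])  # [1, 2, 3, 4, 5]

# Reasoning approach: one multiple-fault candidate that covers every failing run.
conflicts = [{j for j, a in enumerate(run) if a} for run, e in zip(A, err) if e]
print(all({0, 1, 2} & c for c in conflicts))  # True: {c1, c2, c3} explains all failures
```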

Despite the above advantages, a reasoning approach is more costly than statistical approaches, because an exponential number of multiple-fault candidates needs to be processed instead of just the M single-fault candidates (M being the number of components in the system under analysis). In this paper, we compare our reasoning approach to several statistical approaches. Our study is based on random synthetic spectra, as well as on several benchmark programs, extended by us to accommodate multiple faults. More specifically, this paper makes the following five contributions.

  • We introduce a multiple-fault diagnosis approach that originates from the model-based diagnosis area, but which is specifically adapted to the interaction dynamics of software. The approach is coined Zoltar-M (Zoltar for the name of our debugging tool set (Janssen et al., 2009), M for multiple-fault).

  • We show how our reasoning approach applies to single-fault programs, yielding a provably optimal SFL variant, called Zoltar-S (S for single-fault), as yet unknown in the literature.

  • We introduce a general, multiple-fault, probabilistic program (spectrum) model, parameterized in terms of size, testing code coverage, and testing fault coverage, to theoretically study Zoltar-M, compared to statistical techniques such as Tarantula and Zoltar-S.

  • We extend the traditional, single-fault benchmark set of programs (referred to as SIR-S) with a multiple-fault version (SIR-M), by combining the existing single-fault versions, to empirically evaluate debugging performance under realistic, multiple-fault conditions.

  • We investigate the ability of all techniques to deduce program fault multiplicity, which is aimed at providing a good estimate to guide parallel debugging, using an approach that substantially differs from Jones et al. (2007).

To the best of our knowledge, this is the first paper to specifically address software multiple-fault localization using a spectrum-based, logic reasoning approach, yielding two new localization techniques, Zoltar-S and Zoltar-M, implemented within our Zoltar SFL framework. Our experiments confirm that Zoltar-S is superior to all known similarity coefficients for the Siemens-S benchmark. More importantly, however, our experiments for multiple-fault programs show that although Zoltar-M is outperformed by Zoltar-S for synthetic spectra, for our SIR-M experiments Zoltar-M outperforms all similarity coefficients known to date.

The paper is organized as follows. In the next section, we present the concepts and terminology used throughout the paper. In Section 3, our multiple-fault localization approach is described, as well as a derivation of the optimal similarity coefficient for single-fault programs. In Section 4, the approaches are theoretically evaluated, and in Section 5, real programs are used to assess the capabilities of the studied techniques for fault localization. Related work is discussed in Section 6. Preliminary results of Sections 4 and 5 appeared in Abreu et al. (2009a). We conclude and discuss future work in Section 7.

Section snippets

Preliminaries

In this section, we introduce basic definitions as well as the traditional SFL approach. As defined in Avižienis et al. (2004), in the remainder of this paper, we use the following terminology.

  • A failure is an event that occurs when delivered service deviates from correct service.

  • An error is the part of the total state of the system that may cause a failure.

  • A fault is the cause of an error in the system.

To illustrate these concepts, consider the C function in Fig. 1. It is meant to sort, using

Multiple-fault localization

In this section, we present our multiple-fault localization approach Zoltar-M, which is based on reasoning as performed in model-based diagnosis, combined with (Bayesian) probability theory to compute the ranking of the candidates. The major difference with the statistical approach in Section 2.2 is

  • that only a subset of components (the so-called hitting set) is considered, in contrast to all components,

  • that all computed candidates logically explain the observed failures, and

  • that the ranking is based
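
The sketch below illustrates these points on a small example. It is a simplified stand-in, not Zoltar-M itself: Zoltar-M relies on a low-cost approximate hitting-set heuristic (see the reference list) rather than brute-force enumeration, and the prior p and component "goodness" g used here are assumed fixed values rather than estimated ones.

```python
from itertools import combinations

# Simplified illustration; Zoltar-M itself uses a low-cost hitting-set heuristic
# and estimates component health. p and g below are assumed values.
p, g = 0.01, 0.1  # prior fault probability; probability a faulty component behaves nominally

def conflicts(A, err):
    """Every failing run yields a conflict: the set of components it activated."""
    return [frozenset(j for j, a in enumerate(run) if a)
            for run, e in zip(A, err) if e]

def minimal_hitting_sets(confl, M):
    """Brute-force minimal hitting sets (the candidate diagnoses); small M only."""
    hits = []
    for k in range(1, M + 1):
        for cand in combinations(range(M), k):
            s = set(cand)
            if all(s & c for c in confl) and not any(h <= s for h in hits):
                hits.append(s)
    return hits

def posterior(cand, A, err, M):
    """Unnormalized Pr(cand | observations) under a simple intermittency model."""
    pr = p ** len(cand) * (1 - p) ** (M - len(cand))  # prior Pr(cand)
    for run, e in zip(A, err):
        active = sum(run[j] for j in cand)            # active faulty components
        pass_prob = g ** active                       # all of them behave nominally
        pr *= (1 - pass_prob) if e else pass_prob
    return pr

A = [[1, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1], [0, 0, 0, 1]]
err = [1, 1, 1, 0]
cands = minimal_hitting_sets(conflicts(A, err), M=4)  # yields {c4} and {c1, c2, c3}
for c in sorted(cands, key=lambda d: posterior(d, A, err, 4), reverse=True):
    print(sorted(j + 1 for j in c), posterior(c, A, err, 4))
```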

Theoretical evaluation

In order to gain understanding of the effects of the various parameters on the diagnostic performance of the different approaches, we use a simple, probabilistic model of program behavior that is directly based on C, N, M, r, and g. Without loss of generality we model the first C of the M components to be at fault. For each run, every component has probability r to be involved in that run. If a selected component is faulty, the probability of exhibiting nominal (“good”) behavior equals g. When
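
A minimal sketch of this synthetic spectrum generator, following our reading of the parameters C, N, M, r, and g above (the paper's exact sampling procedure may differ), is:

```python
import random

# Sketch of the synthetic model described in the text; sampling details assumed.
def synthetic_spectrum(N, M, C, r, g, seed=0):
    """Generate N runs over M components; the first C components are faulty."""
    rng = random.Random(seed)
    A, err = [], []
    for _ in range(N):
        row = [1 if rng.random() < r else 0 for _ in range(M)]  # test coverage
        # A run fails if at least one involved faulty component does NOT
        # exhibit nominal ("good") behavior, which happens with probability 1 - g.
        fail = any(row[j] and rng.random() >= g for j in range(C))
        A.append(row)
        err.append(1 if fail else 0)
    return A, err

A, err = synthetic_spectrum(N=100, M=20, C=3, r=0.6, g=0.1)
print(sum(err), "of", len(err), "runs failed")
```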

Empirical evaluation

Whereas the synthetic observation matrices used in the previous section are populated using a uniform distribution, this is not the case for the observation matrices of actual programs (which have a different spectral distribution). Therefore, in this section we evaluate the same diagnosis techniques on the SIR-S set, which comprises the programs introduced in Section 3.3, extended with the real-world, large programs space, gzip, and sed (see Table 5). In addition, we also evaluated our

Related work

As mentioned in the introduction, automated debugging techniques can be divided into statistical approaches and logic reasoning approaches that use program models.

In model-based reasoning applied to automatic software debugging (MBSD), the program model is typically generated from the source code using static analysis, as opposed to the traditional application of model-based diagnosis, where the model is obtained from a formal specification of the (physical) system (Reiter, 1987). An overview of

Conclusions and future work

In this paper, we have presented a multiple-fault localization technique, Zoltar-M, which is based on the dynamic, spectrum-based measurement approach from statistical fault localization methods, combined with a logic (and probabilistic) reasoning approach from model-based diagnosis, inspired by previous work in both separate disciplines (Abreu et al., 2007, Feldman et al., 2008). We have compared the performance of Zoltar-M with Tarantula and Ochiai, which are among the best known statistical

Acknowledgments

We extend our gratitude to Johan de Kleer for discussions which have influenced our multiple-fault reasoning approach. Also thanks to Rafi Vayani for conducting initial experiments on the effect of the hitting set filter in the single-fault case. Finally, we acknowledge the feedback from the discussions with our TRADER project partners.

References (33)

  • J. de Kleer et al., Characterizing diagnoses and systems, Artif. Intell. (1992)

  • J. de Kleer et al., Diagnosing multiple faults, Artif. Intell. (1987)

  • R. Reiter, A theory of diagnosis from first principles, Artif. Intell. (1987)

  • R. Abreu et al., A low-cost approximate minimal hitting set algorithm and its application to model-based diagnosis

  • R. Abreu et al., Localizing software faults simultaneously

  • R. Abreu et al., On the accuracy of spectrum-based fault localization

  • R. Abreu et al., An observation-based model for fault localization

  • R. Abreu et al., Spectrum-based multiple fault localization

  • A. Avižienis et al., Basic concepts and taxonomy of dependable and secure computing, IEEE Trans. Dependable Sec. Comput. (2004)

  • G.K. Baah et al., The probabilistic program dependence graph and its application to fault diagnosis

  • T.M. Chilimbi et al., Holmes: effective statistical debugging via efficient path profiling

  • J. de Kleer, Diagnosing intermittent faults

  • M. Esser et al., Automated test generation from models based on functional software specifications

  • A. Feldman et al., Computing minimal diagnoses by greedy stochastic search

  • A. Groce et al., Error explanation with distance metrics, Int. J. Software Tools Technol. Transfer (STTT) (2006)

  • N. Gupta et al., Locating faulty code using failure-inducing chops

Rui Abreu is with the Department of Informatics of the Faculty of Engineering of University of Porto as an Assistant Professor. He obtained his PhD. in Computer Science at the Software Engineering Research Group at Delft University of Technology. He holds an MSc. in Computer Science and Systems Engineering from Minho University, Portugal. Through his thesis work at Siemens R&D Porto, and professional internship at Philips Research, he acquired industrial experience in the area of quality of (embedded) systems.

Peter Zoeteweij works at IntelliMagic as a Software Developer. He holds an MSc. from Delft University of Technology, and a PhD. from the University of Amsterdam, both in computer science. Before his PhD., Peter worked for several years as a software engineer for Logica (now LogicaCMG), mainly on software for the oil industry.

Arjan J.C. van Gemund holds a BSc. in physics, and an MSc. (cum laude) and PhD. (cum laude) in computer science, all from Delft University of Technology. He has held positions at DSM and TNO, and currently serves as a full professor at the Electrical Engineering, Mathematics, and Computer Science Faculty of Delft University of Technology.

This work has been carried out as part of the TRADER project under the responsibility of the Embedded Systems Institute. This project is partially supported by the Netherlands Ministry of Economic Affairs under the BSIK03021 program.
