Simultaneous debugging of software faults

https://doi.org/10.1016/j.jss.2010.11.915

Abstract

(Semi-)automated diagnosis of software faults can drastically increase debugging efficiency, improving reliability and time-to-market. Current automatic diagnosis techniques are predominantly of a statistical nature and, despite typical defect densities, do not explicitly consider multiple faults, as also demonstrated by the popularity of the single-fault benchmark set of programs. We present a reasoning approach, called Zoltar-M(ultiple fault), that yields multiple-fault diagnoses, ranked in order of their probability. Although application of Zoltar-M to programs with many faults requires heuristics (trading off completeness) to reduce the inherent computational complexity, theory as well as experiments on synthetic program models and multiple-fault program versions available from the Software-artifact Infrastructure Repository (SIR) show that for multiple-fault programs this approach can outperform statistical techniques, notably spectrum-based fault localization (SFL). As a side-effect of this research, we present a new SFL variant, called Zoltar-S(ingle fault), that is optimal for single-fault programs, outperforming all other variants known to date.

Introduction

Automatic software fault localization (also known as fault diagnosis) techniques aid developers in pinpointing the root cause of detected failures, thereby reducing the debugging effort. Two approaches can be distinguished:

  • (1) the spectrum-based fault localization (SFL) approach, which correlates software component activity with program failures (a statistical approach) (Abreu et al., 2007, Gupta et al., 2005, Jones et al., 2002, Liu et al., 2005, Renieris and Reiss, 2003, Zeller, 2002), and

  • (2) the model-based diagnosis or debugging (MBD) approach, which deduces component failure through logic reasoning (de Kleer and Williams, 1987, Feldman et al., 2008, Mayer and Stumptner, 2008, Wotawa et al., 2002).

Because of its low computational complexity, SFL has gained considerable popularity. Although not inherently restricted to single faults, these statistical techniques are in most cases applied and evaluated in a single-fault context, as demonstrated by the benchmark set of programs widely used by researchers, which is seeded with only one fault per program (version). In practice, however, the defect density of even small programs typically amounts to multiple faults. Although the root cause of a particular program failure need not involve multiple faults acting simultaneously, many failures will be caused by different faults. Hence, the problem of multiple-fault localization (diagnosis) deserves detailed study.

Unlike SFL, MBD traditionally deals with multiple faults. However, apart from its much higher computational complexity, the logic models used in the diagnostic inference are typically based on static program analysis. Consequently, they do not exploit execution behavior, which, in contrast, is the essence of the SFL approach. In this paper, we combine the dynamic approach of SFL with the multiple-fault logic reasoning of MBD: we present a multiple-fault reasoning approach that is based on the dynamic, spectrum-based observations of SFL. Additional reasons to study the merits of this approach are the following.

  • Diagnoses are returned in terms of multiple faults, whereas statistical techniques return a one-dimensional list of single-fault locations only. The information on fault multiplicity is attractive from a parallel debugging point of view (Jones et al., 2007).

  • Unlike statistical approaches, multiple-fault diagnoses only include valid candidates, and are asymptotically optimal with increasing test information (Abreu et al., 2008).

  • The ranking of the diagnoses is based on probability instead of similarity. This implies that the quality of a diagnosis can be expressed in terms of information entropy or any other metric based on probability theory (Pietersma and van Gemund, 2006); a small illustration follows this list.

  • The reasoning approach naturally accommodates additional (model) information about component behavior, increasing diagnostic performance when such information is available.
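
For example, the residual uncertainty in a set of ranked diagnoses can be summarized as the Shannon entropy of their normalized probabilities. The sketch below only illustrates this idea; it is not code from the paper, and the probability values are made up. Lower entropy means a more decisive diagnosis.

```python
from math import log2

def diagnosis_entropy(probs):
    """Shannon entropy (bits) of a normalized list of candidate probabilities."""
    total = sum(probs)
    return -sum((p / total) * log2(p / total) for p in probs if p > 0)

# Illustration only: the probability values below are made up.
print(diagnosis_entropy([0.97, 0.02, 0.01]))        # ~0.22 bits: nearly certain
print(diagnosis_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.00 bits: uninformative
```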

To illustrate the difference between multiple-fault and the statistical approach, consider a triple-fault (sub)program with faulty components c1, c2, and c3. Whereas under ideal testing circumstances a traditional SFL approach would produce multiple single-fault diagnoses (in terms of the component indices) like {{1}, {2}, {3}, {4}, {5}, …} (ordered in terms of statistical similarity), a multiple-fault approach would simply produce one single multiple-fault diagnosis {{1, 2, 3}}. Although the statistical similarity of the first three items in the former diagnosis would be highest, the latter, single diagnosis unambiguously reveals the actual triple fault.
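
The contrast can be made concrete with a toy spectrum. The sketch below is ours, not the paper's: the activity matrix, the test outcomes, and the choice of the Ochiai coefficient (one of the similarity coefficients discussed later) are assumptions for illustration. The similarity ranking lists components one by one, whereas the single candidate {1, 2, 3} explains every failing run.

```python
# Toy example (not from the paper): components c1, c2, c3 are the faulty ones.
# Rows are runs, columns are components c1..c5; err[i] = 1 if run i failed.
A = [
    [1, 0, 0, 1, 0],  # fails: faulty c1 is active
    [0, 1, 0, 0, 1],  # fails: faulty c2 is active
    [0, 0, 1, 1, 0],  # fails: faulty c3 is active
    [0, 0, 0, 1, 1],  # passes: only healthy c4, c5 are active
    [1, 1, 1, 0, 0],  # fails: all three faulty components are active
]
err = [1, 1, 1, 0, 1]

def ochiai(j):
    """Ochiai similarity of component j with the error vector."""
    n11 = sum(1 for a, e in zip(A, err) if a[j] and e)      # active in failed runs
    n10 = sum(1 for a, e in zip(A, err) if a[j] and not e)  # active in passed runs
    n01 = sum(1 for a, e in zip(A, err) if not a[j] and e)  # inactive in failed runs
    denom = ((n11 + n01) * (n11 + n10)) ** 0.5
    return n11 / denom if denom else 0.0

# Statistical approach: a one-dimensional ranking of single components.
print([j + 1 for j in sorted(range(5), key=ochiai, reverse=True)])  # [1, 2, 3, 4, 5]

# Reasoning approach: one multiple-fault candidate that covers every failing run.
conflicts = [{j for j, a in enumerate(run) if a} for run, e in zip(A, err) if e]
print(all({0, 1, 2} & c for c in conflicts))  # True: {c1, c2, c3} explains all failures
```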

Despite the above advantages, a reasoning approach is more costly than statistical approaches, because an exponential number of multiple-fault candidates needs to be processed instead of just the M single-fault candidates (M being the number of components in the system under analysis). In this paper, we compare our reasoning approach to several statistical approaches. Our study is based on random synthetic spectra, as well as on several benchmark programs, extended by us to accommodate multiple faults. More specifically, this paper makes the following five contributions.

  • We introduce a multiple-fault diagnosis approach that originates from the model-based diagnosis area, but which is specifically adapted to the interaction dynamics of software. The approach is coined Zoltar-M (Zoltar for the name of our debugging tool set (Janssen et al., 2009), M for multiple-fault).

  • We show how our reasoning approach applies to single-fault programs, yielding a provably optimal SFL variant, called Zoltar-S (S for single-fault), as yet unknown in the literature.

  • We introduce a general, multiple-fault, probabilistic program (spectrum) model, parameterized in terms of size, testing code coverage, and testing fault coverage, to theoretically study Zoltar-M, compared to statistical techniques such as Tarantula and Zoltar-S.

  • We extend the traditional, single-fault benchmark set of programs (referred to as SIR-S) with a multiple-fault version (SIR-M), by combining the existing single-fault versions, to empirically evaluate debugging performance under realistic, multiple-fault conditions.

  • We investigate the ability of all techniques to deduce program fault multiplicity, which is aimed at providing a good estimate to guide parallel debugging, using an approach that substantially differs from Jones et al. (2007).

To the best of our knowledge, this is the first paper to specifically address software multiple-fault localization using a spectrum-based, logic reasoning approach, yielding two new localization techniques, Zoltar-S and Zoltar-M, implemented within our Zoltar SFL framework. Our experiments confirm that Zoltar-S is superior to all known similarity coefficients for the Siemens-S benchmark. More importantly, however, our experiments for multiple-fault programs show that although Zoltar-M is outperformed by Zoltar-S for synthetic spectra, for our SIR-M experiments Zoltar-M outperforms all similarity coefficients known to date.

The paper is organized as follows. In the next section, we present the concepts and terminology used throughout the paper. In Section 3, our multiple-fault localization approach is described, as well as a derivation of the optimal similarity coefficient for single-fault programs. In Section 4, the approaches are theoretically evaluated, and in Section 5, real programs are used to assess the capabilities of the studied techniques for fault localization. Related work is discussed in Section 6. Preliminary results of Sections 4 and 5 appeared in Abreu et al. (2009a). We conclude and discuss future work in Section 7.

Section snippets

Preliminaries

In this section, we introduce basic definitions as well as the traditional SFL approach. As defined in Avižienis et al. (2004), in the remainder of this paper, we use the following terminology.

  • A failure is an event that occurs when delivered service deviates from correct service.

  • An error is the part of the total state of the system that may cause a failure.

  • A fault is the cause of an error in the system.

To illustrate these concepts, consider the C function in Fig. 1. It is meant to sort, using

Multiple-fault localization

In this section, we present our multiple-fault localization approach Zoltar-M, which is based on reasoning as performed in model-based diagnosis, combined with (Bayesian) probability theory to compute the ranking of the candidates. The major difference with the statistical approach in Section 2.2 is

  • that only a subset of components (the so-called hitting set) is considered, in contrast to all components,

  • that all computed candidates logically explain the observed failures, and

  • that the ranking is based
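
The sketch below illustrates these points on a small example. It is a simplified stand-in, not Zoltar-M itself: Zoltar-M relies on a low-cost approximate hitting-set heuristic (see the reference list) rather than brute-force enumeration, and the prior p and component "goodness" g used here are assumed fixed values rather than estimated ones.

```python
from itertools import combinations

# Simplified illustration; Zoltar-M itself uses a low-cost hitting-set heuristic
# and estimates component health. p and g below are assumed values.
p, g = 0.01, 0.1  # prior fault probability; probability a faulty component behaves nominally

def conflicts(A, err):
    """Every failing run yields a conflict: the set of components it activated."""
    return [frozenset(j for j, a in enumerate(run) if a)
            for run, e in zip(A, err) if e]

def minimal_hitting_sets(confl, M):
    """Brute-force minimal hitting sets (the candidate diagnoses); small M only."""
    hits = []
    for k in range(1, M + 1):
        for cand in combinations(range(M), k):
            s = set(cand)
            if all(s & c for c in confl) and not any(h <= s for h in hits):
                hits.append(s)
    return hits

def posterior(cand, A, err, M):
    """Unnormalized Pr(cand | observations) under a simple intermittency model."""
    pr = p ** len(cand) * (1 - p) ** (M - len(cand))  # prior Pr(cand)
    for run, e in zip(A, err):
        active = sum(run[j] for j in cand)            # active faulty components
        pass_prob = g ** active                       # all of them behave nominally
        pr *= (1 - pass_prob) if e else pass_prob
    return pr

A = [[1, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1], [0, 0, 0, 1]]
err = [1, 1, 1, 0]
cands = minimal_hitting_sets(conflicts(A, err), M=4)  # yields {c4} and {c1, c2, c3}
for c in sorted(cands, key=lambda d: posterior(d, A, err, 4), reverse=True):
    print(sorted(j + 1 for j in c), posterior(c, A, err, 4))
```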

Theoretical evaluation

In order to gain understanding of the effects of the various parameters on the diagnostic performance of the different approaches, we use a simple, probabilistic model of program behavior that is directly based on C, N, M, r, and g. Without loss of generality we model the first C of the M components to be at fault. For each run, every component has probability r to be involved in that run. If a selected component is faulty, the probability of exhibiting nominal (“good”) behavior equals g. When
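
A minimal sketch of this synthetic spectrum generator, following our reading of the parameters C, N, M, r, and g above (the paper's exact sampling procedure may differ), is:

```python
import random

# Sketch of the synthetic model described in the text; sampling details assumed.
def synthetic_spectrum(N, M, C, r, g, seed=0):
    """Generate N runs over M components; the first C components are faulty."""
    rng = random.Random(seed)
    A, err = [], []
    for _ in range(N):
        row = [1 if rng.random() < r else 0 for _ in range(M)]  # test coverage
        # A run fails if at least one involved faulty component does NOT
        # exhibit nominal ("good") behavior, which happens with probability 1 - g.
        fail = any(row[j] and rng.random() >= g for j in range(C))
        A.append(row)
        err.append(1 if fail else 0)
    return A, err

A, err = synthetic_spectrum(N=100, M=20, C=3, r=0.6, g=0.1)
print(sum(err), "of", len(err), "runs failed")
```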

Empirical evaluation

Whereas the synthetic observation matrices used in the previous section are populated using a uniform distribution, this is not the case for the observation matrices of actual programs (which have a different spectral distribution). Therefore, in this section we evaluate the same diagnosis techniques on the SIR-S set, which comprises the programs introduced in Section 3.3, extended with the real-world, large programs space, gzip, and sed (see Table 5). In addition, we also evaluated our

Related work

As mentioned in the introduction, automated debugging techniques can be divided into statistical approaches and logic reasoning approaches that use program models.

In model-based reasoning applied to automatic software debugging (MBSD), the program model is typically generated from the source code using static analysis, as opposed to the traditional application of model-based diagnosis, where the model is obtained from a formal specification of the (physical) system (Reiter, 1987). An overview of

Conclusions and future work

In this paper, we have presented a multiple-fault localization technique, Zoltar-M, which is based on the dynamic, spectrum-based measurement approach from statistical fault localization methods, combined with a logic (and probabilistic) reasoning approach from model-based diagnosis, inspired by previous work in both separate disciplines (Abreu et al., 2007, Feldman et al., 2008). We have compared the performance of Zoltar-M with Tarantula and Ochiai, which are among the best known statistical

Acknowledgments

We extend our gratitude to Johan de Kleer for discussions which have influenced our multiple-fault reasoning approach. Also thanks to Rafi Vayani for conducting initial experiments on the effect of the hitting set filter in the single-fault case. Finally, we acknowledge the feedback from the discussions with our TRADER project partners.

References (33)

  • J. de Kleer et al., Characterizing diagnoses and systems, Artif. Intell. (1992)

  • J. de Kleer et al., Diagnosing multiple faults, Artif. Intell. (1987)

  • R. Reiter, A theory of diagnosis from first principles, Artif. Intell. (1987)

  • R. Abreu et al., A low-cost approximate minimal hitting set algorithm and its application to model-based diagnosis

  • R. Abreu et al., Localizing software faults simultaneously

  • R. Abreu et al., On the accuracy of spectrum-based fault localization

  • R. Abreu et al., An observation-based model for fault localization

  • R. Abreu et al., Spectrum-based multiple fault localization

  • A. Avižienis et al., Basic concepts and taxonomy of dependable and secure computing, IEEE Trans. Dependable Sec. Comput. (2004)

  • G.K. Baah et al., The probabilistic program dependence graph and its application to fault diagnosis

  • T.M. Chilimbi et al., Holmes: effective statistical debugging via efficient path profiling

  • J. de Kleer, Diagnosing intermittent faults

  • M. Esser et al., Automated test generation from models based on functional software specifications

  • A. Feldman et al., Computing minimal diagnoses by greedy stochastic search

  • A. Groce et al., Error explanation with distance metrics, Int. J. Software Tools Technol. Transfer (STTT) (2006)

  • N. Gupta et al., Locating faulty code using failure-inducing chops

Rui Abreu is with the Department of Informatics of the Faculty of Engineering of University of Porto as an Assistant Professor. He obtained his PhD. in Computer Science at the Software Engineering Research Group at Delft University of Technology. He holds an MSc. in Computer Science and Systems Engineering from Minho University, Portugal. Through his thesis work at Siemens R&D Porto, and professional internship at Philips Research, he acquired industrial experience in the area of quality of (embedded) systems.

Peter Zoeteweij works at IntelliMagic as a Software Developer. He holds an MSc. from Delft University of Technology, and a PhD. from the University of Amsterdam, both in computer science. Before his PhD., Peter worked for several years as a software engineer for Logica (now LogicaCMG), mainly on software for the oil industry.

Arjan J.C. van Gemund holds a BSc. in physics, and an MSc. (cum laude) and PhD. (cum laude) in computer science, all from Delft University of Technology. He has held positions at DSM and TNO, and currently serves as a full professor at the Electrical Engineering, Mathematics, and Computer Science Faculty of Delft University of Technology.

This work has been carried out as part of the TRADER project under the responsibility of the Embedded Systems Institute. This project is partially supported by the Netherlands Ministry of Economic Affairs under the BSIK03021 program.
