An empirical study on the use of mutant traces for diagnosis of faults in deployed systems

https://doi.org/10.1016/j.jss.2013.11.1094

Highlights

  • This paper investigates the use of mutants (artificial faults) in diagnosing actual faults in deployed systems.

  • It uses decision trees on mutant traces to identify faulty functions in actual traces of the deployed systems.

  • Results show that mutants can identify faulty functions with 60–100% accuracy when reviewing 10% or less of the code.

Abstract

Debugging deployed systems is an arduous and time-consuming task. It is often difficult to generate traces from deployed systems due to the disturbance and overhead that trace collection may cause on a system in operation. Many organizations also do not keep historical traces of failures. On the other hand, earlier techniques focusing on fault diagnosis in deployed systems require a collection of passing–failing traces, in-house reproduction of faults, or a historical collection of failed traces. In this paper, we investigate an alternative solution. We investigate how artificial faults, generated using software mutation in a test environment, can be used to diagnose actual faults in deployed software systems. Traces of artificial faults can provide relief when it is not feasible to collect different kinds of traces from deployed systems. Using artificial and actual faults, we also investigate the similarity of function call traces of different faults in functions. To achieve our goal, we use decision trees to build a model of traces generated from mutants and test it on faulty traces generated from actual programs. The application of our approach to various real-world programs shows that mutants can indeed be used to diagnose faulty functions in the original code with approximately 60–100% accuracy when reviewing 10% or less of the code, whereas contemporary techniques using pass–fail traces show poor results in the context of software maintenance. Our results also show that different faults in closely related functions occur with similar function call traces. The use of mutation in fault diagnosis shows promising results, but the experiments also reveal the challenges of using mutants.

Introduction

Typically, maintainers collect data (such as execution traces and error logs) related to software failures in order to debug the causes of failures. For example, Windows Error Reporting (WER, 2012), Mozilla crash reporting (Mozilla, 2013), and Ubuntu's Apport crash reporting (Ubuntu, 2013) collect function calls on stacks and other related information to debug the causes of crashes. Similarly, maintainers at IBM collect function call traces for DB2 (Melnyk, 2004) and WebSphere (Hare and Julin, 2007) from the field to diagnose the causes of crashing failures and non-crashing failures (e.g., performance failures, unexpected outputs, etc.). However, diagnosing the origin of faults causing failures in deployed systems is time-consuming and can take up to 30–40% of the corrective maintenance time (UWO and IBM, 2008).

Prior techniques focusing on automatic fault diagnosis in deployed systems, e.g., statistical debugging (Chilimbi et al., 2009, Liu and Han, 2006), propose to diagnose fault locations by collecting passing and failing traces from deployed systems at the time a fault occurs. Other researchers (Brodie et al., 2005, Lee and Iyer, 2000, Murtaza et al., 2010, Podgurski et al., 2003) focusing on deployed systems propose to correlate (function call) failure traces from deployed systems with historical traces of failures to identify recurrent faults. Another typical practice, mostly used in in-house software testing, is to reproduce faults on test machines and collect the corresponding program traces of pass–fail test cases. In this practice, passing–failing traces are collected at a finer-grained level, such as statements, and fault localization techniques are applied to them; many such techniques have been proposed with a focus on software testing (e.g., Agrawal et al., 1995, Wong and Qi, 2006, Jones and Harrold, 2005, Zhang et al., 2009, Wong et al., 2007).

In practice, it is usually not feasible to collect many traces from deployed systems, due to the overhead incurred during trace collection, which can impact business operations. Further, techniques based on historical traces usually detect only known faults, and historical traces are simply not available in many organizations. It is also time-consuming to reproduce thousands of faults reported from the field in a lab environment, and many faults are not easily reproducible (e.g., crashes occurring due to specific system configurations).

In the field, function call traces are the traces most commonly collected for crashing failures (Mozilla, 2013, Ubuntu, 2013) and non-crashing failures (Melnyk, 2004, Hare and Julin, 2007), and usually one failed trace is collected for a corresponding failure. In this paper, we focus on the problem of identifying faulty functions from a function call trace of a deployed software system. We approach this problem by employing the concept of software mutation for the identification of faulty functions. A software mutant is an artificially generated fault in a program, and Andrews et al. (2005) showed that mutants are close representatives of actual faults. More precisely, we investigate whether we can generate mutants for the functions of a program and use the mutants' traces to locate faulty functions in the traces of actual faults of the program.
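To make the notion of a mutant concrete, the following sketch shows a hypothetical mutant: a single seeded operator change that turns a correct function into an artificial fault. The function and test input are invented for illustration; the paper's mutants are generated automatically in the subject programs' own language.

```python
# Hypothetical illustration of a software mutant: one seeded change
# (relational operator replacement) creates an artificial fault.

def max_index(xs):
    """Original: return the index of the first largest element."""
    best = 0
    for i in range(1, len(xs)):
        if xs[i] > xs[best]:
            best = i
    return best

def max_index_mutant(xs):
    """Mutant: '>' replaced by '>=', so ties resolve to the last maximum."""
    best = 0
    for i in range(1, len(xs)):
        if xs[i] >= xs[best]:  # seeded fault: > became >=
            best = i
    return best

# A test case "kills" the mutant when the outputs differ:
print(max_index([3, 7, 7, 2]))         # 1
print(max_index_mutant([3, 7, 7, 2]))  # 2
```

A test case whose output differs between the original and the mutant is exactly the situation in which the approach records a mutant trace.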

The use of mutants for fault localization is a novel approach, as mutants have mostly been used to measure, enhance, and compare the effectiveness of testing strategies (Offutt and Untch, 2001) and test coverage criteria (Andrews et al., 2006). If traces of mutants can be used to diagnose actual faults, then maintainers no longer need to collect historical failed traces or pass–fail traces from deployed systems, and faults can be diagnosed without spending time on fault reproduction. The use of mutants also reduces the overhead of collecting multiple traces from deployed systems, at the expense of the time to generate traces of mutants before deployment (i.e., offline). However, saving time and overhead in trace collection for systems in operation (deployed systems) is more critical than offline trace generation. This paper therefore addresses the following novel research question:

(Q1) Can we diagnose actual faults in traces of deployed software systems by using only the traces of mutants (i.e., automatically seeded artificial faults) of software systems?

In our earlier work (Murtaza et al., 2010), we showed that different actual faults in the same function occur with similar function call traces. In this paper, we extend this investigation by determining how different artificial faults (mutants) and actual faults in functions are related to each other in terms of function call traces. This can be beneficial in understanding the relationship among different faulty functions and in improving the fault diagnosis process. Therefore, a secondary research question that follows from (Q1) is:

(Q2) Do different artificial faults (mutants) and actual faults in functions occur with similar function call traces?
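One simple way to make "similar function call traces" operational is to compare the sets of functions two traces invoke, for instance with Jaccard similarity. This is only an illustrative sketch with invented trace data; the paper's actual comparison is performed through decision-tree models of the traces.

```python
# Illustrative sketch (not the paper's method): comparing two function
# call traces by the overlap of the functions they invoke.

def jaccard(trace_a, trace_b):
    """Jaccard similarity of two traces given as sequences of function names."""
    a, b = set(trace_a), set(trace_b)
    return len(a & b) / len(a | b)

# Hypothetical traces: a mutant trace and an actual failing trace that
# exercise mostly the same functions score high similarity.
mutant_trace = ["main", "parse", "match", "report"]
actual_trace = ["main", "parse", "match", "cleanup"]
print(round(jaccard(mutant_trace, actual_trace), 2))  # 0.6
```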

We determine the answers to these novel research questions by training decision trees on the traces of mutants of functions in a program and predicting faulty functions in actual faulty traces of that program. Our results on public programs show that mutants can be used to diagnose actual faults and different faults in a group of functions occur with similar function call traces. These results are novel and contribute to the knowledge of corrective maintenance and the literature on fault diagnosis.

The rest of the paper is organized as follows. We present our approach in Section 2 and the case studies that evaluate it in Section 3. In Section 4, we describe techniques related to ours. Section 5 explains the threats to validity, and Section 6 concludes the paper with directions for future work.

Section snippets

Approach

The steps of our approach are shown in Fig. 1. Initially, we generate mutants (artificial faults) for the functions of a program. The next step is to collect function call traces from executing the mutants; we call these traces mutant traces. This step requires generating the mutants and running test cases on them. A mutant trace is collected when the output of a test case differs from that of the original program (deemed correct). Function call traces are then stored in a database. The records of
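The learning step described above can be sketched as follows: traces are reduced to which functions they call, labeled with the mutated (faulty) function, and a decision tree is trained to predict the faulty function of a new trace. A toy pure-Python tree stands in here for the paper's classifier, and all trace data is invented for illustration.

```python
# Toy sketch of the approach's learning step: a small decision tree over
# binary "does function f appear in the trace?" features, trained on
# mutant traces labeled with the faulty function. Illustrative only.
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def build_tree(rows, labels, features):
    # Leaf: all labels agree, or no features left to split on.
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]
    def gain(f):  # information gain of splitting on "f appears in trace"
        yes = [l for r, l in zip(rows, labels) if f in r]
        no = [l for r, l in zip(rows, labels) if f not in r]
        rem = sum(len(s) / len(labels) * entropy(s) for s in (yes, no) if s)
        return entropy(labels) - rem
    best = max(features, key=gain)
    yes = [(r, l) for r, l in zip(rows, labels) if best in r]
    no = [(r, l) for r, l in zip(rows, labels) if best not in r]
    if not yes or not no:
        return Counter(labels).most_common(1)[0][0]
    rest = [f for f in features if f != best]
    return (best,
            build_tree([r for r, _ in yes], [l for _, l in yes], rest),
            build_tree([r for r, _ in no], [l for _, l in no], rest))

def predict(tree, trace):
    while isinstance(tree, tuple):
        f, if_called, if_not = tree
        tree = if_called if f in trace else if_not
    return tree

# Invented mutant traces (sets of called functions), labeled by the
# function that was mutated:
traces = [{"main", "parse", "match"}, {"main", "parse", "expand"},
          {"main", "match", "report"}, {"main", "expand", "report"}]
labels = ["parse", "expand", "match", "expand"]
features = sorted(set().union(*traces) - {"main"})
tree = build_tree(traces, labels, features)

# Diagnose an "actual" failing trace:
print(predict(tree, {"main", "parse", "match", "report"}))  # parse
```

In the paper, the same idea operates at scale: one model per program, trained offline on mutant traces, is applied to a single failed trace from the field.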

Case studies

This section answers the two research questions of Section 1: (Q1) Can we diagnose actual faults in traces of deployed software systems by using only the traces of mutants (i.e., automatically seeded artificial faults) of software systems? (Q2) Do different artificial faults (mutants) and actual faults in functions occur with similar function call traces?

To answer the questions, we applied our approach to different programs, namely, the Space program and the UNIX utilities (i.e., Grep, Gzip and Sed) (

Related work

This section describes closely related techniques grouped into three categories: fault localization techniques for in-house faults, fault localization techniques for field failures, and the techniques using mutation.

Threats to validity

In this section, we describe certain threats to the validity of the research results obtained through our employed research process. We classify threats into four groups: conclusion validity, internal validity, construct validity, and external validity (Wohlin et al., 2000).

A threat to conclusion validity relates to random variations in mutant traces. We randomly chose 3 mutants per function, but for some mutants in functions, test cases did not fail. It is possible that different selected

Conclusions and future work

A number of fault diagnosis techniques proposed for deployed software focus on: the classification of field profiles into failed or successful executions (Bowring et al., 2004, Haran et al., 2007), clustering field profiles (Liu and Han, 2006, Podgurski et al., 2003), rediscovery of crashing faults (Brodie et al., 2005, Lee and Iyer, 2000), statistical debugging (Chilimbi et al., 2009, Liu and Han, 2006), and rediscovery of crashing and non-crashing faults (Murtaza et al., 2010). These

References (53)

  • V. Debroy et al. Using mutation to automatically suggest fixes for faulty programs.

  • N. Devillard et al. Etrace – Runtime Tracing Tool (2004).

  • G. Di Fatta et al. Discriminative pattern mining in software fault detection.

  • H. Do et al. On the use of mutation faults in empirical assessments of test case prioritization techniques. IEEE Trans. Softw. Eng. (2006).

  • H. Do et al. Supporting controlled experimentation with testing techniques: an infrastructure and its potential impact. J. Empirical Softw. Eng. (2005).

  • S. Elbaum et al. Trace anomalies as precursors of field failures: an empirical study. J. Empirical Softw. Eng. (2007).

  • M. Gittens et al. The vital few versus the trivial many: examining the Pareto principle for software.

  • A. Hamou-Lhadj. Techniques to Simplify the Analysis of Execution Traces for Program Comprehension (2005).

  • D. Hao et al. A similarity-aware approach to testing based fault localization.

  • M. Haran et al. Techniques for classifying executions of deployed software to support software engineering tasks. IEEE Trans. Softw. Eng. (2007).

  • D. Hare et al. The Support Authority: Interpreting a WebSphere Application Server Trace File (2007).

  • Y. Jia et al. An analysis and survey of the development of mutation testing. IEEE Trans. Softw. Eng. (2011).

  • J.A. Jones et al. Empirical evaluation of the Tarantula automatic fault-localization technique.

  • M. Jose et al. Cause clue clauses: error localization using maximum satisfiability.

  • I. Lee et al. Diagnosing rediscovered problems using symptoms. IEEE Trans. Softw. Eng. (2000).

  • B. Liblit et al. Scalable statistical bug isolation.


    Syed Shariyar Murtaza received his Ph.D. from the University of Western Ontario in 2011. He received his MS in Computer Engineering from Kyung Hee University in 2006 and BS from the University of Karachi in 2004. He has been working as a research scientist with Concordia University and Defence Research and Development Canada since 2011. He specializes in software maintenance with a focus on fault localization and anomaly detection. He also has a keen interest in the applications of machine learning in software engineering and information management.

    Abdelwahab Hamou-Lhadj is a tenured associate professor in the Department of Electrical and Computer Engineering at Concordia University. His research interests include software modeling, software behavior analysis, software maintenance and evolution, anomaly detection, business process management, and organizational performance. He has worked with and consulted for many government and industrial organizations. He holds a Ph.D. degree in Computer Science from the University of Ottawa (2005). He also holds the OMG Certified Expert in Business Process Management Certification – advanced business and technical tracks. He is a Licensed Professional Engineer in Quebec, and a long-lasting member of IEEE and ACM.

    Nazim H. Madhavji is a Professor at the University of Western Ontario, London, Canada. His research interests include: requirements engineering, software architectures, system compliance, software and process quality, software evolution and feedback, and empirical studies.

    Mechelle Gittens has worked and carried out research in software engineering since 1995. She is currently a Lecturer at the University of the West Indies – Cave Hill Campus where she teaches and does research in Computer Science. Mechelle has a Master's and Doctorate from the University of Western Ontario (UWO) where she is now a Research Adjunct Professor. Her work is in software quality, quality of life technologies, software testing, empirical software engineering, software reliability, and project management. She has published at several international forums in these areas and jointly holds a US patent in software testing.
