Topic-based software defect explanation

https://doi.org/10.1016/j.jss.2016.05.015

Highlights

  • Some topics are more defect-prone than others.

  • Defect-prone topics are likely to remain so over time.

  • Our topic-based metrics provide additional defect explanatory power over baseline metrics.

  • Our metrics outperform state-of-the-art topic-based cohesion and coupling metrics.

Abstract

Researchers continue to propose metrics using measurable aspects of software systems to understand software quality. However, these metrics largely ignore the functionality, i.e., the conceptual concerns, of software systems. Such concerns are the technical concepts that reflect the system’s business logic. For instance, while lines of code may be a good general measure for defects, a large file responsible for simple I/O tasks is likely to have fewer defects than a small file responsible for complicated compiler implementation details. In this paper, we study the effect of concerns on software quality. We use a statistical topic modeling approach to approximate software concerns as topics (related words in source code). We propose various metrics using these topics to help explain the defect-proneness of files. Case studies on multiple versions of Firefox, Eclipse, Mylyn, and NetBeans show that (i) some topics are more defect-prone than others; (ii) defect-prone topics tend to remain so over time; (iii) our topic-based metrics provide additional explanatory power for software quality over existing structural and historical metrics; and (iv) our topic-based cohesion metric outperforms state-of-the-art topic-based cohesion and coupling metrics in terms of defect explanatory power, while being simpler to implement and more intuitive to interpret.

Introduction

The cost of fixing software defects can be prohibitively high (Slaughter et al., 1998). As a result, researchers have tried to uncover the possible reasons for software defects using different classes of software metrics, such as product metrics, process metrics, and project metrics (Kan, 2002; Hall et al., 2012). Indeed, such metrics have shown some success in explaining the defect-proneness of software entities (e.g., methods, classes, files, or modules), where explaining refers to the correlation between the metrics and defects (Hall et al., 2012). However, these types of metrics do not take into account the actual conceptual concerns of a software system, that is, the main technical concepts and business logic embedded within the files (Liu et al., 2009). For example, an often-used metric, lines of code, may not always be a good general measure for defects: the largest file in one of our studied systems (Mylyn, 2012) has 2771 lines of code but no defects, while a much smaller file, with 23 lines of code, does contain a defect.

Recent studies propose a new class of metrics based on conceptual concerns (Liu et al., 2009; Nguyen et al., 2011; Maskeri et al., 2008; Linstead et al., 2008). These studies approximate concerns using statistical topic models, such as latent Dirichlet allocation (LDA) (Blei et al., 2003). Statistical topic models discover topics (i.e., sets of related words) within the source code files, which researchers use as surrogates for conceptual concerns. A prior study by Baldi et al. (2008) shows that the topics generated by topic models agree strongly with the concerns identified by other approaches, such as aspect mining. Recent studies that apply topic models to software systems (Liu et al., 2009; Nguyen et al., 2011; Maskeri et al., 2008; Linstead et al., 2008) provide initial evidence that topics in software systems are associated with the defect-proneness of source code files, opening up new possibilities for explaining why some files are more defect-prone than others.

In this paper, we build upon prior studies on software quality by considering the topics in source code files. We propose a set of topic-based metrics to study software quality: number of topics (NT), number of defect-prone topics (NDT), topic membership (TM), and defect-prone topic membership (DTM). We study how our proposed topic-based metrics can help better explain software defects. We also compare one of our metrics (NT), which measures cohesion in a software system using topics, with other state-of-the-art topic-based cohesion and coupling metrics. To the best of our knowledge, our study provides the first detailed comparison of the defect explanatory power of state-of-the-art topic-based cohesion and coupling metrics. We perform a detailed case study on multiple versions of four real-world software systems, with a focus on the following research questions:

RQ1: Are some topics more defect-prone than others?

We find that some topics, such as those related to new features and the core functionality of a system, may have a much higher cumulative defect density (CDDT) than others (average skewness in CDDT is 7.25, where a skewness of 1 is already considered highly skewed). We also find that defect-prone topics are likely to remain so over time, indicating that prior defect-proneness of a topic can be used to explain the future defect-proneness of topics and their associated files (Spearman correlation is 0.44–0.67).
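The two statistics quoted above can be illustrated with a minimal sketch. It assumes moment-based skewness and a rank-based (Spearman) correlation with average ranks for ties; the per-topic density values are hypothetical and are not taken from the studied systems.

```python
from math import sqrt


def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5, using population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def average_ranks(values):
    """Rank values from 1 to n; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical defect densities of five topics in two consecutive versions.
density_v1 = [0.0, 0.0, 0.0, 0.0, 10.0]
density_v2 = [0.1, 0.0, 0.2, 0.3, 8.0]

print(skewness(density_v1))              # one dominant topic gives a right-skewed distribution
print(spearman(density_v1, density_v2))  # rank agreement between the two versions
```

A large positive skewness means that a few topics account for most of the defect density, and a positive Spearman correlation across versions means that the ranking of topics by defect density is partly preserved over time.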

RQ2: Can our proposed topic-based metrics help explain why some files are more defect-prone than others?

We find that our proposed topic-based metrics provide additional explanatory power (a 4–314% improvement) for the defect-proneness of files over existing state-of-the-art product and process metrics such as lines of code, code churn, and number of pre-release defects.
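As a reading aid, the sketch below shows how a relative improvement of this kind can be computed. It assumes that explanatory power is a single score per model (for example, the fraction of deviance explained by a regression model) and that the improvement is expressed relative to the baseline score; the function name and the numbers are hypothetical.

```python
def relative_improvement(baseline_score, extended_score):
    """Percentage gain of the extended model's score over the baseline's."""
    if baseline_score <= 0:
        raise ValueError("baseline score must be positive")
    return (extended_score - baseline_score) / baseline_score * 100.0


# Hypothetical scores: the baseline metrics alone explain 20% of the deviance,
# and adding topic-based metrics raises this to 25%.
print(relative_improvement(0.20, 0.25))  # a gain of about 25 percent
```

Under this reading, an improvement above 100% means that the extended model more than doubles the score of the baseline model.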

RQ3: How do our topic-based metrics compare with state-of-the-art topic-based cohesion and coupling metrics?

We find that our metric outperforms state-of-the-art topic-based cohesion and coupling metrics. Compared to the state of the art, our metric gives a larger improvement in defect explanatory power (8–55%) when using lines of code as a baseline metric. Thus, practitioners may benefit from including our metrics when analyzing software quality using cohesion and coupling.

This work extends our previous work (Chen et al., 2012). First, we extend our case studies to include additional systems (NetBeans 4.0, 5.0, and 5.5.1). Second, we compare one of our topic-based metrics, which measures file cohesion, with state-of-the-art topic-based cohesion and coupling metrics (Section 5). We conduct a detailed comparison on how topic-based cohesion and coupling metrics help explain software defects. Finally, we study the sensitivity of the parameters that we use in our approach (Section 6). We have also made our data-sets and results publicly available (Chen, 2014), and encourage researchers to replicate and verify our study.

The rest of this paper is organized as follows. Section 2 describes our approach to discovering topics in source code files and defines the topic-based metrics that we use to answer our research questions. Section 3 introduces the studied systems and outlines the design of our case studies. Sections 4 and 5 present the results of our research questions. Section 6 discusses the parameter sensitivity of our approach. Section 7 discusses the potential threats to the validity of our findings, and Section 8 describes related work. Finally, Section 9 concludes the paper.

Section snippets

Proposed approach

In this section, we outline our approach of using topics to explain defects. First, we briefly introduce topic modeling and describe how it can be applied to source code files to approximate conceptual concerns (i.e., main business logic). Next, we motivate and describe our new topic-based metrics.
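The four metrics named in the introduction (NT, NDT, TM, and DTM) are not defined in this excerpt, so the following is only a sketch under stated assumptions: each file has a topic membership vector produced by a topic model, a topic counts as present in a file when its membership is at least a threshold δ, and the set of defect-prone topics is supplied as input. The function name, the default threshold, and the example values are hypothetical.

```python
def topic_metrics(membership, defect_prone, delta=0.01):
    """Compute illustrative topic-based metrics for one file.

    membership   -- per-topic membership values for the file (index = topic id)
    defect_prone -- set of topic ids assumed to be defect-prone
    delta        -- assumed membership threshold for a topic to count as present
    """
    # NT: number of topics present in the file.
    nt = sum(1 for m in membership if m >= delta)
    # NDT: number of defect-prone topics present in the file.
    ndt = sum(1 for k in defect_prone if membership[k] >= delta)
    # TM: the membership values themselves, one value per topic.
    tm = list(membership)
    # DTM: membership values restricted to the defect-prone topics.
    dtm = [membership[k] for k in sorted(defect_prone)]
    return {"NT": nt, "NDT": ndt, "TM": tm, "DTM": dtm}


# Hypothetical file with four topics, of which topics 1 and 3 are defect-prone.
metrics = topic_metrics([0.6, 0.3, 0.1, 0.0], {1, 3}, delta=0.05)
print(metrics["NT"], metrics["NDT"], metrics["DTM"])
```

Under these assumptions, a file with a low NT concentrates on few topics, which is why the paper treats NT as a cohesion measure.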

Case study design

In this section, we introduce the systems that we use for our case study and we describe our analysis process, depicted in Fig. 2.

Case study results

In this section, we present the results of our case study. We present each research question in three parts: the approach used to address the question, our experimental results, and a discussion of the results.

RQ3: How do our metrics compare with state-of-the-art topic-based cohesion and coupling metrics?

Maintaining high cohesion and low coupling among source code files during development can help reduce maintenance costs and improve the reliability of a software system (Macro and Buxton, 1987; Fenton, 1991). Researchers have used various software structures, such as interactions among variables and methods, to measure cohesion and coupling in software systems (Allen and Khoshgoftaar, 1999; Chae et al., 2000; Bieman and Kang, 1998; Briand et al., 1998; Menzies, Butcher, Cok, Marcus, Layman,

Sensitivity analysis for the parameters of our approach

In our approach, we use several parameter values for LDA and our proposed topic-based metrics: two Dirichlet priors for smoothing (α and β), the number of iterations (II), the number of topics (K), and δ in NT and NDT. We perform a parameter sensitivity analysis to see how these parameters affect the defect explanatory power of our topic-based metrics. We do not change the LDA parameters of the topic-based cohesion and coupling metrics in Section 5, since we are only interested in comparing how

Threats to validity

The results of our case study provide an initial evaluation of using topics to explain software defects, and we show that our topic-based metric outperforms state-of-the-art topic-based cohesion and coupling metrics. However, we note the following threats to the validity of our findings.

Applying topic models to software engineering tasks

Recently, many researchers have used topic modeling approaches to understand software systems from a point of view that differs from the traditional structural and historical views (Chen et al., 2015). For example, Kuhn et al. (2007) used Latent Semantic Indexing (LSI) to cluster the files in a software system according to the similarity of word usage. Maskeri et al. (2008) were the first to apply LDA to source code to uncover its conceptual concerns. Prior studies used topics to study the

Conclusions and future work

In this paper, we aim to understand the relationship between the conceptual concerns in source code files, i.e., their technical content, and their defect-proneness. To do so, we captured the concerns in each file using topics, and proposed new metrics based on these topics. In particular, we considered the defect history of each topic, which we hypothesized would help better explain the defect-proneness of the files.

To evaluate our new metrics, we performed a detailed case study on multiple

Acknowledgements

We thank Dr. Yasutaka Kamei for providing us with the bug data-sets of the studied systems that are used in this paper.

References (82)

  • E.B. Allen et al.

    Measuring coupling and cohesion: An information-theory approach

    Proceedings of the Sixth International Symposium on Software Metrics

    (1999)
  • H.U. Asuncion et al.

    Software traceability with topic modeling

    Proceedings of the Thirty-Second International Conference on Software Engineering

    (2010)
  • P.F. Baldi et al.

    A theory of aspects as latent topics

    Proceedings of the Twenty-Third ACM SIGPLAN Conference on Object-oriented Programming Systems Languages and Applications

    (2008)
  • J.M. Bieman et al.

    Measuring design-level cohesion

    IEEE Trans. Softw. Eng.

    (1998)
  • L.R. Biggers et al.

    Configuring latent Dirichlet allocation based feature location

    Empir. Softw. Eng.

    (2014)
  • D. Binkley et al.

    Understanding LDA in source code analysis

    Proceedings of the Twenty-Second International Conference on Program Comprehension

    (2014)
  • C. Bird et al.

    Don’t touch my code!: Examining the effects of ownership on software quality

    Proceedings of the Nineteenth Symposium on the Foundations of Software Engineering and the Thirteenth European Software Engineering Conference

    (2011)
  • S. Biyani et al.

    Exploring defect data from development and customer usage on software modules over multiple releases

    Proceedings of the Ninth International Symposium on Software Reliability Engineering

    (1998)
  • D.M. Blei et al.

    Latent Dirichlet allocation

    J. Mach. Learn. Res.

    (2003)
  • S. Boslaugh et al.

    Statistics in a Nutshell: a Desktop Quick Reference

    (2008)
  • L.C. Briand et al.

    A unified framework for cohesion measurement in object-oriented systems

    Empir. Softw. Eng.

    (1998)
  • P.F. Brown et al.

    Class-based n-gram models of natural language

    Comput. Linguist.

    (1992)
  • K.P. Burnham et al.

    Multimodel inference: Understanding AIC and BIC in model selection

    Sociol. Methods Res.

    (2004)
  • H.S. Chae et al.

    A cohesion measure for object-oriented classes

    Softw. Pract. Exp.

    (2000)
  • J. Chang et al.

    Relational topic models for document networks

    Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics

    (2009)
  • T.-H. Chen

    Studying software quality using topic models

    (2013)
  • Chen, T.-H., 2014....
  • T.-H. Chen et al.

    A survey on the use of topic models when mining software repositories

    Empir. Softw. Eng.

    (2015)
  • T.-H. Chen et al.

    Explaining software defects using topic models

    Proceedings of the Ninth Working Conference on Mining Software Repositories

    (2012)
  • B. Cleary et al.

    An empirical analysis of information retrieval based concept location techniques in software comprehension

    Empir. Softw. Eng.

    (2008)
  • S.G. Crawford et al.

    An analysis of static metrics and faults in C software

    J. Syst. Softw.

    (1985)
  • M. D'Ambros et al.

    An extensive comparison of bug prediction approaches

    Proceedings of the 7th IEEE Working Conference on Mining Software Repositories (MSR 2010), Cape Town

    (2010)
  • M. Eaddy et al.

    Do crosscutting concerns cause defects?

    IEEE Trans. Softw. Eng.

    (2008)
  • R. Feldman et al.

    Applied Probability and Stochastic Processes

    (2010)
  • N. Fenton

    Software Metrics: A Rigorous Approach

    (1991)
  • M. Gethers et al.

    Using relational topic models to capture coupling among classes in object-oriented software systems

    Proceedings of the Twenty-Sixth International Conference on Software Maintenance

    (2010)
  • M. Golberg et al.

    Introduction to regression analysis

    (2003)
  • S. Grant et al.

    Estimating the optimal number of latent concepts in source code analysis

    Proceedings of the 2010 Tenth IEEE Working Conference on Source Code Analysis and Manipulation

    (2010)
  • S. Grant et al.

    Using heuristics to estimate an appropriate number of latent topics in source code analysis

    Sci. Comput. Programm.

    (2013)
  • C. Haan

    Statistical methods in hydrology

    (1977)
  • T. Hall et al.

    A systematic review of fault prediction performance in software engineering

    IEEE Trans. Softw. Eng.

    (2012)