Topic-based software defect explanation

doi:10.1016/j.jss.2016.05.015

Journal of Systems and Software

Volume 129, July 2017, Pages 79-106

https://doi.org/10.1016/j.jss.2016.05.015

Highlights

• Some topics are more defect-prone than others.
• Defect-prone topics are likely to remain so over time.
• Our topic-based metrics provide additional defect explanatory power over baseline metrics.
• Our metrics outperform state-of-the-art topic-based cohesion and coupling metrics.

Abstract

Researchers continue to propose metrics using measurable aspects of software systems to understand software quality. However, these metrics largely ignore the functionality, i.e., the conceptual concerns, of software systems. Such concerns are the technical concepts that reflect the system’s business logic. For instance, while lines of code may be a good general measure for defects, a large file responsible for simple I/O tasks is likely to have fewer defects than a small file responsible for complicated compiler implementation details. In this paper, we study the effect of concerns on software quality. We use a statistical topic modeling approach to approximate software concerns as topics (related words in source code). We propose various metrics using these topics to help explain file defect-proneness. Case studies on multiple versions of Firefox, Eclipse, Mylyn, and NetBeans show that (i) some topics are more defect-prone than others; (ii) defect-prone topics tend to remain so over time; (iii) our topic-based metrics provide additional explanatory power for software quality over existing structural and historical metrics; and (iv) our topic-based cohesion metric outperforms state-of-the-art topic-based cohesion and coupling metrics in terms of defect explanatory power, while being simpler to implement and more intuitive to interpret.

Introduction

The cost of fixing software defects can be prohibitively high (Slaughter et al., 1998). As a result, researchers have tried to uncover the possible reasons for software defects using different classes of software metrics, such as product metrics, process metrics, and project metrics (Kan, 2002; Hall et al., 2012). Indeed, such metrics have shown some success in explaining the defect-proneness of software entities (e.g., methods, classes, files, or modules), where explaining means establishing a correlation between the metrics and defects (Hall et al., 2012). However, these types of metrics do not take into account the actual conceptual concerns of a software system, i.e., the main technical concepts and business logic embedded within the files (Liu et al., 2009). For example, an often-used metric, lines of code, may not always be a good general measure for defects: the largest file in one of our studied systems (Mylyn, 2012) has 2771 lines of code but no defects, while a much smaller file, with 23 lines of code, does contain a defect.

Recent studies propose a new class of metrics based on conceptual concerns (Liu et al., 2009; Nguyen et al., 2011; Maskeri et al., 2008; Linstead et al., 2008). These studies approximate concerns using statistical topic models, such as latent Dirichlet allocation (LDA) (Blei et al., 2003). Statistical topic models discover topics (i.e., sets of related words) within the source code files, which researchers use as surrogates for conceptual concerns. A prior study by Baldi et al. (2008) shows that the topics generated by topic models agree strongly with the results of other approaches, such as aspect mining. Recent studies on applying topic models to software systems (Liu et al., 2009; Nguyen et al., 2011; Maskeri et al., 2008; Linstead et al., 2008) provide initial evidence that topics in software systems are associated with the defect-proneness of source code files, opening up new possibilities for explaining why some files are more defect-prone than others.
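To make the idea of topics as sets of related words concrete, the following is a minimal sketch of LDA fitted with collapsed Gibbs sampling on a toy corpus. It is an illustration only: this text does not state which LDA implementation the studies use, and the function names (lda_gibbs, top_words), the toy documents, and the parameter values are all assumptions chosen for brevity.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA on tokenized documents.

    Returns (theta, phi, vocab): theta[d][k] is the membership of topic k in
    document d, and phi[k][w] is the probability of vocab[w] under topic k.
    All parameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    doc_topic = [[0] * K for _ in docs]        # tokens of doc d assigned to topic k
    topic_word = [[0] * V for _ in range(K)]   # times word w is assigned to topic k
    topic_total = [0] * K                      # total tokens assigned to topic k
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(K)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Conditional distribution over topics for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta) / (topic_total[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1

    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(topic_word[k][w] + beta) / (topic_total[k] + V * beta) for w in range(V)]
        for k in range(K)
    ]
    return theta, phi, vocab


def top_words(phi, vocab, k, n=3):
    """Return the n most probable words of topic k."""
    ranked = sorted(range(len(vocab)), key=lambda w: phi[k][w], reverse=True)
    return [vocab[w] for w in ranked[:n]]


# Hypothetical "source files", already reduced to identifier words.
docs = [
    "file read write stream buffer file".split(),
    "stream buffer read close file".split(),
    "parse token syntax tree parse".split(),
    "syntax tree token grammar parse".split(),
]
theta, phi, vocab = lda_gibbs(docs, num_topics=2)
for k in range(2):
    print(k, top_words(phi, vocab, k))
```

Each row of theta sums to 1 and can be read as the share of a file devoted to each topic; a real study would use a tuned library implementation and many more topics.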

In this paper, we build upon prior studies on software quality by considering the topics in source code files. We propose a set of topic-based metrics to study software quality: number of topics (NT), number of defect-prone topics (NDT), topic membership (TM), and defect-prone topic membership (DTM). We study how our proposed topic-based metrics can help better explain software defects. We also compare one of our metrics (NT), which measures cohesion in a software system using topics, with other state-of-the-art topic-based cohesion and coupling metrics. To the best of our knowledge, our study provides the first detailed comparison of the defect explanatory power of state-of-the-art topic-based cohesion and coupling metrics. We perform a detailed case study on multiple versions of four real-world software systems, with a focus on the following research questions:

RQ1: Are some topics more defect-prone than others?

We find that some topics, such as those related to new features and the core functionality of a system, may have a much higher cumulative defect density (CDDT) than others (average skewness in CDDT is 7.25, where a skewness of 1 is already considered highly skewed). We also find that defect-prone topics are likely to remain so over time, indicating that prior defect-proneness of a topic can be used to explain the future defect-proneness of topics and their associated files (Spearman correlation is 0.44–0.67).

RQ2: Can our proposed topic-based metrics help explain why some files are more defect-prone than others?

We find that our proposed topic-based metrics provide additional explanatory power (4–314% improvement) for the defect-proneness of files over existing state-of-the-art product and process metrics such as lines of code, code churn, and number of pre-release defects.

RQ3: How do our topic-based metrics compare with state-of-the-art topic-based cohesion and coupling metrics?

We find that our metric outperforms state-of-the-art topic-based cohesion and coupling metrics. Compared to the state of the art, our metric gives a larger improvement in defect explanatory power (8–55%) when using lines of code as a baseline metric. Thus, practitioners may benefit from including our metrics when analyzing software quality using cohesion and coupling.
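The two statistics reported for RQ1, the skewness of topic defect densities and the Spearman correlation of those densities across versions, can be sketched as follows. This is a hedged illustration: the density values are invented, the function names are hypothetical, and the population skewness formula used here is one common definition that may differ from the estimator used in the study.

```python
import math


def skewness(values):
    """Population skewness g1 = m3 / m2**1.5 (one common definition)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented defect densities of four topics in two consecutive versions.
density_v1 = [0.1, 0.2, 0.3, 5.0]
density_v2 = [0.2, 0.1, 0.4, 4.0]

print(skewness(density_v1))              # one dominant topic gives positive skew
print(spearman(density_v1, density_v2))  # 0.8: the ranking is largely preserved
```

A high positive skewness means a few topics account for most of the defect density, and a high Spearman correlation means the ranking of topics by defect density is stable from one version to the next.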

This work extends our previous work (Chen et al., 2012). First, we extend our case studies to include additional systems (NetBeans 4.0, 5.0, and 5.5.1). Second, we compare one of our topic-based metrics, which measures file cohesion, with state-of-the-art topic-based cohesion and coupling metrics (Section 5). We conduct a detailed comparison on how topic-based cohesion and coupling metrics help explain software defects. Finally, we study the sensitivity of the parameters that we use in our approach (Section 6). We have also made our data-sets and results publicly available (Chen, 2014), and encourage researchers to replicate and verify our study.

The rest of this paper is organized as follows. Section 2 describes our approach to discovering topics in source code files and defines the topic-based metrics that we use to answer our research questions. Section 3 introduces the studied systems and outlines the design of our case studies. Sections 4 and 5 present the results for our research questions. Section 6 discusses the parameter sensitivity of our approach. Section 7 discusses the potential threats to the validity of our findings, and Section 8 describes related work. Finally, Section 9 concludes the paper.

Section snippets

Proposed approach

In this section, we outline our approach of using topics to explain defects. First, we briefly introduce topic modeling and describe how it can be applied to source code files to approximate conceptual concerns (i.e., main business logic). Next, we motivate and describe our new topic-based metrics.
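As a rough illustration of how the four metrics named in the introduction (NT, NDT, TM, and DTM) could be derived from a file's topic memberships, consider the sketch below. The exact definitions are not given in this text, so everything here is an assumption: NT is taken to count topics whose membership reaches a threshold δ, NDT to count the defect-prone topics among them, TM to be the membership vector itself, and DTM to be the memberships of the defect-prone topics. The function names, the threshold value, and the set of defect-prone topics are hypothetical.

```python
def number_of_topics(membership, delta):
    """NT (assumed definition): topics whose membership is at least delta."""
    return sum(1 for m in membership if m >= delta)


def number_of_defect_prone_topics(membership, defect_prone, delta):
    """NDT (assumed definition): defect-prone topics with membership >= delta."""
    return sum(1 for k in defect_prone if membership[k] >= delta)


def defect_prone_topic_membership(membership, defect_prone):
    """DTM (assumed definition): memberships of the defect-prone topics only."""
    return {k: membership[k] for k in sorted(defect_prone)}


# Hypothetical topic memberships (TM) of one file over four topics.
tm = [0.60, 0.25, 0.12, 0.03]
delta = 0.1              # illustrative threshold, not the paper's value
defect_prone = {1, 3}    # illustrative set of defect-prone topic indices

print(number_of_topics(tm, delta))                             # 3
print(number_of_defect_prone_topics(tm, defect_prone, delta))  # 1
print(defect_prone_topic_membership(tm, defect_prone))         # {1: 0.25, 3: 0.03}
```

Under these assumptions, a file with a high NT touches many concerns at once, which is why a count of this kind can serve as a cohesion measure.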

Case study design

In this section, we introduce the systems that we use for our case study and we describe our analysis process, depicted in Fig. 2.

Case study results

In this section, we present the results of our case study. We present each research question in three parts: the approach used to address the question, our experimental results, and a discussion of the results.

RQ3: How do our metrics compare with state-of-the-art topic-based cohesion and coupling metrics?

Maintaining high cohesion and low coupling among source code files during development can help reduce maintenance costs and improve the reliability of a software system (Macro and Buxton, 1987; Fenton, 1991). Researchers have used various software structures, such as interactions among variables and methods, to measure cohesion and coupling in software systems (Allen and Khoshgoftaar, 1999; Chae et al., 2000; Bieman and Kang, 1998; Briand et al., 1998; Menzies, Butcher, Cok, Marcus, Layman,

Sensitivity analysis for the parameters of our approach

In our approach, we use several parameter values for LDA and our proposed topic-based metrics: two Dirichlet priors for smoothing (α and β), the number of iterations (II), the number of topics (K), and δ in NT and NDT. We perform a parameter sensitivity analysis to see how these parameters affect the defect explanatory power of our topic-based metrics. We do not change the LDA parameters of the topic-based cohesion and coupling metrics in Section 5, since we are only interested in comparing how

Threats to validity

The results of our case study provide an initial evaluation of using topics to explain software defects, and we show that our topic-based metric outperforms state-of-the-art topic-based cohesion and coupling metrics. However, we note the following threats to the validity of our findings.

Applying topic models to software engineering tasks

Recently, many researchers have used topic modeling approaches to understand software systems from a point of view that differs from the traditional structural and historical views (Chen et al., 2015). For example, Kuhn et al. (2007) used Latent Semantic Indexing (LSI) to cluster the files in a software system according to the similarity of word usage. Maskeri et al. (2008) were the first to apply LDA to source code to uncover its conceptual concerns. Prior studies used topics to study the

Conclusions and future work

In this paper, we aim to understand the relationship between the conceptual concerns in source code files, i.e., their technical content, and their defect-proneness. To do so, we captured the concerns in each file using topics, and proposed new metrics based on these topics. In particular, we considered the defect history of each topic, which we hypothesized would help better explain the defect-proneness of the files.

To evaluate our new metrics, we performed a detailed case study on multiple

Acknowledgements

We thank Dr. Yasutaka Kamei for providing us with the bug data-sets of the studied systems that are used in this paper.

References (82)

E. Cureton et al. Factor Analysis: An Applied Approach (1993)
T. Gyimothy et al. Empirical validation of object-oriented metrics on open source software for fault prediction. IEEE Trans. Softw. Eng. (2005)
M. Kutner et al. Applied linear regression models (1989)
A. Macro et al. The craft of software engineering (1987)
C. Manning et al. Introduction to Information Retrieval (2008)
K. Tian et al. Using latent Dirichlet allocation for automatic categorization of software. Proceedings of the Sixth International Working Conference on Mining Software Repositories (2009)
Eclipse, 2012....
Mozilla Firefox, 2012....
Mylyn, 2012....
NetBeans, 2012....
E.B. Allen et al. Measuring coupling and cohesion: An information-theory approach. Proceedings of the Sixth International Symposium on Software Metrics (1999)
H.U. Asuncion et al. Software traceability with topic modeling. Proceedings of the Thirty-Second International Conference on Software Engineering (2010)
P.F. Baldi et al. A theory of aspects as latent topics. Proceedings of the Twenty-Third ACM SIGPLAN Conference on Object-oriented Programming Systems Languages and Applications (2008)
J.M. Bieman et al. Measuring design-level cohesion. IEEE Trans. Softw. Eng. (1998)
L.R. Biggers et al. Configuring latent Dirichlet allocation based feature location. Empir. Softw. Eng. (2014)
D. Binkley et al. Understanding LDA in source code analysis. Proceedings of the Twenty-Second International Conference on Program Comprehension (2014)
C. Bird et al. Don’t touch my code!: Examining the effects of ownership on software quality. Proceedings of the Nineteenth Symposium on the Foundations of Software Engineering and the Thirteenth European Software Engineering Conference (2011)
S. Biyani et al. Exploring defect data from development and customer usage on software modules over multiple releases. Proceedings of the Ninth International Symposium on Software Reliability Engineering (1998)
D.M. Blei et al. Latent Dirichlet allocation. J. Mach. Learn. Res. (2003)
S. Boslaugh et al. Statistics in a Nutshell: a Desktop Quick Reference (2008)
L.C. Briand et al. A unified framework for cohesion measurement in object-oriented systems. Empir. Softw. Eng. (1998)
P.F. Brown et al. Class-based n-gram models of natural language. Comput. Linguist. (1992)
R. Anderson David et al. Multimodel inference: Understanding AIC and BIC in model selection. Sociol. Methods Res. (2004)
H.S. Chae et al. A cohesion measure for object-oriented classes. Softw. Pract. Exp. (2000)
J. Chang et al. Relational topic models for document networks. Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (2009)
T.-H. Chen. Studying Software Quality Using Topic Models (2013)
Chen, T.-H., 2014....
T.-H. Chen et al. A survey on the use of topic models when mining software repositories. Empir. Softw. Eng. (2015)
T.-H. Chen et al. Explaining software defects using topic models. Proceedings of the Ninth Working Conference on Mining Software Repositories (2012)
B. Cleary et al. An empirical analysis of information retrieval based concept location techniques in software comprehension. Empir. Softw. Eng. (2008)
S.G. Crawford et al. An analysis of static metrics and faults in C software. J. Syst. Softw. (1985)
M. D'Ambros et al. An extensive comparison of bug prediction approaches. Proceedings of the 7th IEEE Working Conference on Mining Software Repositories (MSR 2010), Cape Town (2010)
M. Eaddy et al. Do crosscutting concerns cause defects? IEEE Trans. Softw. Eng. (2008)
R. Feldman et al. Applied Probability and Stochastic Processes (2010)
N. Fenton. Software Metrics: A Rigorous Approach (1991)
M. Gethers et al. Using relational topic models to capture coupling among classes in object-oriented software systems. Proceedings of the Twenty-Sixth International Conference on Software Maintenance (2010)
M. Golberg et al. Introduction to regression analysis (2003)
S. Grant et al. Estimating the optimal number of latent concepts in source code analysis. Proceedings of the 2010 Tenth IEEE Working Conference on Source Code Analysis and Manipulation (2010)
S. Grant et al. Using heuristics to estimate an appropriate number of latent topics in source code analysis. Sci. Comput. Programm. (2013)
C. Haan. Statistical methods in hydrology (1977)
T. Hall et al. A systematic review of fault prediction performance in software engineering. IEEE Trans. Softw. Eng. (2012)
