Topic-based software defect explanation
Introduction
The cost of fixing software defects can be prohibitively high (Slaughter et al., 1998). As a result, researchers have tried to uncover the possible reasons for software defects using different classes of software metrics, such as product metrics, process metrics, and project metrics (Kan, 2002; Hall et al., 2012). Indeed, such metrics have shown some success in explaining the defect-proneness of software entities (e.g., methods, classes, files, or modules), where explaining means that the metrics correlate with defects (Hall et al., 2012). However, these types of metrics do not take into account the actual conceptual concerns of a software system, i.e., the main technical concepts and business logic embedded within the files (Liu et al., 2009). For example, an often-used metric, lines of code, may not always be a good general measure for defects: the largest file in one of our studied systems (Mylyn, 2012) has 2771 lines of code but no defects, while a much smaller file, with 23 lines of code, does contain a defect.
Recent studies propose a new class of metrics based on conceptual concerns (Liu et al., 2009; Nguyen et al., 2011; Maskeri et al., 2008; Linstead et al., 2008). These studies approximate concerns using statistical topic models, such as latent Dirichlet allocation (LDA) (Blei et al., 2003). Statistical topic models discover topics (i.e., sets of related words) within the source code files, which researchers use as surrogates for conceptual concerns. A prior study by Baldi et al. (2008) shows that the topics generated by topic models agree strongly with the concerns identified by other approaches such as aspect mining. Recent studies that apply topic models to software systems (Liu et al., 2009; Nguyen et al., 2011; Maskeri et al., 2008; Linstead et al., 2008) provide initial evidence that topics in software systems are associated with the defect-proneness of source code files, opening up new possibilities for explaining why some files are more defect-prone than others.
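To make the idea of discovering topics in source code files concrete, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling on tokenized files. It is an illustration only: this excerpt does not state which inference procedure or implementation was used, and the function name `lda_gibbs`, the toy token lists, and the default parameter values are all assumptions made for the example.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative sketch).

    docs is a list of token lists. Returns (theta, phi, vocab), where
    theta[d][k] is the membership of topic k in document d and
    phi[k][v] is the probability of vocabulary word v under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]              # tokens of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # word v assigned to topic k
    topic_total = [0] * num_topics                            # tokens assigned to topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic for this token (unnormalized).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    theta = [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(topic_word[t][v] + beta) / (topic_total[t] + vocab_size * beta)
         for v in range(vocab_size)]
        for t in range(num_topics)
    ]
    return theta, phi, vocab


# Hypothetical identifier and comment tokens extracted from three source files.
files = [
    ["task", "editor", "task", "view", "editor"],
    ["bug", "report", "task", "bug", "report"],
    ["socket", "http", "socket", "request", "http"],
]
theta, phi, vocab = lda_gibbs(files, num_topics=2)
for membership in theta:
    print([round(share, 2) for share in membership])
```

Each row of `theta` is a file's topic membership vector, which is the kind of input a topic-based metric would consume. The arguments `alpha`, `beta`, `iterations`, and `num_topics` correspond to the usual LDA smoothing priors, iteration count, and number of topics.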
In this paper, we build upon prior studies on software quality by considering the topics in source code files. We propose a set of topic-based metrics to study software quality: number of topics (NT), number of defect-prone topics (NDT), topic membership (TM), and defect-prone topic membership (DTM). We study how our proposed topic-based metrics can help better explain software defects. We also compare one of our metrics (NT), which measures cohesion in a software system using topics, with other state-of-the-art topic-based cohesion and coupling metrics. To the best of our knowledge, our study provides the first detailed comparison of the defect explanatory power of state-of-the-art topic-based cohesion and coupling metrics. We perform a detailed case study on multiple versions of four real-world software systems, with a focus on the following research questions:
RQ1: Are some topics more defect-prone than others?
We find that some topics, such as those related to new features and the core functionality of a system, may have a much higher cumulative defect density (CDDT) than others (average skewness in CDDT is 7.25, where a skewness of 1 is already considered highly skewed). We also find that defect-prone topics are likely to remain so over time, indicating that prior defect-proneness of a topic can be used to explain the future defect-proneness of topics and their associated files (Spearman correlation is 0.44–0.67).
RQ2: Can our proposed topic-based metrics help explain why some files are more defect-prone than others?
We find that our proposed topic-based metrics provide additional explanatory power (4–314% improvement) for the defect-proneness of files over existing state-of-the-art product and process metrics such as lines of code, code churn, and number of pre-release defects.
RQ3: How do our topic-based metrics compare with state-of-the-art topic-based cohesion and coupling metrics?
We find that our metric (NT) outperforms state-of-the-art topic-based cohesion and coupling metrics. Compared to the state of the art, our metric gives a larger improvement in defect explanatory power (8–55%) when using lines of code as a baseline metric. Thus, practitioners may benefit from including our metrics when analyzing software quality using cohesion and coupling.
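The summary statistics quoted in these answers (skewness, Spearman correlation, and percentage improvement) can be illustrated with a short, self-contained sketch. The data below are hypothetical. The moment-based skewness formula is one common definition and may differ from the one used in the study. The `percent_improvement` helper assumes that improvement is reported as a relative change in some explanatory-power score, which this excerpt does not specify.

```python
from math import sqrt


def skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (assumes non-constant data)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the tie group
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def percent_improvement(baseline, extended):
    """Relative gain of an extended model's score over a baseline score, in percent."""
    return 100.0 * (extended - baseline) / baseline


# Hypothetical per-topic defect densities in two consecutive versions of a system.
density_v1 = [0.0, 0.1, 0.1, 0.2, 0.3, 4.0]
density_v2 = [0.1, 0.0, 0.2, 0.2, 0.5, 3.0]

print("skewness:", skewness(density_v1))
print("spearman:", spearman(density_v1, density_v2))
print("improvement (%):", percent_improvement(0.20, 0.25))
```

A positive skewness means that a few topics carry most of the defect density, and a Spearman correlation near 1 across versions means that the ranking of topics by defect density is largely preserved over time.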
This work extends our previous work (Chen et al., 2012). First, we extend our case studies to include additional systems (NetBeans 4.0, 5.0, and 5.5.1). Second, we compare one of our topic-based metrics, which measures file cohesion, with state-of-the-art topic-based cohesion and coupling metrics (Section 5). We conduct a detailed comparison of how topic-based cohesion and coupling metrics help explain software defects. Finally, we study the sensitivity of the parameters that we use in our approach (Section 6). We have also made our datasets and results publicly available (Chen, 2014), and encourage researchers to replicate and verify our study.
The rest of this paper is organized as follows. Section 2 describes our approach to discovering topics in source code files and defines the topic-based metrics that we use to answer our research questions. Section 3 introduces the studied systems and outlines the design of our case studies. Sections 4 and 5 present the results for our research questions. Section 6 discusses the parameter sensitivity of our approach. Section 7 discusses potential threats to the validity of our findings, and Section 8 describes related work. Finally, Section 9 concludes the paper.
Proposed approach
In this section, we outline our approach of using topics to explain defects. First, we briefly introduce topic modeling and describe how it can be applied to source code files to approximate conceptual concerns (i.e., main business logic). Next, we motivate and describe our new topic-based metrics.
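The metric definitions themselves are not reproduced in this excerpt. As a rough sketch only, assume that NT counts the topics whose membership in a file is at least a threshold δ, that NDT counts the same over a given set of defect-prone topics, and that DTM is the file's membership vector restricted to those topics. All function names, the example memberships, the defect-prone set, and the threshold value below are hypothetical.

```python
def number_of_topics(membership, delta):
    """NT sketch: count topics whose membership in the file is at least delta."""
    return sum(1 for share in membership.values() if share >= delta)


def number_of_defect_prone_topics(membership, defect_prone, delta):
    """NDT sketch: like NT, but only over topics flagged as defect-prone."""
    return sum(
        1
        for topic, share in membership.items()
        if topic in defect_prone and share >= delta
    )


def defect_prone_topic_membership(membership, defect_prone):
    """DTM sketch: the file's memberships restricted to defect-prone topics."""
    return {topic: share for topic, share in membership.items() if topic in defect_prone}


# Hypothetical topic membership (TM) of one file; the values sum to 1.
file_membership = {"ui": 0.55, "network": 0.30, "logging": 0.10, "parsing": 0.05}
defect_prone_topics = {"network", "parsing"}  # assumed to be known from defect history
delta = 0.10  # hypothetical membership threshold

print(number_of_topics(file_membership, delta))
print(number_of_defect_prone_topics(file_membership, defect_prone_topics, delta))
print(defect_prone_topic_membership(file_membership, defect_prone_topics))
```

Under these assumptions, a file with a low NT concentrates on few concerns, which is why a count of this kind can be read as a cohesion measure.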
Case study design
In this section, we introduce the systems that we use for our case study and we describe our analysis process, depicted in Fig. 2.
Case study results
In this section, we present the results of our case study. We present each research question in three parts: the approach used to address the question, our experimental results, and a discussion of the results.
RQ3: How do our metrics compare with state-of-the-art topic-based cohesion and coupling metrics?
Maintaining high cohesion and low coupling among source code files during development can help reduce maintenance costs and improve the reliability of a software system (Macro and Buxton, 1987; Fenton, 1991). Researchers have used various software structures, such as interactions among variables and methods, to measure cohesion and coupling in software systems (Allen and Khoshgoftaar, 1999; Chae et al., 2000; Bieman and Kang, 1998; Briand et al., 1998; Menzies, Butcher, Cok, Marcus, Layman,
Sensitivity analysis for the parameters of our approach
In our approach, we use several parameter values for LDA and our proposed topic-based metrics: two Dirichlet priors for smoothing (α and β), the number of iterations (II), the number of topics (K), and δ in NT and NDT. We perform a parameter sensitivity analysis to see how these parameters affect the defect explanatory power of our topic-based metrics. We do not change the LDA parameters of the topic-based cohesion and coupling metrics in Section 5, since we are only interested in comparing how
Threats to validity
The results of our case study provide an initial evaluation of using topics to explain software defects, and we show that our topic-based metric outperforms state-of-the-art topic-based cohesion and coupling metrics. However, we note the following threats to the validity of our findings.
Applying topic models to software engineering tasks
Recently, many researchers have used topic modeling approaches to understand software systems from a point of view that differs from the traditional structural and historical views (Chen et al., 2015). For example, Kuhn et al. (2007) used Latent Semantic Indexing (LSI) to cluster the files in a software system according to the similarity of word usage. Maskeri et al. (2008) were the first to apply LDA to source code to uncover its conceptual concerns. Prior studies used topics to study the
Conclusions and future work
In this paper, we aim to understand the relationship between the conceptual concerns in source code files, i.e., their technical content, and their defect-proneness. To do so, we captured the concerns in each file using topics, and proposed new metrics based on these topics. In particular, we considered the defect history of each topic, which we hypothesized would help better explain the defect-proneness of the files.
To evaluate our new metrics, we performed a detailed case study on multiple
Acknowledgements
We thank Dr. Yasutaka Kamei for providing us with the bug datasets of the studied systems that are used in this paper.
References (82)
- et al. Factor Analysis: An Applied Approach (1993)
- et al. Empirical validation of object-oriented metrics on open source software for fault prediction. IEEE Trans. Softw. Eng. (2005)
- et al. Applied linear regression models (1989)
- et al. The craft of software engineering (1987)
- et al. Introduction to Information Retrieval (2008)
- et al. Using latent Dirichlet allocation for automatic categorization of software. Proceedings of the Sixth International Working Conference on Mining Software Repositories (2009)
- Eclipse, 2012....
- Mozilla Firefox, 2012....
- Mylyn, 2012....
- NetBeans, 2012....
- Measuring coupling and cohesion: An information-theory approach. Proceedings of the Sixth International Symposium on Software Metrics
- Software traceability with topic modeling. Proceedings of the Thirty-Second International Conference on Software Engineering
- A theory of aspects as latent topics. Proceedings of the Twenty-Third ACM SIGPLAN Conference on Object-oriented Programming Systems Languages and Applications
- Measuring design-level cohesion. IEEE Trans. Softw. Eng.
- Configuring latent Dirichlet allocation based feature location. Empir. Softw. Eng.
- Understanding LDA in source code analysis. Proceedings of the Twenty-Second International Conference on Program Comprehension
- Don’t touch my code!: Examining the effects of ownership on software quality. Proceedings of the Nineteenth Symposium on the Foundations of Software Engineering and the Thirteenth European Software Engineering Conference
- Exploring defect data from development and customer usage on software modules over multiple releases. Proceedings of the Ninth International Symposium on Software Reliability Engineering
- Latent Dirichlet allocation. J. Mach. Learn. Res.
- Statistics in a Nutshell: A Desktop Quick Reference
- A unified framework for cohesion measurement in object-oriented systems. Empir. Softw. Eng.
- Class-based n-gram models of natural language. Comput. Linguist.
- Multimodel inference: Understanding AIC and BIC in model selection. Sociol. Methods Res.
- A cohesion measure for object-oriented classes. Softw. Pract. Exp.
- Relational topic models for document networks. Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics
- Studying software quality using topic models
- A survey on the use of topic models when mining software repositories. Empir. Softw. Eng.
- Explaining software defects using topic models. Proceedings of the Ninth Working Conference on Mining Software Repositories
- An empirical analysis of information retrieval based concept location techniques in software comprehension. Empir. Softw. Eng.
- An analysis of static metrics and faults in C software. J. Syst. Softw.
- An extensive comparison of bug prediction approaches. Proceedings of the 7th IEEE Working Conference on Mining Software Repositories (MSR 2010), Cape Town
- Do crosscutting concerns cause defects? IEEE Trans. Softw. Eng.
- Applied Probability and Stochastic Processes
- Software Metrics: A Rigorous Approach
- Using relational topic models to capture coupling among classes in object-oriented software systems. Proceedings of the Twenty-Sixth International Conference on Software Maintenance
- Introduction to regression analysis
- Estimating the optimal number of latent concepts in source code analysis. Proceedings of the 2010 Tenth IEEE Working Conference on Source Code Analysis and Manipulation
- Using heuristics to estimate an appropriate number of latent topics in source code analysis. Sci. Comput. Programm.
- Statistical methods in hydrology
- A systematic review of fault prediction performance in software engineering. IEEE Trans. Softw. Eng.