Identification of defect-prone classes in telecommunication software systems using design metrics
Introduction
Quality is a key element in the success of any software product. Real-time telecommunication systems have even more stringent quality requirements, since their failure may result in penalties to developers, loss of essential services for users, and even life-threatening situations.
Assuring high quality is an increasingly complex, time- and effort-consuming activity. Consequently, it is essential to identify the most critical parts of software systems to properly allocate resources for defect detection and removal. As the cost of fixing defects increases over time [25], it is also important to perform such identification as early as possible.
Moreover, the ability to identify the most defect-prone modules early in the development cycle provides critical support for design inspections and for design-based refactoring [14].
Models based on software metrics can be employed towards this goal.
Object-oriented methodologies offer a substantial amount of information about the system even before any coding has started. A widely used set of software metrics for object-oriented systems is the CK suite [9]. The suite has shown its usefulness in several studies [31], [3], [10], [7], [42]. However, most of these studies have limited practical effect, since they were built on small datasets from academic environments. In this work, we empirically examine the possibility of early identification of the classes that contain most of the defects in the system.
In particular, using large industrial datasets, we validate for our target application domain previous anecdotal claims that:
- High values of associations between classes identify defect prone classes.
- Defects are concentrated in a few classes and are distributed according to a Poisson-like curve.
Furthermore, we show that it is possible to recognize defect-prone classes early in the software lifecycle, when defects are easier to fix.
In this paper, we present the results of an extensive survey based on five real-time telecommunication software systems developed by a North American company.
For the applications analyzed in this study, all the modifications of the classes caused by defects in software operation were recorded throughout the development process. The number of modifications referring to defects, widely used in other scientific studies [17], represents a good estimation of the defect-proneness of a class.
The internal product metrics are collected from stable product releases, and include the CK metrics suite.
To deal with typical problems related to software metrics, such as absolute measurement scale, non-normal distribution, high variance, and underprediction of zero values, Poisson regression models (PRM) and negative binomial regression models (NBRM) [7] are employed. In addition to these models, we also apply zero-inflated negative binomial regression models (ZINBRM) [33].
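For reference, the three count-data models are commonly written as below. This is a sketch in standard textbook notation, not the paper's own formulation: the symbols $y_i$ (number of defects of class $i$), $\mathbf{x}_i$ (its vector of metrics), $\boldsymbol{\beta}$ (regression coefficients), $\alpha$ (dispersion parameter) and $\pi_i$ (probability of belonging to the always-zero group) are assumptions introduced here, as is the choice of the NB2 parameterization.

```latex
% Poisson regression model (PRM): variance equals the mean
\Pr(Y_i = y) = \frac{e^{-\mu_i}\,\mu_i^{\,y}}{y!},
\qquad \mu_i = \exp(\mathbf{x}_i^{\top}\boldsymbol{\beta}),
\qquad \operatorname{Var}(Y_i) = \mu_i

% Negative binomial regression model (NBRM, NB2 form): allows overdispersion
f_{\mathrm{NB}}(y \mid \mu_i, \alpha) =
\frac{\Gamma(y + \alpha^{-1})}{\Gamma(\alpha^{-1})\, y!}
\left(\frac{\alpha^{-1}}{\alpha^{-1} + \mu_i}\right)^{\alpha^{-1}}
\left(\frac{\mu_i}{\alpha^{-1} + \mu_i}\right)^{y},
\qquad \operatorname{Var}(Y_i) = \mu_i + \alpha\,\mu_i^{2}

% Zero-inflated negative binomial (ZINBRM): a mixture of an always-zero
% group (probability \pi_i) and a negative binomial group
\Pr(Y_i = y) =
\begin{cases}
\pi_i + (1 - \pi_i)\, f_{\mathrm{NB}}(0 \mid \mu_i, \alpha), & y = 0,\\[4pt]
(1 - \pi_i)\, f_{\mathrm{NB}}(y \mid \mu_i, \alpha), & y > 0.
\end{cases}
```

As $\alpha \to 0$ the negative binomial reduces to the Poisson model, and with $\pi_i = 0$ the zero-inflated model reduces to the plain negative binomial, so each model extends the previous one.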
The models are evaluated using the correlation between the model estimations and the observed data, and the dispersion parameter. We also use Alberg diagrams for graphical evaluation of models’ performances [39], [16]. These diagrams represent an additional method for assessment and comparison of the models’ ability to identify the most critical classes in the system.
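The idea behind an Alberg diagram can be sketched in a few lines of Python. The sketch assumes the usual construction: classes are ranked in decreasing order of predicted defects, and the cumulative share of observed defects is plotted against the share of classes inspected. The function names `alberg_curve` and `share_captured` and the sample numbers are hypothetical and are not taken from the paper's data.

```python
def alberg_curve(predicted, observed):
    """Return (share of classes, cumulative share of observed defects) points,
    with classes ranked by predicted defects in decreasing order."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    total = sum(observed)
    if total == 0:
        raise ValueError("at least one observed defect is required")
    # Rank class indices by the model's prediction, most defect-prone first.
    order = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)
    curve = []
    running = 0
    for rank, index in enumerate(order, start=1):
        running += observed[index]
        curve.append((rank / len(order), running / total))
    return curve


def share_captured(curve, fraction):
    """Share of observed defects found in the top `fraction` of ranked classes."""
    captured = 0.0
    for share_of_classes, share_of_defects in curve:
        if share_of_classes <= fraction + 1e-12:
            captured = share_of_defects
    return captured


# Hypothetical data: ten classes, observed defect counts and model predictions.
observed = [0, 5, 0, 1, 0, 10, 0, 2, 0, 2]
predicted = [0.2, 3.1, 0.1, 1.5, 0.3, 6.0, 0.4, 0.9, 0.2, 2.5]

model_curve = alberg_curve(predicted, observed)
# Ranking by the observed counts themselves gives the best achievable curve.
optimal_curve = alberg_curve(observed, observed)
print(share_captured(model_curve, 0.2), share_captured(optimal_curve, 0.2))
```

The closer the model curve lies to the optimal curve, the better the model is at pointing inspections towards the classes that contain most of the defects.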
This paper is organized as follows. An overview of the applied metrics and statistical methods used in this study is provided in Section 2. The reference products and the experimental data are presented in Section 3. In Section 4, the regression models are applied to estimate the number of defects. The analysis of the results is presented in Section 5. In Section 6, this study is compared with other work performed in this area. Conclusions and directions for future research are presented in Section 7.
Section snippets
Background
In this section we discuss the metrics collection process and methods used in our statistical analysis. We also introduce Alberg diagrams, which are employed for evaluating the resulting models.
Discussion of the experimental data
This research focuses on five projects developed by a North American company that prefers to remain anonymous. Development and testing spanned approximately five years. The projects were developed for embedded systems in the real-time telecommunication domain, mainly using the C++ programming language.
The development teams of the projects had equivalent skill-sets. The developers assigned to each project had similar experience and education levels equivalent to a B.Sc. in Electrical and
Extraction of the models
To determine the most suitable models, we proceed as follows.
First, the best predictors are identified using the Spearman rank correlation between the number of defects and the internal metrics (Table 3).
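A minimal sketch of this screening step is given below, using only the Python standard library. It computes the Spearman rank correlation as the Pearson correlation of average ranks, which handles tied values. The helper names and the sample metric values are hypothetical and are not the values reported in Table 3.

```python
from math import sqrt


def average_ranks(values):
    """Assign 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared_rank
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Hypothetical per-class data: defect counts and two candidate metrics.
defects = [0, 0, 1, 3, 0, 7, 2, 0]
metrics = {
    "RFC": [12, 9, 20, 41, 15, 66, 30, 8],
    "NOM": [5, 7, 6, 14, 4, 19, 9, 6],
}

# Rank candidate predictors by the strength of their association with defects.
ranking = sorted(metrics, key=lambda name: abs(spearman(metrics[name], defects)), reverse=True)
print(ranking)
```

Rank correlation is a natural choice here because defect counts are heavily skewed, and it only assumes a monotonic relation between a metric and the number of defects.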
Often relations between internal metrics and quality can be explained in terms of size metrics [13]. In this study, we use RFC and NOM, which are size metrics in the sense of [3]. We also use LOC to determine the information that is lost using the early available design-based models instead of
Analysis of the results
The performances of the models reported in Table 4 differ significantly.
In general, ML models exhibit reduced overdispersion, sometimes at the expense of a reduced correlation coefficient. Within the ML models, the ZINBRM performs the best overall.
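One common way to quantify overdispersion is the Pearson dispersion statistic, sketched below for a Poisson-type fit. This is an illustration under stated assumptions: the exact definition of the dispersion parameter used for Table 4 is not given in this excerpt, and the function name `pearson_dispersion` and the sample values are hypothetical.

```python
def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-square divided by residual degrees of freedom,
    using the Poisson variance function V(mu) = mu.
    Values near 1 are consistent with a Poisson model;
    values well above 1 indicate overdispersion."""
    n = len(observed)
    if n != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    if n <= n_params:
        raise ValueError("need more observations than parameters")
    if any(mu <= 0 for mu in fitted):
        raise ValueError("fitted means must be positive")
    chi_square = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi_square / (n - n_params)


# Hypothetical example: most classes have no defects, one class has many.
# An intercept-only Poisson fit predicts the sample mean for every class.
observed = [0, 0, 0, 0, 12]
mean = sum(observed) / len(observed)
fitted = [mean] * len(observed)
print(pearson_dispersion(observed, fitted, n_params=1))
```

A large value on data like this, where zeros dominate and a few classes carry most defects, is the kind of pattern that motivates negative binomial and zero-inflated models over plain Poisson regression.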
As mentioned, the ZINBRM assumes that the overall population is formed by two independent groups, one with zero defects and one with a negative binomial defects distribution. The predominance of the ZINBRM provides the statistical confirmation of
Relationships with other studies
An overview of some of the most significant works in this area is presented in Table 6.
As these investigations use different dependent variables and different explanatory metrics, it is not possible to perform quantitative comparisons of the results or to infer solid generalizations. Still, all the dependent variables used in these papers try to explain various aspects of software development effort and quality using design measures.
Although the dependent variables are often heavily skewed and
Conclusion
In this paper, we show that early lifecycle metrics can be used to identify the most defect-prone classes in the context of real-time telecommunication software systems developed using C++.
To this end, we adopt statistical models applicable to count data, i.e., PRM, NBRM, and ZINBRM. These models account for the typical problems with the software metrics data, such as overdispersion and heterogeneity. The CK metrics and LOC are used as independent variables. Number of modifications as a
Acknowledgements
The authors acknowledge the support of the Natural Science and Engineering Research Council of Canada, the Government of Alberta, the University of Alberta, and the Free University of Bozen. Special thanks also go to Eric Liu for his contribution to this work. Thanks also to Luigi Benedicenti, Snezana Djokic, Arrigo L. Frisiani and Forrest Shull for their valuable feedback on, and input to, this work.
References (47)
- et al. A comparison of techniques for developing predictive models of software metrics. Information and Software Technology (1997)
- et al. Empirical evaluation of reuse sensitiveness of complexity metrics. Information and Software Technology (1999)
- Another metric suite for object-oriented programming. Journal of Systems and Software (1998)
- et al. Object-oriented metrics that predict maintainability. Journal of Systems and Software (1993)
- et al. Effort estimation and prediction of object-oriented systems. Journal of Systems and Software (1998)
- et al. Early estimation of software size in object-oriented environment: a case study in a CMM Level 3 Software Firm. Information Sciences (2006)
- et al. Statistics for the Behavioral and Social Sciences (1997)
- et al. Design and code complexity metrics for OO classes. Journal of Object Oriented Programming (1999)
- et al. A validation of object-oriented design metrics as quality indicators. IEEE Transactions on Software Engineering (1996)
- et al. On the application of measurement theory in software engineering. Journal of Empirical Software Engineering (1996)
- Property-based software engineering measurement. IEEE Transactions on Software Engineering
- Econometric models based on count data: comparisons and applications of some estimators and tests. Journal of Applied Econometrics
- A metrics suite for object oriented design. IEEE Transactions on Software Engineering
- Managerial use of object-oriented software: an explanatory analysis. IEEE Transactions on Software Engineering
- Metrics for identifying critical components in software projects
- Refactoring: Improving the Design of Existing Code
- A critique of software defect prediction models. IEEE Transactions on Software Engineering
- Quantitative analysis of faults and failures in a complex software system. IEEE Transactions on Software Engineering
- Software Metrics: A Rigorous and Practical Approach