ABSTRACT
Background. Early identification of software modules that are likely to be faulty helps practitioners take timely actions to improve these modules' quality and reduce development costs in the remainder of the development process. To this end, module faultiness estimation models can be built at any point during development by using measures collected up to that time. Models available in later phases are expected to be more accurate than those available in earlier phases. However, waiting until late in the development process may reduce the effectiveness of any software quality improvement actions and increase their cost.
Aims. Our goal is to investigate to what extent using software code measures along with software design measures helps improve the accuracy of module faultiness estimation with respect to using software design measures alone.
Method. We built faultiness estimation models, using Binary Logistic Regression, Naive Bayes, Support Vector Machines, and Decision Trees, for 54 datasets from the PROMISE repository. These datasets contain design and code measures and faultiness data for software modules of real-life projects. We compared the models built with code and design measures together against the models built with design measures alone, using several accuracy indicators.
Results. The results indicate that the models built by using code measures and design measures together are only slightly more accurate than the models built by using design measures alone.
Conclusions. Our analysis shows that measures that can be obtained during design can provide models that are almost as accurate as models that can be built in later development phases. This is good news for practitioners, who can start quality improvement initiatives early, and hence more cheaply and effectively, based on fairly reliable models.
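The comparison described in the Method can be illustrated with a minimal sketch: given the actual faultiness of a set of modules and the predictions of two models (one built on design measures only, one on design plus code measures), compute an accuracy indicator for each and take the difference. The abstract does not name the indicators used, so the F-measure and the Matthews correlation coefficient (MCC) below are assumptions, as are all function names and the toy data, which are hypothetical and not taken from the study.

```python
from math import sqrt


def confusion(actual, predicted):
    """Count (tp, fp, tn, fn) for binary labels, where 1 means 'faulty'."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, fp, tn, fn


def f_measure(tp, fp, tn, fn):
    """Harmonic mean of precision and recall; 0.0 if undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 if any marginal is empty."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def compare(actual, pred_design, pred_design_code):
    """Indicator gains of the design+code model over the design-only model."""
    d = confusion(actual, pred_design)
    dc = confusion(actual, pred_design_code)
    return {
        "f_measure": f_measure(*dc) - f_measure(*d),
        "mcc": mcc(*dc) - mcc(*d),
    }


# Hypothetical toy data: 8 modules, 4 of them actually faulty.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
pred_design = [1, 1, 0, 0, 0, 1, 0, 1]       # design measures only
pred_design_code = [1, 1, 1, 0, 0, 1, 0, 1]  # design + code measures

if __name__ == "__main__":
    print(compare(actual, pred_design, pred_design_code))
```

A positive value for an indicator means the design-plus-code model scored higher on it. In a study like the one described, such differences would be collected across many datasets and modeling techniques before drawing any conclusion.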
Index Terms
- Comparing the Effectiveness of Using Design and Code Measures in Software Faultiness Estimation