
A decision analysis approach for selecting software defect prediction method in the early phases

Software Quality Journal

Abstract

One of the most important quality indicators of a software product is its defect rate. In this regard, and with the proliferation of methods and tools supporting prediction in software engineering, interest in software defect prediction (SDP) is increasing. Consequently, it becomes important for stakeholders to build the desired SDP model as early as possible and to use it throughout the software development lifecycle. We present a two-phase decision analysis approach, structured using a decision tree and multi-criteria decision analysis (MCDA), to select the best-fit SDP method. To do this, we specify and use criteria to evaluate SDP methods according to dataset characteristics and stakeholder needs, which are elicited via a questionnaire in the early phases of the development lifecycle. We systematically determine both the alternatives to be evaluated in the decision analysis and the criteria that may affect the decision. In doing so, we conduct two separate expert opinion studies to formulate the decision analysis. We also present case studies that apply selected SDP methods to public datasets, and we investigate the trustworthiness of the proposed approach. For the case studies, the most suitable methods proposed by the decision analysis are naïve Bayes (NB), decision tree (DT), and fuzzy logic. We infer that the results of the decision analysis are consistent with the empirical evidence we present. The approach could help software practitioners decide which SDP method is advantageous by revealing their specific requirements for the software project and its associated defect data. While our results provide guidance for future research in the context of early software defect prediction (ESDP), further studies on real software projects are needed to expand this knowledge before more reliable decisions can be made.
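The abstract does not give the decision rules, criteria, weights, or the specific MCDA technique, so the following is only a minimal sketch of how a two-phase selection of this kind might be wired together, under loudly stated assumptions. Phase 1 is a hand-written decision tree over two hypothetical questionnaire answers (has_historical_data, data_is_qualitative) that prunes the candidate methods; phase 2 ranks the survivors with crisp TOPSIS, swapped in here purely for brevity (the paper's actual MCDA method may differ, for example a fuzzy variant). Every rule, criterion, score, and weight below is invented for illustration and does not come from the paper.

```python
# Hypothetical sketch of a two-phase SDP method selection:
# phase 1 = decision-tree-style filtering, phase 2 = crisp TOPSIS ranking.
# All rules, criteria, scores, and weights are illustrative assumptions.
import math

# Hypothetical criteria (1-5 scale): interpretability, suitability for
# small datasets, setup effort. The last one is a cost criterion.
SCORES = {
    "naive_bayes": [4, 4, 1],
    "decision_tree": [5, 3, 2],
    "neural_network": [1, 2, 4],
    "bayesian_network": [4, 3, 4],
    "fuzzy_logic": [4, 5, 3],
}
WEIGHTS = [0.5, 0.3, 0.2]      # assumed stakeholder weights, summing to 1
BENEFIT = [True, True, False]  # False marks a cost criterion (lower is better)


def phase1_candidates(has_historical_data: bool, data_is_qualitative: bool) -> list:
    """Prune alternatives with hypothetical questionnaire-driven rules."""
    if not has_historical_data:
        # Assumed: without past defect data, only expert-knowledge methods remain.
        return ["fuzzy_logic", "bayesian_network"]
    if data_is_qualitative:
        return ["fuzzy_logic", "bayesian_network", "naive_bayes"]
    return ["naive_bayes", "decision_tree", "neural_network", "bayesian_network"]


def topsis(matrix, weights, benefit):
    """Return TOPSIS closeness coefficients in [0, 1]; higher is better."""
    n_crit = len(weights)
    # Vector-normalise each criterion column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    columns = list(zip(*weighted))
    # Ideal point: best value per criterion; anti-ideal point: worst value.
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    closeness = []
    for row in weighted:
        d_pos = math.dist(row, ideal)  # distance to the ideal point
        d_neg = math.dist(row, anti)   # distance to the anti-ideal point
        total = d_pos + d_neg
        # Guard against all alternatives being identical on every criterion.
        closeness.append(d_neg / total if total > 0 else 0.0)
    return closeness


def select(has_historical_data: bool, data_is_qualitative: bool):
    """Run both phases and return (method, closeness) pairs, best first."""
    candidates = phase1_candidates(has_historical_data, data_is_qualitative)
    scores = topsis([SCORES[m] for m in candidates], WEIGHTS, BENEFIT)
    return sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for method, score in select(has_historical_data=True, data_is_qualitative=False):
        print(f"{method:18s} {score:.3f}")
```

With these invented numbers, select(True, False) puts decision_tree first and neural_network last; that ordering reflects only the made-up scores and says nothing about the paper's findings. TOPSIS was chosen here because it is compact; an AHP- or PROMETHEE-style ranking could slot into the same second phase without changing phase 1.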


Figs. 1–16 (images not included in this preview)



Acknowledgements

The authors would like to thank the Editor-in-Chief and the anonymous reviewers for their valuable comments and suggestions.

Funding

This research received no external funding.

Author information

Corresponding author

Correspondence to Rana Özakıncı.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Özakıncı, R., Kolukısa Tarhan, A. A decision analysis approach for selecting software defect prediction method in the early phases. Software Qual J 31, 121–177 (2023). https://doi.org/10.1007/s11219-022-09595-0


  • DOI: https://doi.org/10.1007/s11219-022-09595-0
