
A dimensionality reduction-based efficient software fault prediction using Fisher linear discriminant analysis (FLDA)

The Journal of Supercomputing

Abstract

Software quality is an important factor in the success of software companies. Traditional software quality assurance techniques face serious limitations, especially in terms of time and budget. This has led to increased use of machine learning classification techniques to predict software faults. Software fault prediction can help developers uncover software problems in the early stages of the software life cycle. The most critical challenges are the extent to which these techniques generalize to software of different sizes, the class imbalance problem, and the identification of discriminative software metrics. In this paper, we analyze the performance of nine widely used machine learning classifiers for software fault prediction: Bayes Net, Naive Bayes (NB), artificial neural network, support vector machines, K nearest neighbors, AdaBoost, Bagging, Zero R, and Random Forest. Two standard sampling techniques, SMOTE and Resample with replacement, are used to handle the class imbalance problem. We further use an FLDA-based feature selection approach in combination with SMOTE and Resample to select the most discriminative metrics. The top four classifiers by performance are then used for software fault prediction. The experiments are carried out on 15 publicly available datasets (small, medium and large) collected from the PROMISE repository. The proposed Resample-FLDA method gives better performance than existing methods in terms of precision, recall, F-measure and area under the curve.
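To make the resampling and metric-selection steps more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a two-class problem (0 = clean, 1 = faulty), balances the classes by drawing rows with replacement, and ranks each metric by a per-feature Fisher criterion (squared difference of class means divided by the sum of class variances). That criterion is a univariate simplification of FLDA chosen here for brevity. The function names `resample_balanced` and `fisher_scores` and the toy data are hypothetical.

```python
import random
from statistics import mean, pvariance


def resample_balanced(rows, labels, seed=0):
    """Draw rows with replacement so every class reaches the majority size.

    Assumption: this mimics the idea of resampling with replacement; it is
    not a port of any particular toolkit's Resample filter.
    """
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sampling with replacement, so rows may repeat.
        out_rows.extend(rng.choice(members) for _ in range(target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


def fisher_scores(rows, labels):
    """Per-feature Fisher criterion for labels 0 (clean) and 1 (faulty)."""
    scores = []
    for j in range(len(rows[0])):
        clean = [r[j] for r, y in zip(rows, labels) if y == 0]
        faulty = [r[j] for r, y in zip(rows, labels) if y == 1]
        between = (mean(clean) - mean(faulty)) ** 2
        within = pvariance(clean) + pvariance(faulty)
        if within > 0:
            scores.append(between / within)
        else:
            # Zero within-class spread: perfectly separating or constant.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


# Hypothetical toy data: metric 0 separates the classes, metric 1 does not.
rows = [[10, 5], [12, 3], [11, 4], [13, 6], [30, 5], [32, 4]]
labels = [0, 0, 0, 0, 1, 1]

balanced_rows, balanced_labels = resample_balanced(rows, labels)
scores = fisher_scores(rows, labels)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print("metric ranking (most discriminative first):", ranking)
```

In a fuller pipeline, the highest-ranked metrics would be kept and passed to a classifier trained on the balanced rows. A full FLDA would instead use the within-class and between-class scatter matrices across all metrics jointly.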



Acknowledgements

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2016R1D1A1A09919551).

Author information

Corresponding author

Correspondence to Seungmin Rho.

About this article

Cite this article

Kalsoom, A., Maqsood, M., Ghazanfar, M.A. et al. A dimensionality reduction-based efficient software fault prediction using Fisher linear discriminant analysis (FLDA). J Supercomput 74, 4568–4602 (2018). https://doi.org/10.1007/s11227-018-2326-5

  • DOI: https://doi.org/10.1007/s11227-018-2326-5
